Qwen3 Quantized Models – Lexicons Edition

This repository provides quantized versions of the Qwen3 language models, optimized for efficient deployment on edge devices and low-resource environments. The following models have been added to our Lexicons Model Zoo:

  • Qwen_Qwen3-0.6B-Q4_K_M
  • Qwen_Qwen3-1.7B-Q4_K_M
  • Qwen_Qwen3-4B-Q4_K_M
  • Qwen3-8B-Q4_K_M

Model Overview

Qwen3 is the latest open-source LLM series developed by Alibaba Group. Released on April 28, 2025, the models were trained on 36 trillion tokens across 119 languages and dialects. Qwen3 models are instruction-tuned and support long context windows and multilingual capabilities. Quantization behavior across the series is analyzed in "An Empirical Study of Qwen3 Quantization" (arXiv:2505.02214).

The quantized versions provided here use 4-bit Q4_K_M precision, preserving strong performance at a fraction of the memory and compute cost. These models are well suited to real-time inference, chatbots, and on-device applications.
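As a quick illustration, a quantized GGUF file like these can be pulled from the Hub and loaded with llama-cpp-python. This is a minimal sketch: the filename below is an assumption based on the model names listed above, so check the repository's file listing for the exact artifact names.

```python
# Minimal sketch: fetch one Q4_K_M GGUF file and load it with llama-cpp-python.
# The filename is assumed from the model names listed above; verify it against
# the actual files in the SandLogicTechnologies/Qwen3-GGUF repository.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/Qwen3-GGUF",
    filename="Qwen_Qwen3-0.6B-Q4_K_M.gguf",  # hypothetical exact filename
)

llm = Llama(
    model_path=model_path,
    n_ctx=4096,    # context window; see the table below for per-model limits
    n_threads=4,   # tune to your CPU core count
)
```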


Key Features

  • Efficient Quantization: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
  • Multilingual Mastery: Trained on a massive, diverse corpus covering 119 languages and dialects.
  • Instruction-Tuned: Fine-tuned to follow user instructions effectively (see the chat example after this list).
  • Scalable Sizes: Choose from 0.6B to 8B parameter models based on your use case.
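Because the models are instruction-tuned, a chat-style completion is the natural interface. A short sketch, reusing the `llm` object loaded above (the prompt is purely illustrative):

```python
# Single instruction-following chat turn; works the same way for any of the
# Q4_K_M models listed above, in any of the supported languages.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain 4-bit quantization in two sentences."},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```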

Available Quantized Versions

| Model Name             | Parameters | Quantization | Context Length | Recommended Use                    |
|------------------------|------------|--------------|----------------|------------------------------------|
| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B       | Q4_K_M       | 4K tokens      | Lightweight devices, microservices |
| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B       | Q4_K_M       | 4K tokens      | Fast inference, chatbots           |
| Qwen_Qwen3-4B-Q4_K_M   | 4B         | Q4_K_M       | 4K tokens      | Balanced performance & efficiency  |
| Qwen3-8B-Q4_K_M        | 8B         | Q4_K_M       | 128K tokens    | Complex reasoning, long documents  |

Performance Insights

Quantized Qwen3 models at Q4_K_M retain strong reasoning and comprehension capabilities while substantially reducing memory and compute requirements. Findings in "An Empirical Study of Qwen3 Quantization" (arXiv:2505.02214) indicate that Qwen3 models remain robust even under low-bit quantization when it is applied appropriately.
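For a back-of-the-envelope sense of the savings, weight-storage size can be estimated from parameter count and effective bits per weight. Q4_K_M averages roughly 4.5–5 bits per weight, so the figure used below is an approximation, not a measured file size:

```python
# Rough weight-memory estimate: params * bits-per-weight / 8 (in GB).
# ~4.85 bits/weight for Q4_K_M is an approximation; real GGUF sizes vary
# because some tensors (e.g. embeddings) are kept at higher precision.
def approx_weight_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    return params_billion * bits_per_weight / 8

for name, params in [("Qwen3-0.6B", 0.6), ("Qwen3-1.7B", 1.7),
                     ("Qwen3-4B", 4.0), ("Qwen3-8B", 8.0)]:
    print(f"{name}: ~{approx_weight_gb(params):.2f} GB at Q4_K_M "
          f"vs ~{params * 2:.1f} GB at FP16")
```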

Code

The project is released on GitHub and on Hugging Face as SandLogicTechnologies/Qwen3-GGUF.
