# Qwen3 Quantized Models – Lexicons Edition
This repository provides quantized versions of the Qwen3 language models, optimized for efficient deployment on edge devices and low-resource environments. The following models have been added to our Lexicons Model Zoo:
- Qwen_Qwen3-0.6B-Q4_K_M
- Qwen_Qwen3-1.7B-Q4_K_M
- Qwen_Qwen3-4B-Q4_K_M
- Qwen3-8B-Q4_K_M
## Model Overview
Qwen3 is the latest open-source LLM series developed by Alibaba Group. Released on April 28, 2025, the models were trained on 36 trillion tokens across 119 languages and dialects. Qwen3 models are instruction-tuned and support long context windows and multilingual use. The quantization behavior of this series is analyzed in *An Empirical Study of Qwen3 Quantization* (arXiv:2505.02214).
The quantized versions provided here use 4-bit Q4_K_M quantization, preserving most of the full-precision models' quality at a fraction of the memory and compute cost. This makes them well suited to real-time inference, chatbots, and on-device applications.
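As a minimal usage sketch, the snippet below loads one of these GGUF files through llama-cpp-python; any GGUF-compatible runtime (llama.cpp, Ollama, LM Studio) works equally well. The `repo_id` is a placeholder — substitute the actual repository for the model you want.

```python
# Minimal sketch: local chat inference over a Q4_K_M GGUF file.
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="your-org/Qwen_Qwen3-1.7B-Q4_K_M",  # placeholder repo ID
    filename="*Q4_K_M.gguf",                    # glob matches the Q4_K_M file
    n_ctx=4096,                                 # context window (see table below)
    n_gpu_layers=-1,                            # offload all layers to GPU if present
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain Q4_K_M quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```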
## Key Features
- Efficient Quantization: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
- Multilingual Mastery: Trained on a massive, diverse corpus covering 119+ languages.
- Instruction-Tuned: Fine-tuned to follow user instructions effectively.
- Scalable Sizes: Choose from 0.6B to 8B parameter models based on your use case.
## Available Quantized Versions
| Model Name | Parameters | Quantization | Context Length | Recommended Use |
|---|---|---|---|---|
| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance & efficiency |
| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |
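As a rough sizing aid when choosing between these models: Q4_K_M averages roughly 4.85 bits per weight (an approximation — some tensors are kept at higher precision), so the file size is approximately parameters × 4.85 / 8 bytes, plus runtime overhead for the KV cache and activations. A quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope GGUF file-size estimate for Q4_K_M models.
# 4.85 bits/weight is an approximate average, not an exact constant.
BITS_PER_WEIGHT = 4.85

def q4_k_m_size_gb(params_billions: float) -> float:
    """Estimated Q4_K_M file size in GB for a model of the given size."""
    return params_billions * 1e9 * BITS_PER_WEIGHT / 8 / 1e9

for name, params in [("Qwen3-0.6B", 0.6), ("Qwen3-1.7B", 1.7),
                     ("Qwen3-4B", 4.0), ("Qwen3-8B", 8.0)]:
    print(f"{name}: ~{q4_k_m_size_gb(params):.1f} GB")
# Prints roughly 0.4, 1.0, 2.4, and 4.9 GB; embeddings and metadata add a bit more.
```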
## Performance Insights
Quantized Qwen3 models at Q4_K_M retain strong reasoning and comprehension capabilities while substantially reducing memory and compute requirements. According to the findings in arXiv:2505.02214, Qwen3 models remain robust even under lower-bit quantization when configured appropriately.
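A simple way to sanity-check this on your own hardware is to run a short reasoning prompt against a quantized model and inspect the answer. The sketch below assumes a local Q4_K_M file (the path is a placeholder) and uses llama-cpp-python as above:

```python
# Qualitative spot check of reasoning quality under Q4_K_M quantization.
from llama_cpp import Llama

llm = Llama(model_path="./Qwen_Qwen3-4B-Q4_K_M.gguf",  # placeholder path
            n_ctx=4096, verbose=False)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "A train travels 120 km in 1.5 hours. "
                          "What is its average speed in km/h?"}],
    max_tokens=256,
    temperature=0.0,  # deterministic output eases comparison across quant levels
)
print(out["choices"][0]["message"]["content"])  # expect 80 km/h
```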
## Code
The project is released on GitHub and Hugging Face.