---
license: apache-2.0
base_model: Qwen/Qwen3-14B
tags:
- mlx
- 3bit
- quantized
---
# Qwen3-14B 3-bit MLX
This is a 3-bit quantized version of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B), converted with MLX for inference on Apple Silicon.
## Model Details
- **Quantization**: 3-bit
- **Framework**: MLX
- **Base Model**: [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B)
- **Model Size**: ~5.25 GB (3-bit quantized)
- **Parameters**: 14B
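A 3-bit MLX quantization like this one can typically be reproduced with the `mlx_lm.convert` utility. The exact command used for this upload is not recorded here, so treat the flags below (including the output path) as illustrative:
```bash
# Illustrative only: download, convert, and 3-bit-quantize the base model locally.
# --q-bits sets the quantization bit width; the default group size is used.
mlx_lm.convert --hf-path Qwen/Qwen3-14B -q --q-bits 3 --mlx-path ./Qwen3-14B-3bit
```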
## Usage
```python
from mlx_lm import load, generate

# Download (or load from the local cache) the quantized weights and tokenizer
model, tokenizer = load("mlx-community/Qwen3-14B-3bit")

# Wrap the user message in the model's chat template
prompt = "Hello, how are you?"
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate up to 100 new tokens and print the response
response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=100)
print(response)
```
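The model can also be run from the command line via the `mlx_lm.generate` entry point (the prompt below is only an example):
```bash
# One-shot generation from the terminal
mlx_lm.generate --model mlx-community/Qwen3-14B-3bit \
  --prompt "Hello, how are you?" --max-tokens 100
```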
## Performance on M1/M2/M3
- **Memory Usage**: ~7-8 GB
- **Inference Speed**: ~15-25 tokens/sec (M1 Max)
- **First Token Latency**: ~1-3 seconds
## Requirements
- Apple Silicon Mac (M1/M2/M3)
- macOS 13.0+
- Python 3.8+
- `mlx` and `mlx-lm` packages
## Installation
```bash
pip install mlx mlx-lm
```
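As an optional sanity check after installing, you can confirm that MLX imports and reports a default device (the exact version and device strings vary by machine):
```bash
# Optional: verify the MLX install and the default compute device
python -c "import mlx.core as mx; print(mx.__version__, mx.default_device())"
```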
## Chat Mode
```bash
mlx_lm.chat --model mlx-community/Qwen3-14B-3bit
```
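This launches an interactive chat session in the terminal using the quantized model.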