---
license: apache-2.0
base_model: Qwen/Qwen3-14B
tags:
  - mlx
  - 3bit
  - quantized
---

# Qwen3-14B 3bit MLX

This is a 3-bit quantized version of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B), converted for Apple Silicon with the MLX framework.
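
A 3-bit MLX model like this one is typically produced from the original Hugging Face weights with the `mlx_lm.convert` tool. The command below is a sketch; the quantization flags (`-q`, `--q-bits`, `--q-group-size`) should be checked against your installed mlx-lm version with `mlx_lm.convert --help`.

```bash
# Sketch: convert and 3-bit-quantize the original weights locally.
# Flag names may vary between mlx-lm versions.
pip install mlx-lm
mlx_lm.convert \
  --hf-path Qwen/Qwen3-14B \
  --mlx-path Qwen3-14B-3bit \
  -q --q-bits 3 --q-group-size 64
```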

## Model Details

- **Quantization:** 3-bit
- **Framework:** MLX
- **Base Model:** Qwen/Qwen3-14B
- **Model Size:** ~5.25 GB (roughly 14B parameters × 3 bits ÷ 8)
- **Original Size:** 14B parameters

## Usage

```python
from mlx_lm import load, generate

# Load the 3-bit quantized weights and tokenizer from the Hub
model, tokenizer = load("mlx-community/Qwen3-14B-3bit")

prompt = "Hello, how are you?"
messages = [{"role": "user", "content": prompt}]
# Apply the Qwen3 chat template and append the assistant generation prompt
formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=100)
print(response)
```
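
Generation also works directly from the command line without any Python code. This is a sketch using the standard `mlx_lm.generate` options; confirm the flags with `mlx_lm.generate --help` on your installed version.

```bash
# One-off generation from the CLI; recent mlx-lm versions apply the
# tokenizer's chat template automatically.
mlx_lm.generate --model mlx-community/Qwen3-14B-3bit \
  --prompt "Hello, how are you?" \
  --max-tokens 100
```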

## Performance on M1/M2/M3

- **Memory Usage:** ~7-8 GB
- **Inference Speed:** ~15-25 tokens/sec (M1 Max)
- **First Token Latency:** ~1-3 seconds

## Requirements

- Apple Silicon Mac (M1/M2/M3)
- macOS 13.0+
- Python 3.8+
- The `mlx` and `mlx-lm` packages

## Installation

```bash
pip install mlx mlx-lm
```
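
To confirm that MLX can see the Metal GPU after installation, a quick check (a minimal sketch; `mx.default_device()` reports the device MLX will run on):

```bash
# Should print the GPU device on Apple Silicon, e.g. Device(gpu, 0)
python -c "import mlx.core as mx; print(mx.default_device())"
```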

## Chat Mode

```bash
mlx_lm.chat --model mlx-community/Qwen3-14B-3bit
```