# Qwen3-Embedding-0.6B-W4A16-G128

A GPTQ-quantized (W4A16-G128: 4-bit weights, 16-bit activations, group size 128) version of https://huggingface.co/Qwen/Qwen3-Embedding-0.6B, calibrated on THUIR/T2Ranking and m-a-p/COIG-CQIA.
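For reference, here is a minimal sketch of how a W4A16-G128 checkpoint like this can be produced with `gptqmodel`. The dataset subset names, column names, sample counts, batch size, and output path below are placeholders; the exact calibration recipe for this checkpoint is not documented here.

```python
# pip install gptqmodel datasets
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# Assumed calibration recipe: a few hundred text samples from the two
# datasets named above. Subsets, columns, and counts are illustrative.
t2r = load_dataset("THUIR/T2Ranking", "collection", split="train").select(range(256))
cqia = load_dataset("m-a-p/COIG-CQIA", "xhs", split="train").select(range(256))
calibration = [r["text"] for r in t2r] + [r["output"] for r in cqia]

# W4A16-G128: 4-bit weights quantized in groups of 128; activations stay 16-bit.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("Qwen/Qwen3-Embedding-0.6B", quant_config)
model.quantize(calibration, batch_size=2)
model.save("Qwen3-Embedding-0.6B-W4A16-G128")
```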

## What's the benefit?

VRAM usage: 3228M → 2124M
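A figure like this can be reproduced with a quick load-and-measure script. The sketch below uses `torch.cuda.max_memory_allocated` to report peak allocations; note that `nvidia-smi` readings will come out somewhat higher because they include the CUDA context, so treat this as a relative comparison, not an exact match for the numbers above.

```python
# pip install torch transformers optimum gptqmodel
import torch
from transformers import AutoModel

def peak_vram_mib(model_id: str) -> float:
    """Load a model onto the GPU and report peak allocated VRAM in MiB."""
    torch.cuda.reset_peak_memory_stats()
    model = AutoModel.from_pretrained(model_id, device_map="cuda", torch_dtype="auto")
    torch.cuda.synchronize()
    peak = torch.cuda.max_memory_allocated() / (1024 ** 2)
    del model
    torch.cuda.empty_cache()
    return peak

print(peak_vram_mib("Qwen/Qwen3-Embedding-0.6B"))                # baseline
print(peak_vram_mib("boboliu/Qwen3-Embedding-0.6B-W4A16-G128"))  # quantized
```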

## What's the cost?

A relative drop of about 1.69% on the C-MTEB mean task score: 66.33 → 65.21, and (66.33 − 65.21) / 66.33 ≈ 1.69%. Full per-type results below.

| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 | - |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-0.6B | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| **This Model** | 0.6B-W4A16 | 65.21 | 66.30 | 71.36 | 66.12 | 74.96 | 62.63 | 69.10 | 53.65 |

## How to use it?

Install the runtime dependencies with `pip install compressed-tensors optimum` plus either `auto-gptq` or `gptqmodel`, then follow the official Qwen3-Embedding-0.6B usage guide; a sketch is shown below.
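As a concrete starting point, here is a minimal embedding example adapted from the upstream Qwen3-Embedding transformers usage (last-token pooling with left padding, then L2 normalization). Treat it as a sketch rather than the exact official snippet; the sample texts and `max_length` are illustrative.

```python
# pip install transformers optimum gptqmodel compressed-tensors
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "boboliu/Qwen3-Embedding-0.6B-W4A16-G128"

# Left padding puts the last real token of every sequence at position -1.
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModel.from_pretrained(model_id, device_map="cuda")

texts = [
    "What is the capital of China?",
    "The capital of China is Beijing.",
]

batch = tokenizer(
    texts, padding=True, truncation=True, max_length=8192, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model(**batch)

# Last-token pooling: thanks to left padding, position -1 is always the
# final token of the actual input.
emb = out.last_hidden_state[:, -1]
emb = F.normalize(emb, p=2, dim=1)

print(emb @ emb.T)  # pairwise cosine similarities
```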
