Qwen3 Embedding&Reranker GPTQ
A GPTQ-quantized version of https://huggingface.co/Qwen/Qwen3-Embedding-0.6B, calibrated on THUIR/T2Ranking and m-a-p/COIG-CQIA.
VRAM usage: 3228M → 2124M
~1.69% relative drop in C-MTEB Mean(Task) (66.33 → 65.21).
| Model (C-MTEB) | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | - | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.50 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.50 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-0.6B | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| This Model | 0.6B-W4A16 | 65.21 | 66.30 | 71.36 | 66.12 | 74.96 | 62.63 | 69.10 | 53.65 |
Install `compressed-tensors` and `optimum` (`pip install compressed-tensors optimum`) plus one of `auto-gptq` / `gptqmodel`, then follow the official Qwen3-Embedding usage guide.
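Once the dependencies are installed, usage is the same as for the unquantized base model. Below is a minimal sketch; the repo id is a placeholder for this checkpoint, and the `prompt_name="query"` convention follows the official Qwen3-Embedding sentence-transformers example (queries get an instruction prefix, documents do not):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def main():
    from sentence_transformers import SentenceTransformer

    # Placeholder repo id -- replace with the actual quantized
    # checkpoint from this collection.
    model = SentenceTransformer("<this-model-repo>")

    # Queries use the instruction-prefixed "query" prompt;
    # documents are encoded as-is.
    query_emb = model.encode(["What is the capital of China?"],
                             prompt_name="query")
    doc_emb = model.encode(["北京是中国的首都。"])

    print(cosine_similarity(query_emb[0], doc_emb[0]))

if __name__ == "__main__":
    main()
```

The W4A16 weights load through the same `from_pretrained` path as the base model once a GPTQ backend (`auto-gptq` or `gptqmodel`) is available.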