Qwen3 Embedding & Reranker GPTQ
GPTQ-quantized Qwen/Qwen3-Embedding-4B, using THUIR/T2Ranking and m-a-p/COIG-CQIA as the calibration set.
VRAM usage: 17430 MB -> 11000 MB (without FlashAttention-2).
~0.72% average score drop on C-MTEB. Evaluation performed with the official code.
| C-MTEB | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-4B | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| This Model | 4B-W4A16 | 71.75 | 73.05 | 75.43 | 77.51 | 83.04 | 65.73 | 76.15 | 60.47 |
Install the dependencies with `pip install compressed-tensors optimum` plus either `auto-gptq` or `gptqmodel`, then follow the official usage guide.
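The official Qwen3-Embedding usage extracts one vector per sequence via last-token pooling and then L2-normalizes before scoring. A minimal NumPy sketch of that pooling step on dummy hidden states (shapes, names, and the right-padding assumption here are illustrative, not the model's actual API):

```python
import numpy as np

def last_token_pool(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pick each sequence's last non-padding hidden state (right padding assumed)."""
    lengths = attention_mask.sum(axis=1) - 1           # index of last real token
    return hidden[np.arange(hidden.shape[0]), lengths]

# Dummy batch: 2 sequences, max length 4, hidden size 3.
hidden = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 0],   # length 3 -> pooled from position 2
                 [1, 1, 1, 1]])  # length 4 -> pooled from position 3

emb = last_token_pool(hidden, mask)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize
scores = emb @ emb.T                                     # cosine similarities
```

With real model outputs, `hidden` would be the final-layer hidden states and `scores` the query-document similarity matrix.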