# Qwen3-Embedding-4B-W4A16-G128

GPTQ-quantized Qwen/Qwen3-Embedding-4B, using THUIR/T2Ranking and m-a-p/COIG-CQIA as the calibration set.

## What's the benefit?

VRAM usage: 17430 MiB -> 11000 MiB (without FlashAttention-2).

## What's the cost?

About a 0.72% relative drop in the C-MTEB mean score.

Evaluation was performed with the official C-MTEB code.

| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-4B | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| This Model | 4B-W4A16 | 71.75 | 73.05 | 75.43 | 77.51 | 83.04 | 65.73 | 76.15 | 60.47 |
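The quoted ~0.72% cost is the relative drop in the Mean(Task) column between the BF16 model and this quantized one:

```python
# Relative C-MTEB Mean(Task) drop from quantization (numbers from the table above).
full = 72.27   # Qwen3-Embedding-4B, BF16
quant = 71.75  # this model, W4A16-G128
drop_pct = (full - quant) / full * 100
print(round(drop_pct, 2))  # -> 0.72
```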

## How to use it?

`pip install compressed-tensors optimum` and either `auto-gptq` or `gptqmodel`, then follow the official Qwen3-Embedding usage guide.
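A minimal usage sketch, assuming this checkpoint loads the same way as the base Qwen/Qwen3-Embedding-4B via sentence-transformers (the `prompt_name="query"` convention and `similarity` call follow the official Qwen3-Embedding example; not verified against this quantized repo specifically):

```python
# Prerequisites: pip install sentence-transformers compressed-tensors optimum gptqmodel
from sentence_transformers import SentenceTransformer

# Load the W4A16 checkpoint; the GPTQ backend handles the packed weights.
model = SentenceTransformer("boboliu/Qwen3-Embedding-4B-W4A16-G128")

queries = ["What is the capital of China?"]
documents = ["The capital of China is Beijing."]

# Qwen3-Embedding applies an instruction prompt to queries, but not to documents.
query_emb = model.encode(queries, prompt_name="query")
doc_emb = model.encode(documents)

print(model.similarity(query_emb, doc_emb))
```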
