# Qwen3-Embedding-0.6B-W4A16-G128

A GPTQ-quantized (W4A16-G128: 4-bit weights, 16-bit activations, group size 128) version of https://huggingface.co/Qwen/Qwen3-Embedding-0.6B, calibrated on THUIR/T2Ranking and m-a-p/COIG-CQIA.
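For reference, here is a minimal sketch of how a W4A16-G128 checkpoint like this can be produced with `gptqmodel`. The dataset subset names, column names, sample counts, batch size, and output path below are placeholders; the exact calibration recipe for this checkpoint is not documented here.

```python
# pip install gptqmodel datasets
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# Assumed calibration recipe: a few hundred text samples from the two
# datasets named above. Subsets, columns, and counts are illustrative.
t2r = load_dataset("THUIR/T2Ranking", "collection", split="train").select(range(256))
cqia = load_dataset("m-a-p/COIG-CQIA", "xhs", split="train").select(range(256))
calibration = [r["text"] for r in t2r] + [r["output"] for r in cqia]

# W4A16-G128: 4-bit weights quantized in groups of 128; activations stay 16-bit.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("Qwen/Qwen3-Embedding-0.6B", quant_config)
model.quantize(calibration, batch_size=2)
model.save("Qwen3-Embedding-0.6B-W4A16-G128")
```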

## What's the benefit?

VRAM usage: 3228M → 2124M
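A figure like this can be reproduced with a quick load-and-measure script. The sketch below uses `torch.cuda.max_memory_allocated` to report peak allocations; note that `nvidia-smi` readings will come out somewhat higher because they include the CUDA context, so treat this as a relative comparison, not an exact match for the numbers above.

```python
# pip install torch transformers optimum gptqmodel
import torch
from transformers import AutoModel

def peak_vram_mib(model_id: str) -> float:
    """Load a model onto the GPU and report peak allocated VRAM in MiB."""
    torch.cuda.reset_peak_memory_stats()
    model = AutoModel.from_pretrained(model_id, device_map="cuda", torch_dtype="auto")
    torch.cuda.synchronize()
    peak = torch.cuda.max_memory_allocated() / (1024 ** 2)
    del model
    torch.cuda.empty_cache()
    return peak

print(peak_vram_mib("Qwen/Qwen3-Embedding-0.6B"))                # baseline
print(peak_vram_mib("boboliu/Qwen3-Embedding-0.6B-W4A16-G128"))  # quantized
```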

## What's the cost?

A relative drop of about 1.69% on the C-MTEB mean task score: 66.33 → 65.21, and (66.33 − 65.21) / 66.33 ≈ 1.69%. Full per-type results below.

| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 | - |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-0.6B | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| **This Model** | 0.6B-W4A16 | 65.21 | 66.30 | 71.36 | 66.12 | 74.96 | 62.63 | 69.10 | 53.65 |

## How to use it?

Install the runtime dependencies with `pip install compressed-tensors optimum` plus either `auto-gptq` or `gptqmodel`, then follow the official Qwen3-Embedding-0.6B usage guide; a sketch is shown below.
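As a concrete starting point, here is a minimal embedding example adapted from the upstream Qwen3-Embedding transformers usage (last-token pooling with left padding, then L2 normalization). Treat it as a sketch rather than the exact official snippet; the sample texts and `max_length` are illustrative.

```python
# pip install transformers optimum gptqmodel compressed-tensors
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "boboliu/Qwen3-Embedding-0.6B-W4A16-G128"

# Left padding puts the last real token of every sequence at position -1.
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModel.from_pretrained(model_id, device_map="cuda")

texts = [
    "What is the capital of China?",
    "The capital of China is Beijing.",
]

batch = tokenizer(
    texts, padding=True, truncation=True, max_length=8192, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model(**batch)

# Last-token pooling: thanks to left padding, position -1 is always the
# final token of the actual input.
emb = out.last_hidden_state[:, -1]
emb = F.normalize(emb, p=2, dim=1)

print(emb @ emb.T)  # pairwise cosine similarities
```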
