Collection: Qwen3 Embedding&Reranker GPTQ (6 items)
GPTQ-quantized Qwen/Qwen3-Reranker-8B, using Ultrachat, THUIR/T2Ranking, and m-a-p/COIG-CQIA as the calibration set.
VRAM usage: more than 24G -> 19624M, which makes the model usable on a 3090/4090 (without FA2, according to the Embedding model's result).
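As a rough sanity check on the VRAM figure, the headroom on a 24G card can be worked out directly. This is only a sketch: it assumes "19624M" means MiB and that a 3090/4090 exposes 24 GiB (24576 MiB); the function name `headroom_mib` is made up for illustration.

```python
# Rough headroom estimate for the quantized reranker on a 24 GiB card.
# Assumptions (not from the model card): "M" means MiB, and the card
# exposes the full 24 GiB = 24576 MiB to the process.
MIB_PER_GIB = 1024


def headroom_mib(card_gib: int, used_mib: int) -> int:
    """Return the free VRAM in MiB left after loading the model."""
    return card_gib * MIB_PER_GIB - used_mib


free = headroom_mib(24, 19624)
print(f"Free VRAM on a 24 GiB card: {free} MiB")  # 4952 MiB
```

The remaining memory has to cover activations and the batch being scored, so long inputs or large batches may still not fit.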
I expect the accuracy loss to be under 5%; further evaluation is on the way. The Embedding model shows a loss of about 0.7%.
To use the model, run pip install compressed-tensors optimum, plus either auto-gptq or gptqmodel, then follow the official usage guide.
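Before moving on to the usage guide, it may help to confirm that the packages are actually importable. The following is a minimal sketch, not part of the official instructions; the helper name `check_environment` is hypothetical, and the module names are assumed to be the usual import names of the pip packages listed above.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the pip packages named above.
REQUIRED = ["compressed_tensors", "optimum"]
# Only one GPTQ backend is needed; either is acceptable.
BACKENDS = ["auto_gptq", "gptqmodel"]


def check_environment():
    """Return (missing required modules, installed GPTQ backends)."""
    missing = [name for name in REQUIRED if find_spec(name) is None]
    backends = [name for name in BACKENDS if find_spec(name) is not None]
    return missing, backends


missing, backends = check_environment()
if missing:
    print("Missing required packages:", ", ".join(missing))
if not backends:
    print("No GPTQ backend found; install auto-gptq or gptqmodel.")
if not missing and backends:
    print("Environment looks ready; GPTQ backend(s):", ", ".join(backends))
```

This only checks that the modules can be found; it does not verify versions or GPU support.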