Delete .ipynb_checkpoints
.ipynb_checkpoints/README-checkpoint.md
DELETED
---
license: apache-2.0
base_model:
- Qwen/Qwen3-Reranker-0.6B
pipeline_tag: text-classification
tags:
- transformers
---

# Qwen3-Reranker-0.6B-W4A16-G128

GPTQ quantization of [Qwen/Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B), calibrated on Ultrachat, [THUIR/T2Ranking](https://huggingface.co/datasets/THUIR/T2Ranking) and [m-a-p/COIG-CQIA](https://huggingface.co/datasets/m-a-p/COIG-CQIA).

## What's the benefit?

VRAM usage: `3228M` -> `2124M` (without FA2, extrapolated from the Embedding model's measurements).

## What's the cost?

I estimate `<5%` accuracy loss; further evaluation is on the way...

[The Embedding one](https://huggingface.co/boboliu/Qwen3-Embedding-4B-W4A16-G128#whats-the-cost) shows `~0.7%`.

## How to use it?

`pip install compressed-tensors optimum` and `auto-gptq` / `gptqmodel`, then go to [the official usage guide](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B#transformers-usage).
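The official guide scores a query–document pair by filling an `<Instruct>/<Query>/<Document>` template and comparing the model's last-token logits for `"yes"` vs `"no"`. A minimal sketch of that flow follows; the repo id in the comment is assumed from this card's title, and model loading is shown only as a comment so the string/score logic stands on its own:

```python
import math
from typing import Optional

# Default instruction used by the official Qwen3-Reranker usage guide
# when no task-specific instruction is supplied.
DEFAULT_TASK = (
    "Given a web search query, retrieve relevant passages that answer the query"
)

def format_pair(query: str, doc: str, instruction: Optional[str] = None) -> str:
    """Build the reranker input string in Qwen3-Reranker's expected format."""
    instruction = instruction or DEFAULT_TASK
    return f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}"

def relevance_score(logit_no: float, logit_yes: float) -> float:
    """Softmax over the 'no'/'yes' logits; returns P(yes), i.e. relevance."""
    e_no, e_yes = math.exp(logit_no), math.exp(logit_yes)
    return e_yes / (e_no + e_yes)

# Loading the quantized checkpoint itself (needs the packages above installed;
# the repo id is assumed, adjust to wherever this card is hosted):
#
#   from transformers import AutoTokenizer, AutoModelForCausalLM
#   tok = AutoTokenizer.from_pretrained(
#       "boboliu/Qwen3-Reranker-0.6B-W4A16-G128", padding_side="left")
#   model = AutoModelForCausalLM.from_pretrained(
#       "boboliu/Qwen3-Reranker-0.6B-W4A16-G128").eval()
#
# The last-token logits at the "yes"/"no" token ids then feed relevance_score.

if __name__ == "__main__":
    pair = format_pair("what is gptq", "GPTQ is a post-training quantization method.")
    print(pair)
    print(relevance_score(-1.0, 2.0))
```

The two-logit softmax is equivalent to what the official snippet computes with `log_softmax` over the `"yes"`/`"no"` token positions, just written out for a single pair.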