HaoranChu committed on
Commit
ea7061c
·
verified ·
1 Parent(s): ddf2ebf

Upload README.md

Files changed (1)
  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
+ # Guidelines for Loading Qwen3 (GPTQ) Quantized Models
+
+ ## Installation Setup
+
+ Download the `GPTQ-for-Qwen_hf` folder.
+
+ ## File Replacement
+
+ To use the tests we provide, download the files in the `eval_my` directory on [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) and follow the **"Attention"** section in the `README`:
+
+ - **Add `eval_my` directory**: Place the `eval_my` directory under the `GPTQ-for-Qwen` directory.
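The placement step above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name `install_eval_dir` and both argument paths are hypothetical, and the layout the repository actually expects is whatever its `README` prescribes.

```shell
# Hypothetical helper: copy a downloaded eval_my directory into a
# GPTQ-for-Qwen checkout. Both paths are supplied by the caller;
# neither is taken from the repository itself.
install_eval_dir() {
    local src
    local dest
    src="$1"    # path to the downloaded eval_my directory (assumed)
    dest="$2"   # path to the GPTQ-for-Qwen directory (assumed)

    if [ ! -d "$src" ]; then
        echo "error: source directory not found: $src" >&2
        return 1
    fi
    if [ ! -d "$dest" ]; then
        echo "error: target directory not found: $dest" >&2
        return 1
    fi

    # Copy the contents so they end up at $dest/eval_my,
    # whether or not that directory already exists.
    mkdir -p "$dest/eval_my"
    cp -r "$src/." "$dest/eval_my/"
}
```

A call such as `install_eval_dir ./downloads/eval_my ./GPTQ-for-Qwen` (both paths illustrative) would then leave the test files under `./GPTQ-for-Qwen/eval_my`.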
+
+ ## Load the model
+
+ ### Group-wise Quantization
+
+ #### 1. Perform GPTQ search
+
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
+     --wbits model_wbit --groupsize 128 \
+     --load path_of_.pth
+ ```
+
+ #### 2. Evaluate the quantized model
+
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
+     --wbits model_wbit --groupsize 128 \
+     --load path_of_.pth --eval
+ ```
+
+ ### Per-channel Quantization
+
+ #### 1. Perform GPTQ search
+
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
+     --wbits model_wbit --groupsize -1 \
+     --load path_of_.pth
+ ```
+
+ #### 2. Evaluate the quantized model
+
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
+     --wbits model_wbit --groupsize -1 \
+     --load path_of_.pth --eval
+ ```
+
+ ## Notes
+
+ - Pass the `--wbits` and `--groupsize` values that correspond to the model; otherwise, loading errors may occur.
+ - Set `--groupsize` to `-1` for per-channel quantization.
+ - Make sure you have sufficient GPU memory to run a 32B model.
+ - Check [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) for more information.
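The notes above can be folded into a small wrapper that refuses to build a command when the bit width or group size is missing or malformed. This is a sketch under assumptions: the function `build_gptq_cmd` and its positional arguments are hypothetical, the script, model, and checkpoint paths remain the placeholders used in the commands above, and only flags that appear in those commands are used. It prints the command instead of running it.

```shell
# Hypothetical helper: assemble the load/eval command from explicit settings.
# Usage: build_gptq_cmd SCRIPT MODEL_PATH WBITS GROUPSIZE CHECKPOINT [eval]
build_gptq_cmd() {
    if [ "$#" -lt 5 ]; then
        echo "usage: build_gptq_cmd SCRIPT MODEL_PATH WBITS GROUPSIZE CHECKPOINT [eval]" >&2
        return 1
    fi
    local script model wbits groupsize ckpt mode cmd
    script="$1"
    model="$2"
    wbits="$3"
    groupsize="$4"
    ckpt="$5"
    mode="${6:-}"

    # wbits must be a plain unsigned integer.
    case "$wbits" in
        ''|*[!0-9]*) echo "error: wbits must be an unsigned integer" >&2; return 1 ;;
    esac
    # groupsize is either -1 (per-channel) or a plain unsigned integer.
    case "$groupsize" in
        -1) ;;
        ''|*[!0-9]*) echo "error: groupsize must be -1 or an unsigned integer" >&2; return 1 ;;
    esac

    cmd="CUDA_VISIBLE_DEVICES=0 python $script $model --wbits $wbits --groupsize $groupsize --load $ckpt"
    if [ "$mode" = "eval" ]; then
        cmd="$cmd --eval"
    fi
    echo "$cmd"
}
```

For instance, `build_gptq_cmd path_of_qwen.py your_model_path 4 -1 path_of_.pth eval` prints a per-channel evaluation command; the bit width `4` is an illustrative value only and must be replaced with the one that corresponds to your model.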