# Guidelines for Loading Qwen3 (GPTQ) Quantized Models

## Installation Setup

Download the `GPTQ-for-Qwen_hf` folder.
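
If you are starting from the repository, a minimal sketch of this step might look like the following (assuming `GPTQ-for-Qwen_hf` sits at the repository root; adjust the path if the layout differs):

```bash
# Clone the repository and enter the GPTQ folder
# (assumes GPTQ-for-Qwen_hf sits at the repository root)
git clone https://github.com/Efficient-ML/Qwen3-Quantization.git
cd Qwen3-Quantization/GPTQ-for-Qwen_hf
```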

## File Replacement

If you need to use the tests we provide, download the files in the `eval_my` directory from [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) and pay attention to the **"Attention"** section in the `README`:

- **Add the `eval_my` directory**: place the `eval_my` directory under the `GPTQ-for-Qwen` directory, as sketched below.
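
A minimal shell sketch of that step, assuming the downloaded `eval_my` directory and `GPTQ-for-Qwen` sit side by side in your working directory:

```bash
# Copy the downloaded eval_my directory into GPTQ-for-Qwen
# (adjust the source path to wherever you downloaded eval_my)
cp -r eval_my GPTQ-for-Qwen/
```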

## Load the Model

### Group-wise Quantization

#### 1. Perform the GPTQ search

```bash
# Group-wise quantization: one scale per group of 128 weights
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize 128 \
    --load path_of_.pth
```
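
To make the placeholders concrete, here is one hypothetical invocation; the script name (`qwen.py`), model path, bit width, and checkpoint filename are all illustrative assumptions, not values confirmed by this guide:

```bash
# HYPOTHETICAL example of a 4-bit, group-size-128 search run.
# Every name and path below is illustrative; substitute your own.
CUDA_VISIBLE_DEVICES=0 python qwen.py Qwen/Qwen3-8B \
    --wbits 4 --groupsize 128 \
    --load qwen3-8b-4bit-128g.pth
```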

#### 2. Evaluate the quantized model

```bash
# Same arguments as the search step, plus --eval to run evaluation
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize 128 \
    --load path_of_.pth --eval
```
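
Because the evaluation run must repeat the exact `wbits` and `groupsize` used in step 1 (see Notes below), one convenient pattern, shown here as a sketch using the same placeholders as above, is to factor the shared arguments into shell variables:

```bash
# Shared settings for both steps (placeholders as in the commands above)
MODEL=your_model_path
WBITS=model_wbit      # the bit width the checkpoint was quantized to
CKPT=path_of_.pth
GROUPSIZE=128         # use -1 here for per-channel quantization

# Step 1: GPTQ search
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py "$MODEL" \
    --wbits "$WBITS" --groupsize "$GROUPSIZE" --load "$CKPT"

# Step 2: evaluation with identical quantization arguments
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py "$MODEL" \
    --wbits "$WBITS" --groupsize "$GROUPSIZE" --load "$CKPT" --eval
```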

### Per-channel Quantization

#### 1. Perform the GPTQ search

```bash
# Per-channel quantization: --groupsize -1 uses one scale per output channel
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize -1 \
    --load path_of_.pth
```

#### 2. Evaluate the quantized model

```bash
# As above, plus --eval to run evaluation
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize -1 \
    --load path_of_.pth --eval
```
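
For concreteness, a hypothetical filled-in per-channel evaluation, with the same caveat as the group-wise example above (all names and paths are illustrative):

```bash
# HYPOTHETICAL example of a 4-bit per-channel evaluation run.
# Model path and checkpoint name are placeholders; substitute your own.
CUDA_VISIBLE_DEVICES=0 python qwen.py Qwen/Qwen3-8B \
    --wbits 4 --groupsize -1 \
    --load qwen3-8b-4bit-per-channel.pth --eval
```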

## Notes

- Pass the `wbits` and `groupsize` values that match the quantized model; otherwise, loading errors may occur.
- Set the `groupsize` parameter to `-1` for per-channel quantization.
- Make sure you have sufficient GPU memory to run a 32B-sized model; see the quick check below.
- Check [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) for more information.
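
As a quick sanity check for the GPU-memory note, you can query free memory with `nvidia-smi` before launching; the figures in the comment are rough estimates, not numbers from this guide:

```bash
# Show total and free memory per GPU. As a rough rule of thumb,
# a 32B model needs ~64 GB at 16-bit and on the order of 20 GB
# once quantized to 4-bit (estimates, not measured figures).
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```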