# Guidelines for Loading Qwen3 (GPTQ) Quantized Models
## Installation Setup
Download the `GPTQ-for-Qwen_hf` folder from this repository.
## File Replacement
If you want to use the evaluation tests we provide, download the `eval_my` directory from [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) and note the **"Attention"** section of that repository's `README`:
- **Add the `eval_my` directory**: place `eval_my` under the `GPTQ-for-Qwen` directory, as in the sketch below.
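
For example, if the GitHub checkout and the downloaded folder sit side by side in your working directory, the copy might look like the following (directory layout is illustrative; adjust the paths to where you actually placed the downloads):

```bash
# Copy the evaluation files from the GitHub checkout into the GPTQ folder
# (illustrative paths, not part of this repository's layout)
cp -r Qwen3-Quantization/eval_my GPTQ-for-Qwen/
```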
## Loading the Model
In the commands below, `path_of_qwen.py` is the path to the `qwen.py` script from the downloaded folder, `your_model_path` points to the original Qwen3 model, `model_wbit` is the bit width the checkpoint was quantized to, and `path_of_.pth` is the quantized checkpoint to load.
### Group-wise Quantization
#### 1. Perform GPTQ search
```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
--wbits model_wbit --groupsize 128 \
--load path_of_.pth
```
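
For instance, a 4-bit, group-size-128 run might look like the following (the model path and checkpoint name are illustrative placeholders, not files shipped with this repo):

```bash
# Illustrative values: an 8B model quantized to 4 bits with group size 128
CUDA_VISIBLE_DEVICES=0 python qwen.py /path/to/Qwen3-8B \
--wbits 4 --groupsize 128 \
--load qwen3-8b-4bit-128g.pth
```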
#### 2. Evaluate the quantized model
```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
--wbits model_wbit --groupsize 128 \
--load path_of_.pth --eval
```
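
Continuing the illustrative values above, evaluation reuses the same arguments and only adds `--eval`:

```bash
# Same hypothetical paths as the search step, plus --eval
CUDA_VISIBLE_DEVICES=0 python qwen.py /path/to/Qwen3-8B \
--wbits 4 --groupsize 128 \
--load qwen3-8b-4bit-128g.pth --eval
```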
### Per-channel Quantization
#### 1. Perform GPTQ search
```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
--wbits model_wbit --groupsize -1 \
--load path_of_.pth
```
#### 2. Evaluate the quantized model
```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
--wbits model_wbit --groupsize -1 \
--load path_of_.pth --eval
```
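
As an illustrative per-channel run (hypothetical paths and checkpoint name), both steps are identical to the group-wise case except that `--groupsize` is `-1` and the checkpoint must have been produced with per-channel quantization:

```bash
# Load the per-channel checkpoint (illustrative paths)
CUDA_VISIBLE_DEVICES=0 python qwen.py /path/to/Qwen3-8B \
--wbits 4 --groupsize -1 \
--load qwen3-8b-4bit-perchannel.pth

# Evaluate it by adding --eval
CUDA_VISIBLE_DEVICES=0 python qwen.py /path/to/Qwen3-8B \
--wbits 4 --groupsize -1 \
--load qwen3-8b-4bit-perchannel.pth --eval
```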
## Notes
- Pass the `wbits` and `groupsize` values that match the quantized checkpoint; otherwise, loading errors may occur.
- Set `groupsize` to `-1` for per-channel quantization.
- Make sure you have enough GPU memory to run a 32B-sized model (see the rough estimate below).
- Check [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) for more information.
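
As a rough back-of-the-envelope check (our estimate, not a figure from the repository): the 4-bit weights of a 32B-parameter model alone take roughly 16 GB (about 15 GiB), before activations, the KV cache, and dequantization buffers are counted.

```bash
# Weight-only memory estimate for a 32B model at 4 bits (0.5 bytes/param).
# Actual peak usage is higher once activations and the KV cache are included.
python -c "print(f'{32e9 * 0.5 / 1024**3:.1f} GiB of 4-bit weights')"
```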