Guidelines for Loading Qwen3 (GPTQ) Quantized Models
Installation Setup
Download the GPTQ-for-Qwen_hf folder.
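As a minimal sketch, assuming the folder is hosted in a GitHub repository (the URL below is a placeholder, not the actual repository location):

```bash
# Placeholder URL: substitute the real repository owner/path.
git clone https://github.com/<owner>/GPTQ-for-Qwen_hf.git
cd GPTQ-for-Qwen_hf
```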
File Replacement
If you need to use the tests we provide, please download the files in the eval_my directory on GitHub and pay attention to the "Attention" section in the README:
- Add the eval_my directory: place the eval_my directory under the GPTQ-for-Qwen directory (see the example after this list).
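For example, if both directories are checked out side by side, placing eval_my could look like this (the relative paths are assumptions about your local layout):

```bash
# Copy the provided test directory into the GPTQ-for-Qwen checkout.
cp -r eval_my GPTQ-for-Qwen/
```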
Load the model
Group-wise Quantization
1. Perform the GPTQ search:

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize 128 \
    --load path_of_.pth
```

2. Evaluate the quantized model:

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize 128 \
    --load path_of_.pth --eval
```
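For illustration, a filled-in evaluation command might look like the following; the script name, model path, checkpoint filename, and the 4-bit setting are all assumptions for the example, not values mandated by the repository:

```bash
# Hypothetical example: 4-bit weights, group size 128 (all paths are placeholders).
CUDA_VISIBLE_DEVICES=0 python qwen.py /models/Qwen3-32B \
    --wbits 4 --groupsize 128 \
    --load qwen3-32b-4bit-128g.pth --eval
```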
Per-channel Quantization
1. Perform the GPTQ search:

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize -1 \
    --load path_of_.pth
```

2. Evaluate the quantized model:

```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
    --wbits model_wbit --groupsize -1 \
    --load path_of_.pth --eval
```
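The only difference from the group-wise commands is --groupsize -1, which uses one quantization scale per output channel instead of one per group of 128 weights. A filled-in sketch (the script name, paths, and the 4-bit setting are placeholders, as above):

```bash
# Hypothetical example: 4-bit weights, per-channel quantization (groupsize -1).
CUDA_VISIBLE_DEVICES=0 python qwen.py /models/Qwen3-32B \
    --wbits 4 --groupsize -1 \
    --load qwen3-32b-4bit.pth --eval
```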
Notes
- You need to pass the wbits and groupsize values that correspond to the quantized model; otherwise, loading errors may occur.
- Set the groupsize parameter to -1 for per-channel quantization.
- Make sure you have sufficient GPU memory to run a 32B-sized model (a quick check is sketched after this list).
- Check GitHub for more information.
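A quick way to verify available GPU memory before launching; nvidia-smi ships with the standard NVIDIA drivers:

```bash
# Show total, used, and free memory for each GPU.
nvidia-smi --query-gpu=index,memory.total,memory.used,memory.free --format=csv
```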