Efficient-ML/GPTQ-for-Qwen3

HaoranChu committed on May 10

Commit ea7061c · verified · 1 Parent(s): ddf2ebf

Upload README.md

Files changed (1)

README.md +56 -0

	@@ -0,0 +1,56 @@

+# Guidelines for Loading Qwen3 (GPTQ) Quantized Models
+## Installation Setup
+Download the `GPTQ-for-Qwen_hf` folder.
+## File Replacement
+To run the tests we provide, download the `eval_my` directory from [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) and follow the **"Attention"** section in the `README`:
+- **Add eval_my directory**: Place the `eval_my` directory under the `GPTQ-for-Qwen` directory.
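The placement step above can be sketched as a small shell helper. This is only an illustration: the function name `place_eval_my` is hypothetical, and both paths are passed as arguments because the README does not say where `eval_my` sits inside the GitHub checkout or where you downloaded the GPTQ folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): copies a downloaded
# eval_my directory into the GPTQ code directory.
# Usage: place_eval_my <path/to/eval_my> <path/to/GPTQ-for-Qwen>
place_eval_my() {
  local src="$1" dest="$2"
  # Both directories must already exist.
  [ -d "$src" ] || { echo "missing source directory: $src" >&2; return 1; }
  [ -d "$dest" ] || { echo "missing target directory: $dest" >&2; return 1; }
  # Refuse to overwrite or nest into an existing copy.
  [ ! -e "$dest/eval_my" ] || { echo "already present: $dest/eval_my" >&2; return 1; }
  cp -r "$src" "$dest/eval_my"
}
```

After the call, the test scripts should be reachable as `<target>/eval_my/...`, which is the layout the bullet above describes.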
+## Load the model
+### Group-wise Quantization
+#### 1. Perform GPTQ search
+```bash
+CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
+--wbits model_wbit  --groupsize 128  \
+--load path_of_.pth
+```
+#### 2. Evaluate the quantized model
+```bash
+CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
+--wbits model_wbit  --groupsize 128  \
+--load path_of_.pth --eval
+```
+### Per-channel Quantization
+#### 1. Perform GPTQ search
+```bash
+CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
+--wbits model_wbit  --groupsize -1  \
+--load path_of_.pth
+```
+#### 2. Evaluate the quantized model
+```bash
+CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
+--wbits model_wbit  --groupsize -1  \
+--load path_of_.pth --eval
+```
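The four commands above differ only in the `--groupsize` value (128 or -1) and in whether `--eval` is appended. A minimal sketch of a wrapper that makes this explicit is shown below. The function name `gptq_cmd` and its argument order are hypothetical, and the placeholders `path_of_qwen.py` and `your_model_path` are kept exactly as they appear in the README. It only prints the command so that you can inspect it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): prints the command
# for a given quantization mode instead of executing it.
# Usage: gptq_cmd <group|channel> <wbits> <checkpoint> [eval]
gptq_cmd() {
  local mode="$1" wbits="$2" ckpt="$3" do_eval="${4:-}"
  local groupsize
  case "$mode" in
    group)   groupsize=128 ;;  # group-wise quantization, value used in the README
    channel) groupsize=-1 ;;   # per-channel quantization
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  local cmd="CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path --wbits $wbits --groupsize $groupsize --load $ckpt"
  # Append --eval only when evaluation is requested.
  if [ "$do_eval" = "eval" ]; then
    cmd="$cmd --eval"
  fi
  echo "$cmd"
}
```

For example, `gptq_cmd channel 4 path_of_.pth eval` (with 4 as an illustrative bit width) prints the per-channel evaluation command with `--groupsize -1` and a trailing `--eval`.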
+## Notes
+- Pass the `--wbits` and `--groupsize` values that match the quantized model; otherwise, loading errors may occur.
+- Set `--groupsize` to -1 for per-channel quantization.
+- Make sure you have sufficient GPU memory to run a 32B-sized model.
+- Check [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) for more information.
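The GPU memory note can be turned into a quick pre-flight check. The sketch below is an assumption-laden illustration: the function names are hypothetical, the README does not state a concrete memory requirement, so the threshold is left for you to choose, and GPU index 0 in `nvidia-smi` may not be the same device that `CUDA_VISIBLE_DEVICES=0` selects on every machine.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of the repository).

# Free memory in MiB of GPU 0, as reported by nvidia-smi.
gpu_free_mib() {
  nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0
}

# Succeeds when the free amount (MiB) is at least the required amount (MiB).
enough_memory() {
  local free_mib="$1" required_mib="$2"
  [ "$free_mib" -ge "$required_mib" ]
}

# Example usage; REQUIRED_MIB is a value you must choose yourself:
#   enough_memory "$(gpu_free_mib)" "$REQUIRED_MIB" \
#     || echo "not enough free GPU memory" >&2
```

Keeping the comparison separate from the `nvidia-smi` call lets you test the threshold logic on a machine without a GPU.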