# Guidelines for Loading Qwen3 (GPTQ) Quantized Models
## Installation Setup
Download the `GPTQ-for-Qwen_hf` folder from this repository.
## File Replacement
If you want to use the evaluation tests we provide, download the `eval_my` directory from [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) and note the **"Attention"** section of that repository's `README`:
- **Add the `eval_my` directory**: place `eval_my` under the `GPTQ-for-Qwen` directory, as in the sketch below.
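
For example, if the GitHub checkout and the downloaded folder sit side by side in your working directory, the copy might look like the following (directory layout is illustrative; adjust the paths to where you actually placed the downloads):

```bash
# Copy the evaluation files from the GitHub checkout into the GPTQ folder
# (illustrative paths, not part of this repository's layout)
cp -r Qwen3-Quantization/eval_my GPTQ-for-Qwen/
```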
## Loading the Model
In the commands below, `path_of_qwen.py` is the path to the `qwen.py` script from the downloaded folder, `your_model_path` points to the original Qwen3 model, `model_wbit` is the bit width the checkpoint was quantized to, and `path_of_.pth` is the quantized checkpoint to load.
### Group-wise Quantization
#### 1. Perform GPTQ search
```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
--wbits model_wbit --groupsize 128 \
--load path_of_.pth
```
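
For instance, a 4-bit, group-size-128 run might look like the following (the model path and checkpoint name are illustrative placeholders, not files shipped with this repo):

```bash
# Illustrative values: an 8B model quantized to 4 bits with group size 128
CUDA_VISIBLE_DEVICES=0 python qwen.py /path/to/Qwen3-8B \
--wbits 4 --groupsize 128 \
--load qwen3-8b-4bit-128g.pth
```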
#### 2. Evaluate the quantized model
```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
--wbits model_wbit --groupsize 128 \
--load path_of_.pth --eval
```
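
Continuing the illustrative values above, evaluation reuses the same arguments and only adds `--eval`:

```bash
# Same hypothetical paths as the search step, plus --eval
CUDA_VISIBLE_DEVICES=0 python qwen.py /path/to/Qwen3-8B \
--wbits 4 --groupsize 128 \
--load qwen3-8b-4bit-128g.pth --eval
```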
### Per-channel Quantization
#### 1. Perform GPTQ search
```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
--wbits model_wbit --groupsize -1 \
--load path_of_.pth
```
#### 2. Evaluate the quantized model
```bash
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path \
--wbits model_wbit --groupsize -1 \
--load path_of_.pth --eval
```
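
As an illustrative per-channel run (hypothetical paths and checkpoint name), both steps are identical to the group-wise case except that `--groupsize` is `-1` and the checkpoint must have been produced with per-channel quantization:

```bash
# Load the per-channel checkpoint (illustrative paths)
CUDA_VISIBLE_DEVICES=0 python qwen.py /path/to/Qwen3-8B \
--wbits 4 --groupsize -1 \
--load qwen3-8b-4bit-perchannel.pth

# Evaluate it by adding --eval
CUDA_VISIBLE_DEVICES=0 python qwen.py /path/to/Qwen3-8B \
--wbits 4 --groupsize -1 \
--load qwen3-8b-4bit-perchannel.pth --eval
```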
## Notes
- Pass the `wbits` and `groupsize` values that match the quantized checkpoint; otherwise, loading errors may occur.
- Set `groupsize` to `-1` for per-channel quantization.
- Make sure you have enough GPU memory to run a 32B-sized model (see the rough estimate below).
- Check [GitHub](https://github.com/Efficient-ML/Qwen3-Quantization) for more information.
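
As a rough back-of-the-envelope check (our estimate, not a figure from the repository): the 4-bit weights of a 32B-parameter model alone take roughly 16 GB (about 15 GiB), before activations, the KV cache, and dequantization buffers are counted.

```bash
# Weight-only memory estimate for a 32B model at 4 bits (0.5 bytes/param).
# Actual peak usage is higher once activations and the KV cache are included.
python -c "print(f'{32e9 * 0.5 / 1024**3:.1f} GiB of 4-bit weights')"
```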