JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int4


JunHowie committed on Sep 4

Commit ae02ac1 · verified · 1 Parent(s): 5dcc479

Upload folder using huggingface_hub

Files changed (4)

README.md +4 -2
config.json +1 -1
model.safetensors +1 -1
quantize_config.json +1 -1

README.md CHANGED

@@ -7,6 +7,7 @@ tags:
 - Qwen3
 - GPTQ
 - Int4
 - vLLM
 base_model:
   - Qwen/Qwen3-4B-Instruct-2507
@@ -16,6 +17,8 @@ base_model_relation: quantized
 Base model: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
 <i>This model is quantized to 4-bit with a group size of 128.</i>
 ```
 vllm serve JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int4
@@ -26,7 +29,6 @@ vllm serve JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int4
 vllm>=0.9.2
 ```
 ### 【Model Download】
 ```python
@@ -238,4 +240,4 @@ If you find our work helpful, feel free to give us a cite.
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2505.09388},
 }
-```

 - Qwen3
 - GPTQ
 - Int4
+- 量化修复
 - vLLM
 base_model:
   - Qwen/Qwen3-4B-Instruct-2507
 Base model: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
 <i>This model is quantized to 4-bit with a group size of 128.</i>
+<br>
+<i>Compared to earlier quantized versions, the new quantized model demonstrates better tokens/s efficiency. This improvement comes from setting desc_act=False in the quantization configuration.</i>
 ```
 vllm serve JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int4
 vllm>=0.9.2
 ```
 ### 【Model Download】
 ```python
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2505.09388},
 }
+```

config.json CHANGED

@@ -58,7 +58,7 @@
   "quantization_config": {
     "bits": 4,
     "checkpoint_format": "gptq",
-    "desc_act": true,
     "group_size": 128,
     "hyb_act": false,
     "lm_head": false,

   "quantization_config": {
     "bits": 4,
     "checkpoint_format": "gptq",
+    "desc_act": false,
     "group_size": 128,
     "hyb_act": false,
     "lm_head": false,
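
Note on `desc_act`: in GPTQ, this flag controls activation ordering, where input channels are quantized in order of decreasing importance rather than in their natural order. The toy sketch below illustrates how that changes the channel-to-group mapping. It is not the quantizer's actual code: `group_index` is a hypothetical helper, and the `importance` values are made-up stand-ins for the per-channel statistic a real quantizer would compute.

```python
def group_index(importance, group_size, desc_act):
    """Map each input channel to the index of its quantization group."""
    n = len(importance)
    if not desc_act:
        # Natural order: channels 0..group_size-1 share group 0, and so on.
        return [i // group_size for i in range(n)]
    # Activation order: visit channels from most to least important, so
    # group membership follows the sorted position, not the channel index.
    perm = sorted(range(n), key=lambda i: -importance[i])
    g_idx = [0] * n
    for position, channel in enumerate(perm):
        g_idx[channel] = position // group_size
    return g_idx


# Hypothetical importance scores for 8 input channels (illustration only).
importance = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6]

plain = group_index(importance, group_size=4, desc_act=False)
ordered = group_index(importance, group_size=4, desc_act=True)

print(plain)    # [0, 0, 0, 0, 1, 1, 1, 1]
print(ordered)  # [1, 0, 1, 0, 1, 0, 1, 0]
```

With `desc_act` disabled, each group covers a contiguous run of channels, which inference kernels can typically index more cheaply. That is one plausible reading of the tokens/s improvement the README change describes, not a measurement.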

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1539086f9229a407a1acfb0838402210a3925ecd76a1a4788cc5f7c857791dc1
 size 2669888648

 version https://git-lfs.github.com/spec/v1
+oid sha256:b0deda4d7eeae0fb2ca621587603dcb476412abc38f0c3cb197f20c61cea42dd
 size 2669888648
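
Because the file size is identical on both sides of this hunk, the `oid` hash is the only way to tell the re-quantized weights from the old ones. A minimal sketch of checking a downloaded file against the new pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local file path is whatever your download produced.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte weights never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the "+" side of the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b0deda4d7eeae0fb2ca621587603dcb476412abc38f0c3cb197f20c61cea42dd
size 2669888648
"""

pointer = parse_lfs_pointer(NEW_POINTER)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then indicate whether it is the post-commit file.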

quantize_config.json CHANGED

@@ -1,7 +1,7 @@
 {
   "bits": 4,
   "group_size": 128,
-  "desc_act": true,
   "hyb_act": false,
   "sym": true,
   "lm_head": false,

 {
   "bits": 4,
   "group_size": 128,
+  "desc_act": false,
   "hyb_act": false,
   "sym": true,
   "lm_head": false,
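
This commit changes `desc_act` in two places: the `quantization_config` block of `config.json` and the standalone `quantize_config.json`. A short sketch of a consistency check between the two is shown below. `mismatched_keys` is a hypothetical helper, and the sample dictionaries contain only the fields visible in this diff, not the complete files.

```python
import json


def mismatched_keys(config, quantize_config):
    """List keys whose values differ between the embedded and standalone configs."""
    embedded = config.get("quantization_config", {})
    # Only compare keys present in both; each file also has keys of its own.
    shared = embedded.keys() & quantize_config.keys()
    return sorted(k for k in shared if embedded[k] != quantize_config[k])


# Fields as they appear on the "+" side of the diff (partial, for illustration).
config = json.loads("""
{"quantization_config": {"bits": 4, "checkpoint_format": "gptq",
 "desc_act": false, "group_size": 128, "hyb_act": false, "lm_head": false}}
""")
quantize_config = json.loads("""
{"bits": 4, "group_size": 128, "desc_act": false,
 "hyb_act": false, "sym": true, "lm_head": false}
""")

# Simulate forgetting to update the standalone file.
stale = dict(quantize_config, desc_act=True)

print(mismatched_keys(config, quantize_config))  # []
print(mismatched_keys(config, stale))            # ['desc_act']
```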