# Qwen3 1.7B GPTQ INT4
GPTQ 4-bit quantized version of Qwen/Qwen3-1.7B with group size 16.
## Model Details
- Quantization: GPTQ INT4 with group size 16
- Size: ~1 GB (about 4x smaller than the original FP16 weights)
- Format: W4A16 (4-bit weights, 16-bit activations)
- Compatibility: loads natively with the transformers library (a verification sketch follows this list)
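If you want to confirm the quantization settings and size at runtime, here is a minimal sketch. It assumes the repo id from the usage example below; the expected values (bits=4, group_size=16) come from this card, not from running the code.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Inspect the quantization block stored in config.json; for this export it
# should report bits=4 and group_size=16 (per the card, an assumption here).
config = AutoConfig.from_pretrained("2imi9/qwen3-1.7b-gptq-int4")
print(config.quantization_config)

# get_memory_footprint() returns the loaded parameter size in bytes.
model = AutoModelForCausalLM.from_pretrained("2imi9/qwen3-1.7b-gptq-int4", device_map="auto")
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```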
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "2imi9/qwen3-1.7b-gptq-int4",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("2imi9/qwen3-1.7b-gptq-int4")

# Generate text (move inputs to the model's device, since device_map="auto"
# may place the model on GPU while the tokenizer returns CPU tensors)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
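Qwen3 checkpoints ship with a chat template, so for conversational prompts it is usually better to format messages with `apply_chat_template` than to pass raw strings. A sketch continuing from the snippet above, assuming this quantized export preserves the upstream template:

```python
# Build a chat-formatted prompt from a message list (single-turn here).
messages = [{"role": "user", "content": "Give me a one-sentence summary of GPTQ."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```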
## Gradio Demo
```python
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("2imi9/qwen3-1.7b-gptq-int4", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("2imi9/qwen3-1.7b-gptq-int4")

def chat(message, history):
    # Minimal single-turn handler: the conversation history is ignored.
    inputs = tokenizer(message, return_tensors="pt").to(model.device)
    # do_sample=True is required for temperature to take effect.
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, skipping the prompt.
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    return response

gr.ChatInterface(chat).launch()
```
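For a more responsive demo you can stream tokens into the UI as they are generated. This sketch uses transformers' TextIteratorStreamer together with gr.ChatInterface's support for generator functions; the threading and the chat_stream name are our additions, continuing from the demo above:

```python
from threading import Thread

from transformers import TextIteratorStreamer

def chat_stream(message, history):
    # History is ignored; each turn is treated independently, as above.
    inputs = tokenizer(message, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    # generate() blocks, so run it in a thread and consume tokens as they arrive.
    Thread(target=model.generate,
           kwargs=dict(**inputs, max_new_tokens=100, do_sample=True,
                       temperature=0.7, streamer=streamer)).start()
    partial = ""
    for token in streamer:
        partial += token
        yield partial  # ChatInterface renders each yielded partial response

gr.ChatInterface(chat_stream).launch()
```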
The small footprint and fast inference make this model a good fit for lightweight Gradio demos like the ones above.