Qwen3 1.7B GPTQ INT4

GPTQ 4-bit quantized version of Qwen/Qwen3-1.7B with group size 16.

Model Details

  • Quantization: GPTQ INT4 with group size 16 (see the config sketch below)
  • Size: ~1 GB on disk, roughly 4x smaller than the BF16 original
  • Format: W4A16 (4-bit weights, 16-bit activations)
  • Compatibility: loads directly with the transformers library (a GPTQ backend such as optimum with gptqmodel/auto-gptq must be installed)
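
For reference, a quantization like this can be produced with transformers' GPTQConfig. This is a minimal sketch, not the exact recipe used for this checkpoint; in particular the calibration dataset ("c4") is an assumption, since the card does not say which data was used.

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
quant_config = GPTQConfig(
    bits=4,          # INT4 weights
    group_size=16,   # one scale/zero-point per group of 16 weights
    dataset="c4",    # calibration data (assumption: not stated on the card)
    tokenizer=tokenizer,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    device_map="auto",
    quantization_config=quant_config,
)
model.save_pretrained("qwen3-1.7b-gptq-int4")
tokenizer.save_pretrained("qwen3-1.7b-gptq-int4")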

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "2imi9/qwen3-1.7b-gptq-int4",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("2imi9/qwen3-1.7b-gptq-int4")

# Generate text (move inputs to the model's device to avoid a device mismatch on GPU)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
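
Qwen3 is a chat model, so prompts are normally formatted with the tokenizer's chat template rather than passed as raw text. Continuing from the snippet above, a minimal sketch (the system prompt is an assumption):

# Build a chat-formatted prompt with the tokenizer's built-in template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # assumed system prompt
    {"role": "user", "content": "Hello, how are you?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant turn marker
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))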

Gradio Demo

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("2imi9/qwen3-1.7b-gptq-int4", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("2imi9/qwen3-1.7b-gptq-int4")

def chat(message, history):
    inputs = tokenizer(message, return_tensors="pt").to(model.device)
    # do_sample=True is required for temperature to take effect
    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
    # Strip the prompt tokens so only the model's reply is returned
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    return response

gr.ChatInterface(chat).launch()

The small footprint and fast inference make this model well suited to Gradio demos.
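
If you want tokens to appear incrementally in the chat UI instead of waiting for the full reply, a sketch of a streaming variant using transformers' TextIteratorStreamer is shown below; the function name stream_chat and the sampling settings are illustrative.

from threading import Thread

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model = AutoModelForCausalLM.from_pretrained("2imi9/qwen3-1.7b-gptq-int4", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("2imi9/qwen3-1.7b-gptq-int4")

def stream_chat(message, history):
    inputs = tokenizer(message, return_tensors="pt").to(model.device)
    # The streamer yields decoded text chunks as generate() produces tokens
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    kwargs = dict(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7, streamer=streamer)
    Thread(target=model.generate, kwargs=kwargs).start()  # run generation in the background
    partial = ""
    for chunk in streamer:
        partial += chunk
        yield partial  # Gradio renders each partial response as it grows

gr.ChatInterface(stream_chat).launch()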
