Phi-4-mini-instruct - AWQ Quantized (4-bit)

This is a 4-bit AWQ (Activation-aware Weight Quantization) quantized version of microsoft/Phi-4-mini-instruct.

Quantized with the following configuration (a reproduction sketch follows the list):

  • w_bit: 4
  • q_group_size: 128
  • zero_point: true
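For reference, a configuration like this is typically produced with AutoAWQ's quantize API. The sketch below illustrates that flow and is not the exact script used; the output path and the "version": "GEMM" kernel choice are assumptions (GEMM is the common default).

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "microsoft/Phi-4-mini-instruct"
quant_path = "Phi-4-mini-instruct-AWQ-4bit"  # output directory (assumed)

# Matches the configuration listed above; "version" selects the AWQ kernel
# and is an assumption, not taken from this model card.
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run activation-aware calibration, quantize the weights, and save the result.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)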

Usage

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model = AutoAWQForCausalLM.from_quantized("Sumo10/Phi-4-mini-instruct-AWQ-4bit")
tokenizer = AutoTokenizer.from_pretrained("Sumo10/Phi-4-mini-instruct-AWQ-4bit")

# AWQ kernels run on the GPU, so move the prompt tokens to CUDA.
input_ids = tokenizer("What is quantum computing?", return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
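Since Phi-4-mini-instruct is a chat-tuned model, prompts generally give better results when formatted with the model's chat template. A minimal sketch, assuming the bundled tokenizer ships the Phi-4-mini chat template:

# Format the prompt with the model's chat template before generating.
messages = [{"role": "user", "content": "What is quantum computing?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).cuda()
output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))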