Phi-4 4-bit (bitsandbytes)

Model Description

This is a 4-bit quantized version of the Phi-4 transformer model, optimized for memory-efficient inference while largely preserving the base model's output quality.

  • Base Model: Phi-4
  • Quantization: bitsandbytes (4-bit)
  • Format: safetensors
  • Tokenizer: Uses standard vocab.json and merges.txt
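
Loading the checkpoint should need nothing beyond the standard Transformers API, since the quantization config stored in the repo is picked up automatically. A minimal sketch, assuming `transformers`, `accelerate`, and `bitsandbytes` are installed (the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fhamborg/phi-4-4bit-bnb"

# The 4-bit bitsandbytes config is read from the checkpoint itself,
# so no explicit quantization arguments are needed here.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place layers on the available GPU(s)
)

prompt = "Explain 4-bit quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```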

Intended Use

  • Fast inference with minimal VRAM usage
  • Deployment in resource-constrained environments
  • Optimized for low-latency text generation
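
As a rough lower bound on memory, 4-bit weights for the 8.06B parameters occupy about 8.06 × 10⁹ × 0.5 bytes ≈ 4 GB of VRAM; actual usage is higher once activations, the KV cache, and framework overhead are included.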

Model Details

| Attribute    | Value |
| ------------ | ----- |
| Model name   | phi-4-4bit-bnb |
| Quantization | 4-bit (bitsandbytes) |
| File format  | .safetensors |
| Tokenizer    | phi-4-tokenizer.json (plus standard vocab.json and merges.txt) |
| Parameters   | 8.06B |
| Tensor types | F32, FP16, U8 |
| VRAM usage   | ~X GB (depends on batch size) |
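
For reference, a quantized checkpoint like this one can be produced from the base model with bitsandbytes. The sketch below is one plausible recipe; the specific options (NF4, double quantization, float16 compute) are assumptions and may differ from the settings actually used for this repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed quantization settings -- NF4 with double quantization is a common
# default, but this checkpoint's exact recipe is not documented here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

# Saving serializes the 4-bit weights as uint8 safetensors shards,
# which matches the U8 tensor type listed above.
model.save_pretrained("phi-4-4bit-bnb")
tokenizer.save_pretrained("phi-4-4bit-bnb")
```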

Model tree for fhamborg/phi-4-4bit-bnb

  • Base model: microsoft/phi-4
  • This model: 4-bit bitsandbytes quantization of the base model