🧠 Phi-2 GPTQ (Quantized)

This repository provides a 4-bit GPTQ quantized version of the Phi-2 model by Microsoft, optimized for efficient inference using gptqmodel.

📌 Model Details

  • Base Model: Microsoft Phi-2
  • Quantization: GPTQ (4-bit); see the config sketch after this list
  • Quantizer: GPTQModel
  • Framework: PyTorch + HuggingFace Transformers
  • Device Support: CUDA (GPU)
  • License: Apache 2.0
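
As a quick way to verify the quantization details listed above, the settings can usually be read back from the checkpoint's configuration. This is a minimal sketch, assuming the repository id STiFLeR7/Phi2-GPTQ and that the GPTQ parameters are stored under quantization_config in config.json:

```python
from transformers import AutoConfig

# Minimal sketch: read back the quantization settings without downloading weights.
# Assumes the checkpoint id is STiFLeR7/Phi2-GPTQ and that the GPTQ parameters
# (bits, group size, etc.) are stored under `quantization_config` in config.json.
config = AutoConfig.from_pretrained("STiFLeR7/Phi2-GPTQ")
print(getattr(config, "quantization_config", None))
```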

🚀 Features

  • ✅ Lightweight: 4-bit quantization significantly reduces memory usage
  • ✅ Fast inference: well suited to deployment on consumer GPUs
  • ✅ Compatible: works with transformers, optimum, and gptqmodel
  • ✅ CUDA-accelerated: runs on the GPU automatically when CUDA is available

📚 Usage

This model is ready to use with the Hugging Face transformers library.
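
A minimal generation sketch is shown below. It assumes transformers, optimum, and gptqmodel are installed, a CUDA GPU is available, and the checkpoint id is STiFLeR7/Phi2-GPTQ; the prompt and generation settings are only placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "STiFLeR7/Phi2-GPTQ"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place the 4-bit GPTQ weights on the available GPU
    torch_dtype="auto",  # keep non-quantized tensors in their stored dtype
)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Here device_map="auto" lets the weights be placed on the GPU automatically, matching the CUDA device support noted in the model details above.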

🧪 Intended Use

  • Research and development
  • Prototyping generative applications
  • Fast inference environments with limited GPU memory

📖 References

  • Microsoft Phi-2: https://huggingface.co/microsoft/phi-2
  • GPTQ: Frantar et al., "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers"
  • GPTQModel: https://github.com/ModelCloud/GPTQModel

βš–οΈ License

This model is distributed under the Apache License 2.0.
