pythia-6.9b quantized to 4-bit using AutoGPTQ.

To use, first install AutoGPTQ:

pip install auto-gptq

Then load the model from the hub:

from auto_gptq import AutoGPTQForCausalLM

# Hub repository that holds the 4-bit (group size 128) quantized weights.
model_name = "smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g"
# Download the quantized checkpoint and load it for inference.
model = AutoGPTQForCausalLM.from_quantized(model_name)

| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|---|---|---|---|
| smpanaro/pythia-70m-AutoGPTQ-4bit-128g | 49.125 | - | - |
| smpanaro/pythia-160m-AutoGPTQ-4bit-128g | 33.4375 | 23.3024 | 10.1351 |
| smpanaro/pythia-410m-AutoGPTQ-4bit-128g | 21.4688 | 13.9838 | 7.485 |
| smpanaro/pythia-1b-AutoGPTQ-4bit-128g | 12.0391 | 11.6178 | 0.4213 |
| smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g | 10.9609 | 10.4391 | 0.5218 |
| smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g | 9.8281 | 9.0028 | 0.8253 |
| smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g | 8.5078 | 8.2257 | 0.2821 |

Wikitext perplexity, measured as described in the Hugging Face docs; lower is better. Delta is the 4-bit perplexity minus the 16-bit perplexity.
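
For readers unfamiliar with that measurement, here is a minimal sketch of the strided (sliding-window) evaluation idea described in the Hugging Face perplexity guide. It is an illustration, not the exact script used for the numbers above: the helper names `plan_windows` and `perplexity` are hypothetical, the window size and stride in the usage lines are arbitrary, and the perplexity is taken over per-token negative log-likelihoods, which may differ slightly from how a given script averages per-window losses.

```python
import math


def plan_windows(seq_len, max_length, stride):
    """Return (begin, end, target_len) triples for strided evaluation.

    Each window covers at most max_length tokens. Only the last target_len
    tokens of a window are scored; the earlier tokens serve as context.
    With stride <= max_length, every token is scored exactly once.
    """
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return windows


def perplexity(token_nlls):
    """Perplexity is exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy usage: a 10-token sequence, 4-token windows, stride of 2 (all assumed values).
for begin, end, target_len in plan_windows(seq_len=10, max_length=4, stride=2):
    print(f"tokens [{begin}, {end}): score the last {target_len}")

# A model that assigns probability 1/2 to every token has perplexity close to 2.
print(perplexity([math.log(2.0)] * 4))
```

In a real evaluation, each window would be passed through the model with the context tokens masked out of the loss, and the resulting negative log-likelihoods would be collected before the final exponentiation.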
