Quantized with GPTQModel 4.0.0 (dev) using the following code:

```python
from datasets import load_dataset, concatenate_datasets
from gptqmodel import GPTQModel, QuantizeConfig
from random import shuffle, seed

seed(0)

# 1. grab 512 English + 512 Spanish documents from C4 as calibration data
en_ds = load_dataset("allenai/c4", data_files="en/c4-train.00001-of-01024.json.gz", split="train") \
          .shuffle(seed=0).select(range(512))
es_ds = load_dataset("allenai/c4", data_files="multilingual/c4-es.tfrecord-00001-of-02048.json.gz", split="train") \
          .shuffle(seed=0).select(range(512))

calib_texts = [x["text"] for x in concatenate_datasets([en_ds, es_ds])]
shuffle(calib_texts)

# 2. quantise to 4-bit GPTQ, group size 128
model_id  = "deepcogito/cogito-v1-preview-qwen-14B"
quant_dir = "cogito-14b-gptq-q4"

qconf = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load(model_id, qconf)

# model.quantize(calib_texts, batch_size=2)
model.quantize(calib_texts, batch_size=1)
model.save(quant_dir)
```
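
To sanity-check the result, the saved directory can be loaded straight back with GPTQModel. This is a minimal sketch following the generic GPTQModel load/generate usage; the prompt is only an illustrative smoke test and was not part of the quantization run.

```python
from gptqmodel import GPTQModel

# load the 4-bit checkpoint written by model.save(quant_dir) above
model = GPTQModel.load("cogito-14b-gptq-q4")

# quick smoke test: generate a few tokens and decode with the bundled tokenizer
tokens = model.generate("La capital de España es")[0]
print(model.tokenizer.decode(tokens))
```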

For the calibration dataset I used a 50% Spanish / 50% English split, since my tasks consist primarily of these two languages.

Note: I also tried v2 quantization, but it showed much higher loss compared with v1, so I'm sticking with v1.
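
If you want to reproduce that comparison: as far as I know, recent GPTQModel releases select the v2 scheme through a flag on the quantize config. The `v2=True` flag below is an assumption about that API; verify it against the GPTQModel version you have installed.

```python
from gptqmodel import QuantizeConfig

# same settings as above, but requesting the v2 quantization path
# (flag name assumed; check your installed GPTQModel release)
qconf_v2 = QuantizeConfig(bits=4, group_size=128, v2=True)
```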
