Quantized with GPTQModel 4.0.0 dev, using the following code:
```python
from random import seed, shuffle

from datasets import concatenate_datasets, load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

seed(0)

# 1. Grab 512 English + 512 Spanish documents from C4.
en_ds = (
    load_dataset(
        "allenai/c4",
        data_files="en/c4-train.00001-of-01024.json.gz",
        split="train",
    )
    .shuffle(seed=0)
    .select(range(512))
)
es_ds = (
    load_dataset(
        "allenai/c4",
        data_files="multilingual/c4-es.tfrecord-00001-of-02048.json.gz",
        split="train",
    )
    .shuffle(seed=0)
    .select(range(512))
)
calib_texts = [x["text"] for x in concatenate_datasets([en_ds, es_ds])]
shuffle(calib_texts)

# 2. Quantise to 4-bit GPTQ.
model_id = "deepcogito/cogito-v1-preview-qwen-14B"
quant_dir = "cogito-14b-gptq-q4"
qconf = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load(model_id, qconf)
# model.quantize(calib_texts, batch_size=2)
model.quantize(calib_texts, batch_size=1)
model.save(quant_dir)
```
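To make the `bits=4, group_size=128` setting concrete: each row of weights is split into groups of 128 values, and each group stores its own scale so the values themselves fit in 4-bit integers. Below is an illustrative pure-Python sketch of simple round-to-nearest group quantization. This is *not* the GPTQ algorithm itself (GPTQ additionally compensates the error of each quantized weight using second-order information), just a toy demonstration of what the two config parameters mean:

```python
def quantize_group(values, bits=4):
    """Round-to-nearest symmetric quantization of one group of weights.

    Returns signed 4-bit integer codes and the per-group scale needed to
    dequantize them. Illustrative only -- real GPTQ also corrects the
    remaining weights for the error introduced at each step.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes
    scale = max(abs(v) for v in values) / qmax or 1.0
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to floats using the group's scale."""
    return [c * scale for c in codes]


# One "row" of 256 synthetic weights -> two groups of group_size=128.
row = [((i * 37) % 101 - 50) / 50 for i in range(256)]
group_size = 128

recon = []
for start in range(0, len(row), group_size):
    codes, scale = quantize_group(row[start:start + group_size])
    recon.extend(dequantize_group(codes, scale))

max_err = max(abs(a - b) for a, b in zip(row, recon))
print(f"max reconstruction error: {max_err:.4f}")
```

Per-group scales are what keep the error bounded: one outlier only inflates the scale of its own 128-value group instead of the whole row.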
For the calibration dataset I used a 50% Spanish / 50% English split, since my tasks consist primarily of these two languages.
Note: I also tried v2, but it showed much higher loss compared with v1, so I'm sticking with v1.
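For context on why 4-bit is worth the effort, here is a rough back-of-the-envelope estimate of the weight footprint. The parameter count is approximate, and the sketch assumes one fp16 scale per group of 128 weights while ignoring zero-points and other metadata, so treat the numbers as ballpark only:

```python
params = 14e9  # approximate parameter count of a 14B model

bf16_gb = params * 2 / 1e9       # 16-bit weights: 2 bytes each
q4_gb = params * 0.5 / 1e9       # 4-bit weights: 0.5 bytes each
# One fp16 scale (2 bytes) per group of 128 weights; other metadata ignored.
overhead_gb = params / 128 * 2 / 1e9

print(f"bf16 ~= {bf16_gb:.0f} GB, 4-bit GPTQ ~= {q4_gb + overhead_gb:.1f} GB")
```

Roughly a 4x reduction, which is what brings a 14B model within reach of a single consumer GPU.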
Model tree for mediainbox/cogito-14b-gptq-q4
Base model: Qwen/Qwen2.5-14B