Llama 3.1 8B-Instruct quantized with GPTQ v1 using 256 rows of c4/en calibration data.
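The quantization recipe is, in rough outline, the following. This is a minimal sketch, not the authoritative script: the `QuantizeConfig` flags (bits, group size) and the stand-in calibration corpus are assumptions, and the GPTQModel calls are wrapped in a function so the sketch stays importable without a GPU. The linked reproduction issue in this card has the exact code.

```python
# Sketch: GPTQ v1 quantization with 256 calibration rows.
# The GPTQModel calls are assumptions about its API (v1 is the default
# GPTQ process) and are wrapped in a function, so nothing heavy runs here.

def quantize_v1(save_path, calibration_rows):
    from gptqmodel import GPTQModel, QuantizeConfig

    cfg = QuantizeConfig(bits=4, group_size=128)  # group_size is an assumption
    model = GPTQModel.load("meta-llama/Llama-3.1-8B-Instruct", cfg)
    model.quantize(calibration_rows)
    model.save(save_path)

# In practice the 256 rows are drawn from the c4/en split; a stand-in
# corpus keeps this sketch self-contained.
corpus = [f"calibration sample {i}" for i in range(1000)]
calibration_rows = corpus[:256]
```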

This is not a production-ready quant; it was produced to compare GPTQ v1 against GPTQ v2 post-quantization.

GPTQ v2 is hosted at: https://huggingface.co/ModelCloud/GPTQ-v2-Llama-3.1-8B-Instruct

Eval script using GPTQModel (main branch) with the Marlin kernel, plus lm-eval (main branch):

# eval
import tempfile

# import paths may vary across GPTQModel versions
from gptqmodel import GPTQModel
from gptqmodel.utils.eval import EVAL
from lm_eval.utils import make_table

# local quant path or hub id
QUANT_SAVE_PATH = "ModelCloud/GPTQ-v1-Llama-3.1-8B-Instruct"

with tempfile.TemporaryDirectory() as tmp_dir:
    results = GPTQModel.eval(
        QUANT_SAVE_PATH,
        tasks=[EVAL.LM_EVAL.ARC_CHALLENGE, EVAL.LM_EVAL.GSM8K_PLATINUM_COT],
        apply_chat_template=True,
        random_seed=898,
        output_path=tmp_dir,
    )

    print(make_table(results))
    if "groups" in results:
        print(make_table(results, "groups"))

Full quantization and eval reproduction code: https://github.com/ModelCloud/GPTQModel/issues/1545#issuecomment-2811997133
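For inference, the quant can be loaded with the same Marlin kernel used in the eval above. This is a sketch assuming GPTQModel's `BACKEND` enum; it is wrapped in a function so nothing is downloaded or loaded here, and the backend name should be verified against the installed GPTQModel version.

```python
# Sketch: load the quant with the Marlin kernel (assumed GPTQModel API,
# wrapped in a function so this stays importable without a GPU).

def load_with_marlin(path="ModelCloud/GPTQ-v1-Llama-3.1-8B-Instruct"):
    from gptqmodel import GPTQModel, BACKEND

    # BACKEND.MARLIN selects the Marlin int4 inference kernel.
    return GPTQModel.load(path, backend=BACKEND.MARLIN)
```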

| Tasks              | Version | Filter           | n-shot | Metric        | Value  | Stderr   |
|--------------------|---------|------------------|--------|---------------|--------|----------|
| arc_challenge      | 1       | none             | 0      | acc ↑         | 0.5000 | ± 0.0146 |
|                    |         | none             | 0      | acc_norm ↑    | 0.5128 | ± 0.0146 |
| gsm8k_platinum_cot | 3       | flexible-extract | 8      | exact_match ↑ | 0.3995 | ± 0.0141 |
|                    |         | strict-match     | 8      | exact_match ↑ | 0.2548 | ± 0.0125 |