This is Mistral-Small-3.1-24B-Instruct-2503, quantized with a hacked-up GPTQModel that has preliminary Mistral3ForConditionalGeneration support; getting it to work took several unusual changes. Calibration was run against the flickr30k dataset, so this should be a true vision-aware quant of the Mistral Small 3.1 HF checkpoint. The calibration used too few samples; I may upload a version with more thorough calibration soon.

Running this model requires this branch of vLLM: https://github.com/sjuxax/vllm/tree/Mistral3.1

Another "feature" of this version is that it was quantized with a preliminary implementation of block-diagonal Hessians, which was authored entirely by Grok3. This let me compute the quantization without running out of memory on my 24 GB of VRAM.
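
For readers unfamiliar with the idea, here is a rough sketch of why a block-diagonal Hessian saves memory. It is not the code used for this quant. It assumes the usual GPTQ setup, where the per-layer Hessian is proportional to the sum of outer products of the layer's calibration inputs, and it keeps only the diagonal blocks of that matrix. The function names, the block size of 128, and the layer width of 32768 are all illustrative assumptions.

```python
def full_hessian(rows):
    """Dense Hessian proxy: sum of outer products x x^T over calibration rows."""
    d = len(rows[0])
    H = [[0.0] * d for _ in range(d)]
    for x in rows:
        for i in range(d):
            for j in range(d):
                H[i][j] += x[i] * x[j]
    return H


def block_diag_hessian(rows, block_size):
    """Keep only the diagonal blocks of the Hessian proxy.

    Returns a list of small square matrices, one per block of input
    features. Cross-block terms are never computed or stored.
    """
    d = len(rows[0])
    blocks = []
    for start in range(0, d, block_size):
        stop = min(start + block_size, d)
        b = stop - start
        B = [[0.0] * b for _ in range(b)]
        for x in rows:
            seg = x[start:stop]
            for i in range(b):
                for j in range(b):
                    B[i][j] += seg[i] * seg[j]
        blocks.append(B)
    return blocks


def stored_entries(d, block_size):
    """Number of matrix entries held in memory: (dense, block-diagonal)."""
    n_full, rem = divmod(d, block_size)
    return d * d, n_full * block_size ** 2 + rem ** 2


# Hypothetical sizes, chosen only to show the scale of the saving.
dense, blocked = stored_entries(32768, 128)
print(dense // blocked)  # ratio of dense to block-diagonal storage
```

The trade-off is that correlations between features in different blocks are ignored, so the quantization error compensation is only approximate compared with using the dense Hessian.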
