This is Mistral-Small-3.1-24B-Instruct-2503 quantized with a hacked-up GPTQModel build that has preliminary Mistral3ForConditionalGeneration support; getting it working required several weird changes. Calibration was run against the flickr30k dataset (with too few samples; a version with more substantial calibration may follow), so this should be a true vision-aware quant of the Mistral Small 3.1 HF checkpoint.
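
For reference, a minimal sketch of what such a quantization run might look like with stock GPTQModel. The 4-bit / group-size-128 settings are taken from the repo name (`-4b-128g`); the flickr30k dataset id, sample count, and text-only calibration here are illustrative assumptions — a genuinely vision-aware run would feed image+text pairs through the model's processor, and the hacked-up branch's exact calibration format is not documented:

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# 4-bit, group size 128, per the repo name (...-4b-128g).
quant_config = QuantizeConfig(bits=4, group_size=128)

# flickr30k captions as calibration text (dataset id and field names are
# assumptions; the actual run used image-aware calibration).
dataset = load_dataset("nlphuji/flickr30k", split="test")
calibration = [row["caption"][0] for row in dataset.select(range(256))]

model = GPTQModel.load(
    "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    quant_config,
)
model.quantize(calibration)
model.save("Mistral-Small-3.1-24B-Instruct-2503-HF-gptqmodel-4b-128g")
```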

You need this branch of vLLM to run it: https://github.com/sjuxax/vllm/tree/Mistral3.1
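
Once that branch is built and installed, loading should look like any other vLLM model. A minimal sketch (prompt and sampling parameters are illustrative):

```python
from vllm import LLM, SamplingParams

# Requires the Mistral3.1 branch linked above; stock vLLM will not load this model.
llm = LLM(model="jeffcookio/Mistral-Small-3.1-24B-Instruct-2503-HF-gptqmodel-4b-128g")

outputs = llm.generate(
    ["Describe what a vision-language model does."],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```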

Another "feature" of this version is that it was quantized with a preliminary implementation of block-diagonal Hessians (authored entirely by Grok3). This let the quantization complete without OOMing on my 24 GB of VRAM.
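
For context: GPTQ normally accumulates a full d x d Hessian H = 2 * X^T X per layer from calibration activations, which is what blows up memory at large hidden sizes. A block-diagonal approximation keeps only the B x B diagonal blocks, cutting storage from O(d^2) to O(d * B). A minimal sketch of that accumulation, assuming row-major activation batches (names and block size are illustrative; the actual Grok3-authored implementation may differ):

```python
import torch

def accumulate_block_diagonal_hessian(activation_batches, block_size):
    """Accumulate a block-diagonal approximation of the GPTQ Hessian
    H = 2 * X^T X, storing only the diagonal blocks.

    activation_batches: iterable of tensors of shape (n_samples, d)
    block_size: input channels per diagonal block (B)
    Memory drops from O(d^2) to O(d * B); cross-block curvature is discarded.
    """
    blocks = None
    for x in activation_batches:
        d = x.shape[1]
        if blocks is None:
            n_blocks = (d + block_size - 1) // block_size
            blocks = [
                torch.zeros(
                    min(block_size, d - i * block_size),
                    min(block_size, d - i * block_size),
                    dtype=torch.float32,
                )
                for i in range(n_blocks)
            ]
        for i, h in enumerate(blocks):
            lo = i * block_size
            hi = lo + h.shape[0]
            xi = x[:, lo:hi].float()
            # Same update GPTQ uses, restricted to this diagonal block.
            h += 2.0 * xi.T @ xi
    return blocks
```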

