Pixtral-12B-2409: 2:4 sparse

2:4 sparse version of mistral-community/pixtral-12b, created with the kylesayrs/gptq-partition branch of LLM Compressor for optimised inference on vLLM.
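
For context, 2:4 models like this are typically produced with LLM Compressor's SparseGPT one-shot flow, which prunes weights to a 2:4 mask in a single calibration pass. Below is a minimal sketch assuming the mainline llmcompressor API (not the exact branch above); the calibration dataset, sample count, and output directory are illustrative assumptions, not the settings used for this model.

```python
from llmcompressor.modifiers.obcq import SparseGPTModifier
from llmcompressor.transformers import oneshot

# Prune every Linear layer to 50% sparsity in a 2:4 pattern;
# the LM head is excluded so the output projection stays dense.
recipe = SparseGPTModifier(
    sparsity=0.5,
    mask_structure="2:4",
    targets="Linear",
    ignore=["re:.*lm_head"],
)

oneshot(
    model="mistral-community/pixtral-12b",
    recipe=recipe,
    dataset="open_platypus",        # illustrative text-only calibration set
    num_calibration_samples=512,    # illustrative
    max_seq_length=2048,
    output_dir="pixtral-12b-2409-2of4-sparse",
)
```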

Example vLLM usage

```shell
vllm serve nintwentydo/pixtral-12b-2409-2of4-sparse --max-model-len 131072 --limit-mm-per-prompt 'image=4'
```
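
Once the server is running, it exposes vLLM's OpenAI-compatible API, so a standard OpenAI client can send mixed text/image prompts. A minimal sketch using the openai Python package; the port is vLLM's default and the image URL is a placeholder:

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint (default: http://localhost:8000/v1).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="nintwentydo/pixtral-12b-2409-2of4-sparse",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                # Placeholder URL; the serve flag above allows up to 4 images per prompt.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```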

If you want a more advanced, fully featured chat template, you can use this Jinja template (pass it to vllm serve with the --chat-template flag).

Model size: 12.7B params (BF16, Safetensors)
