Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g

Model Overview

This model was obtained by quantizing the weights of Mistral-Small-3.1-24B-Instruct-2503 to the INT4 data type. This optimization cuts the number of bits per parameter from 16 to 4, reducing disk size and GPU memory requirements by approximately 75%.

Only the weights of the linear operators within the language_model transformer blocks are quantized; the vision model and the multimodal projection are kept in their original precision. Weights are quantized with a symmetric per-group scheme (group size 128), using the GPTQ algorithm.

The model checkpoint is saved in the compressed_tensors format.
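
The exact pipeline used to produce this checkpoint is not published in the card. As a rough illustration only, the scheme described above could be expressed with the llm-compressor library, which emits compressed_tensors checkpoints; the tooling, calibration dataset, sample counts, and ignore patterns below are assumptions, not the authors' settings.

```python
# Hypothetical reproduction sketch: tool choice, dataset, and ignore patterns
# are assumptions; only the INT4 group-128 symmetric GPTQ scheme itself is
# stated in the card.
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",  # quantize only the linear operators
    scheme="W4A16",    # INT4 symmetric per-group weights (group size 128), 16-bit activations
    ignore=[
        "lm_head",                     # assumption: output head kept in original precision
        "re:vision_tower.*",           # vision model kept in original precision
        "re:multi_modal_projector.*",  # multimodal projection kept in original precision
    ],
)

oneshot(
    model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    dataset="open_platypus",  # assumption: any small calibration set works here
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g",
)
```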

Usage

  • To use the model in transformers, install the Mistral-3 release of the package (see the loading sketch after this list):

    pip install git+https://github.com/huggingface/[email protected]

  • To use the model in vLLM, update the package to vllm>=0.8.0 (see the offline-inference sketch after this list).
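
A minimal text-only loading sketch for transformers, assuming the checkpoint resolves to the AutoModelForImageTextToText auto class and that the compressed-tensors package is installed; the prompt and generation settings are illustrative:

```python
# Minimal sketch, text-only generation; prompt and settings are illustrative.
from transformers import AutoModelForImageTextToText, AutoTokenizer

model_id = "ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",   # INT4 weights cut GPU memory needs by roughly 75%
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "Summarize GPTQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```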
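
And an offline-inference sketch for vLLM; the sampling parameters are illustrative:

```python
# Minimal sketch for vllm>=0.8.0; sampling parameters are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g")

messages = [{"role": "user", "content": "Summarize GPTQ quantization in one sentence."}]
outputs = llm.chat(messages, SamplingParams(temperature=0.15, max_tokens=128))
print(outputs[0].outputs[0].text)
```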
