---
license: apache-2.0
base_model:
  - mistralai/Voxtral-Small-24B-2507
tags:
  - mistral
  - quantized
  - 8bit
  - llm
  - language-model
  - transformers
  - mlx
---

# VincentGOURBIN/voxtral-small-8bit-mixed

This is an 8-bit quantized version of the [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) language model.  
It is provided in the standard Hugging Face Transformers format (safetensors) and is compatible with [mlx.voxtral](https://github.com/mzbac/mlx.voxtral).

## πŸ”§ About this model

- **Base model**: [`mistralai/Voxtral-Small-24B-2507`](https://huggingface.co/mistralai/Voxtral-Small-24B-2507)
- **Quantization**: 8-bit mixed precision
- **Format**: Transformers-compatible (safetensors), usable with MLX and Hugging Face
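
For intuition, 8-bit quantization stores each weight tensor as int8 values plus a small number of higher-precision scale factors, and "mixed" indicates that not every tensor is quantized the same way (precision-sensitive tensors are typically kept in 16-bit). The sketch below shows generic symmetric per-row int8 quantization; it is illustrative only and not necessarily the exact scheme used for this checkpoint.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-row int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per output row
    scale = np.maximum(scale, 1e-8)                       # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

At 8 bits per weight, a 24B-parameter model needs roughly 24 GB for the weights, about half of the ~48 GB required at bfloat16, before activation and KV-cache overhead.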

## πŸ™ Acknowledgments

Huge thanks to:

- **[Mistral AI](https://mistral.ai/)** for releasing the original Voxtral-Small model  
- **[mlx.voxtral](https://github.com/mzbac/mlx.voxtral)** for the quantization tooling and MLX support

This work is a quantized derivative of [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507), made easier by the amazing work of the `mlx.voxtral` project.

## πŸš€ Usage

### πŸ€— With Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VincentGOURBIN/voxtral-small-8bit-mixed"

# device_map="auto" places the weights across available GPUs/CPU;
# torch_dtype="auto" keeps the precision stored in the checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Simple text-only generation
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
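
Voxtral is first and foremost an audio understanding model, and recent versions of 🤗 Transformers document a dedicated `VoxtralForConditionalGeneration` class with audio chat templates. The sketch below follows that upstream documentation; whether this quantized checkpoint loads through it depends on your installed Transformers version, so treat it as a starting point and verify against the official Voxtral docs. The `sample.wav` path is a placeholder.

```python
from transformers import AutoProcessor, VoxtralForConditionalGeneration

model_id = "VincentGOURBIN/voxtral-small-8bit-mixed"

processor = AutoProcessor.from_pretrained(model_id)
model = VoxtralForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# An audio clip plus a text question in a single user turn
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "path": "sample.wav"},  # placeholder: any local audio file
            {"type": "text", "text": "What is being discussed in this recording?"},
        ],
    }
]

inputs = processor.apply_chat_template(conversation)
inputs = inputs.to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```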