---
license: apache-2.0
base_model:
- mistralai/Voxtral-Small-24B-2507
tags:
- mistral
- quantized
- 8bit
- llm
- language-model
- transformers
- mlx
---
# VincentGOURBIN/voxtral-small-8bit-mixed
This is an 8-bit quantized version of the [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) language model.
It is provided in the standard Hugging Face Transformers format (safetensors) and is compatible with [mlx.voxtral](https://github.com/mzbac/mlx.voxtral).
## About this model
- **Base model**: [`mistralai/Voxtral-Small-24B-2507`](https://huggingface.co/mistralai/Voxtral-Small-24B-2507)
- **Quantization**: 8-bit mixed precision (see the sketch after this list for the general idea)
- **Format**: Transformers-compatible (safetensors), usable with MLX and Hugging Face
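The per-layer quantization recipe comes from the mlx.voxtral tooling, so the stored checkpoint is the authoritative artifact. As a rough intuition for what 8-bit affine quantization does to a weight tensor, here is a minimal, self-contained sketch in plain PyTorch (illustrative function names and group size, not the actual conversion code):

```python
import torch

def quantize_8bit(w: torch.Tensor, group_size: int = 64):
    """Illustrative 8-bit affine quantization, applied per group of weights."""
    groups = w.reshape(-1, group_size)
    w_min = groups.min(dim=1, keepdim=True).values
    w_max = groups.max(dim=1, keepdim=True).values
    scale = ((w_max - w_min) / 255.0).clamp(min=1e-8)   # 8 bits -> 256 levels
    q = torch.round((groups - w_min) / scale).clamp(0, 255).to(torch.uint8)
    return q, scale, w_min                              # stored instead of the fp16 weights

def dequantize_8bit(q, scale, w_min, original_shape):
    """Reconstruct an approximate higher-precision weight at load/compute time."""
    return (q.to(torch.float32) * scale + w_min).reshape(original_shape)

# Quantize a random "weight" and check the reconstruction error
w = torch.randn(128, 128)
q, scale, zero = quantize_8bit(w)
w_hat = dequantize_8bit(q, scale, zero, w.shape)
print((w - w_hat).abs().max())  # small: at most about half a quantization step
```

"Mixed precision" here means that not every tensor is quantized identically: sensitive tensors (such as embeddings or normalization weights) are typically kept at higher precision, while the bulk of the linear-layer weights are stored in 8 bits.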
## Acknowledgments
Huge thanks to:
- **[Mistral AI](https://mistral.ai/)** for releasing the original Voxtral-Small model
- **[mlx-voxtral](https://github.com/mzbac/mlx.voxtral)** for the quantization tooling and MLX support
This work is a quantized derivative of [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507), made possible by the tooling from the `mlx.voxtral` project.
## Usage
### With Hugging Face Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VincentGOURBIN/voxtral-small-8bit-mixed"

# Load the tokenizer and the quantized model (device_map="auto" places it on the available GPU/CPU)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tokenize a prompt and generate a short completion
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
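Voxtral-Small is an instruction-tuned model, so you will usually get better results by going through its chat template rather than a raw completion prompt. A minimal sketch, assuming the tokenizer loaded above ships a chat template (check `tokenizer.chat_template` if unsure):

```python
# Chat-style prompting (sketch; reuses the tokenizer/model loaded above)
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```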