VincentGOURBIN committed on
Commit 36ca3b0 · verified · 1 Parent(s): d41b74a

Create README.md

Files changed (1): README.md (+50, -0)

README.md (new file):

---
license: apache-2.0
base_model:
- mistralai/Voxtral-Small-24B-2507
tags:
- mistral
- quantized
- 8bit
- llm
- language-model
- transformers
- mlx
---

# VincentGOURBIN/voxtral-small-8bit-mixed

This is an 8-bit quantized version of the [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) language model.
It is provided in standard Hugging Face Transformers format and is compatible with [mlx.voxtral](https://github.com/mzbac/mlx.voxtral).
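
To use the weights outside of `transformers` (for example with MLX tooling), you can first fetch the repository locally with `huggingface_hub`. A minimal sketch; where you store the files is up to you:

```python
# Download the full model repository (safetensors weights, config, tokenizer
# files) so it can be loaded by transformers or by MLX-based tooling.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="VincentGOURBIN/voxtral-small-8bit-mixed")
print(f"Model files downloaded to: {local_dir}")
```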

## 🔧 About this model

- **Base model**: [`mistralai/Voxtral-Small-24B-2507`](https://huggingface.co/mistralai/Voxtral-Small-24B-2507)
- **Quantization**: 8-bit mixed precision (see the sketch after this list for the general idea)
- **Format**: Transformers-compatible (safetensors), usable with MLX and Hugging Face
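
For intuition, here is a toy illustration of affine 8-bit weight quantization. This is **not** the actual mlx-voxtral recipe: the group size, scale layout, and which layers stay at higher precision are assumptions made purely for illustration.

```python
import numpy as np

def quantize_8bit(w: np.ndarray, group_size: int = 64):
    """Toy affine 8-bit quantization: one scale/offset per group of weights."""
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 255.0 + 1e-12  # epsilon avoids division by zero
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_8bit(q, scale, w_min):
    """Recover approximate float weights from the 8-bit codes."""
    return q.astype(np.float32) * scale + w_min

# Round-trip a random weight matrix and check the reconstruction error.
w = np.random.randn(1024, 64).astype(np.float32)
q, scale, w_min = quantize_8bit(w)
w_hat = dequantize_8bit(q, scale, w_min).reshape(w.shape)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

In a *mixed* scheme, sensitive tensors (typically the embeddings and output head) are usually kept at higher precision rather than quantized, trading a little extra memory for output quality.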

## 🙏 Acknowledgments

Huge thanks to:

- **[Mistral AI](https://mistral.ai/)** for releasing the original Voxtral-Small model
- **[mlx.voxtral](https://github.com/mzbac/mlx.voxtral)** for the quantization tooling and MLX support

This work is a quantized derivative of [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507), made easier by the amazing work of the `mlx.voxtral` project.

## 🚀 Usage

### 🤗 With Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VincentGOURBIN/voxtral-small-8bit-mixed"

# Load the tokenizer and the quantized model, spreading layers across
# available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Text-only generation. Note: recent transformers releases register Voxtral
# as an audio-text model; if AutoModelForCausalLM rejects this architecture,
# try VoxtralForConditionalGeneration with AutoProcessor instead.
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
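
After loading, a quick sanity check can confirm the model's in-memory size and parameter dtypes. A small sketch continuing from the snippet above, using the stock `get_memory_footprint()` helper from `transformers`:

```python
# Continuing from the snippet above: report the model's memory footprint
# (in bytes, converted to GB) and the dtypes of a few parameters.
print(f"memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
for name, param in list(model.named_parameters())[:3]:
    print(name, param.dtype, tuple(param.shape))
```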