VincentGOURBIN committed on
Commit 36ca3b0 · verified · 1 Parent(s): d41b74a

Create README.md

Files changed (1): README.md (+50, -0)

README.md (new file):

---
license: apache-2.0
base_model:
- mistralai/Voxtral-Small-24B-2507
tags:
- mistral
- quantized
- 8bit
- llm
- language-model
- transformers
- mlx
---

# VincentGOURBIN/voxtral-small-8bit-mixed

This is an 8-bit quantized version of the [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) language model.
It is provided in standard Hugging Face Transformers format and is compatible with [mlx.voxtral](https://github.com/mzbac/mlx.voxtral).
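
To use the weights outside of `transformers` (for example with MLX tooling), you can first fetch the repository locally with `huggingface_hub`. A minimal sketch; where you store the files is up to you:

```python
# Download the full model repository (safetensors weights, config, tokenizer
# files) so it can be loaded by transformers or by MLX-based tooling.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="VincentGOURBIN/voxtral-small-8bit-mixed")
print(f"Model files downloaded to: {local_dir}")
```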

## 🔧 About this model

- **Base model**: [`mistralai/Voxtral-Small-24B-2507`](https://huggingface.co/mistralai/Voxtral-Small-24B-2507)
- **Quantization**: 8-bit mixed precision (see the sketch after this list for the general idea)
- **Format**: Transformers-compatible (safetensors), usable with MLX and Hugging Face
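
For intuition, here is a toy illustration of affine 8-bit weight quantization. This is **not** the actual mlx-voxtral recipe: the group size, scale layout, and which layers stay at higher precision are assumptions made purely for illustration.

```python
import numpy as np

def quantize_8bit(w: np.ndarray, group_size: int = 64):
    """Toy affine 8-bit quantization: one scale/offset per group of weights."""
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 255.0 + 1e-12  # epsilon avoids division by zero
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_8bit(q, scale, w_min):
    """Recover approximate float weights from the 8-bit codes."""
    return q.astype(np.float32) * scale + w_min

# Round-trip a random weight matrix and check the reconstruction error.
w = np.random.randn(1024, 64).astype(np.float32)
q, scale, w_min = quantize_8bit(w)
w_hat = dequantize_8bit(q, scale, w_min).reshape(w.shape)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

In a *mixed* scheme, sensitive tensors (typically the embeddings and output head) are usually kept at higher precision rather than quantized, trading a little extra memory for output quality.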

## 🙏 Acknowledgments

Huge thanks to:

- **[Mistral AI](https://mistral.ai/)** for releasing the original Voxtral-Small model
- **[mlx.voxtral](https://github.com/mzbac/mlx.voxtral)** for the quantization tooling and MLX support

This work is a quantized derivative of [mistralai/Voxtral-Small-24B-2507](https://huggingface.co/mistralai/Voxtral-Small-24B-2507), made easier by the amazing work of the `mlx.voxtral` project.

## 🚀 Usage

### 🤗 With Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VincentGOURBIN/voxtral-small-8bit-mixed"

# Load the tokenizer and the quantized model, spreading layers across
# available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Text-only generation. Note: recent transformers releases register Voxtral
# as an audio-text model; if AutoModelForCausalLM rejects this architecture,
# try VoxtralForConditionalGeneration with AutoProcessor instead.
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
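
After loading, a quick sanity check can confirm the model's in-memory size and parameter dtypes. A small sketch continuing from the snippet above, using the stock `get_memory_footprint()` helper from `transformers`:

```python
# Continuing from the snippet above: report the model's memory footprint
# (in bytes, converted to GB) and the dtypes of a few parameters.
print(f"memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
for name, param in list(model.named_parameters())[:3]:
    print(name, param.dtype, tuple(param.shape))
```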