mistralai/Voxtral-Small-24B-2507

Audio-Text-to-Text


patrickvonplaten committed on Jul 15

Commit a27970d · verified · 1 Parent(s): c9a314d

Update README.md

Files changed (1)

README.md +19 -6

@@ -120,6 +120,12 @@ vllm serve mistralai/Voxtral-Small-24B-2507 --tokenizer_mode mistral --config_fo
 Leverage the audio capabilities of Voxtral-Small-24B-2507 to chat.
+Make sure that your client has `mistral-common` with audio installed:
+```sh
+pip install --upgrade mistral_common[audio]
+```
 <details>
   <summary>Python snippet</summary>
@@ -149,7 +155,7 @@ def file_to_chunk(file: str) -> AudioChunk:
     audio = Audio.from_file(file, strict=False)
     return AudioChunk.from_audio(audio)
-text_chunk = TextChunk(text="Which speaker do you prefer between the two? Why? How are they different from each other?")
+text_chunk = TextChunk(text="Which speaker is more inspiring? Why? How are they different from each other? Answer in French.")
 user_msg = UserMessage(content=[file_to_chunk(obama_file), file_to_chunk(bcn_file), text_chunk]).to_openai()
 print(30 * "=" + "USER 1" + 30 * "=")
@@ -167,11 +173,12 @@ content = response.choices[0].message.content
 print(30 * "=" + "BOT 1" + 30 * "=")
 print(content)
 print("\n\n")
-# E.g. The speaker who delivers the farewell address is more engaging and inspiring.
-# They express gratitude and optimism, emphasizing the importance of self-government and citizenship.
-# They also share personal experiences and observations, making the speech more relatable and heartfelt.
-# In contrast, the second speaker provides factual information about the weather in Barcelona,
-# which is less engaging and lacks the emotional depth of the first speaker's address.
+# The model could give the following answer:
+# ```L'orateur le plus inspirant est le président.
+# Il est plus inspirant parce qu'il parle de ses expériences personnelles
+# et de son optimisme pour l'avenir du pays.
+# Il est différent de l'autre orateur car il ne parle pas de la météo,
+# mais plutôt de ses interactions avec les gens et de son rôle en tant que président.```
 messages = [
     user_msg,
@@ -198,6 +205,12 @@ print(content)
 Voxtral-Small-24B-2507 has powerful transcription capabilities!
+Make sure that your client has `mistral-common` with audio installed:
+```sh
+pip install --upgrade mistral_common[audio]
+```
 <details>
   <summary>Python snippet</summary>
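
Note on the install step this commit adds: a client can check up front whether the audio dependencies are importable before running either snippet. The following is a minimal sketch, not part of the README. The helper `missing_audio_deps` is hypothetical, and the package list is an assumption about what the `audio` extra of `mistral_common` installs (here `soundfile` and `soxr`); adjust it to match your installed version.

```python
import importlib.util

# Assumption: the `audio` extra of mistral_common pulls in these packages.
# This list is illustrative and may differ between mistral_common releases.
ASSUMED_AUDIO_DEPS = ("mistral_common", "soundfile", "soxr")


def missing_audio_deps(names=ASSUMED_AUDIO_DEPS):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_audio_deps()
    if missing:
        print("Missing packages:", ", ".join(missing))
        # Quoting the requirement keeps shells such as zsh from expanding the brackets.
        print('Try: pip install --upgrade "mistral_common[audio]"')
    else:
        print("The assumed mistral_common audio dependencies are importable.")
```

The check only looks up package metadata with `importlib.util.find_spec`, so it does not import the audio libraries or load any files.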