SpeechLMM v1
Collection
First generation of SpeechLMM models, capable of ingesting video, audio, and text and generating text as output. From the Meetween consortium (meetween.eu).
This is the version of meetween/Llama-speechlmm-1.0-l that was fine-tuned for Speech-to-Text Translation.
License: see LICENSE
Identical to the base model. The model was obtained by training LoRA on the LLM. This repository contains the model weights with LoRA merged into the main weights.
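As a rough illustration of what merging LoRA into the main weights means, here is a minimal sketch of the standard LoRA formulation, in which a weight matrix `W` becomes `W + (alpha / r) * B @ A`. The matrices, the rank `r`, and the scaling factor `alpha` below are toy values chosen for illustration; they are assumptions and not this model's actual configuration.

```python
# Toy LoRA merge in pure Python (illustrative values only, not the real model).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(w, delta, scale):
    """Return w + scale * delta, element by element."""
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Frozen base weight W (d_out x d_in) and low-rank adapter factors.
W = [[1, 0, 2],
     [0, 1, -1]]
A = [[1, 2, 3]]      # r x d_in, here r = 1
B = [[1], [-1]]      # d_out x r
alpha, r = 2, 1      # hypothetical LoRA hyperparameters
scale = alpha / r

# Merge: fold the adapter update into the base weight once.
W_merged = add_scaled(W, matmul(B, A), scale)

# An input column vector.
x = [[1], [1], [1]]

# Unmerged forward pass: base output plus the scaled adapter output.
adapter_out = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Merged forward pass: a single matrix product.
merged_out = matmul(W_merged, x)

print(merged_out == adapter_out)
```

After merging, inference needs only the single matrix `W_merged`, which is why a repository with merged weights can be loaded like an ordinary checkpoint without separate adapter files.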
Identical to the base model.
This model was fine-tuned on the EuroParl-ST machine translation data ({en, fr, it, de, es} → {en, fr, it, de, es}), the same data that is included in the training data of the base model.
BLEU scores by dataset and language pair:

| Model | FLORES en-de | FLORES en-es | FLORES en-it | FLORES en-fr | ACL 60/60 en-fr | ACL 60/60 en-de | AVG |
|---|---|---|---|---|---|---|---|
| Llama3-instruct (D5) | 28.1 | 24.4 | 25.0 | 41.2 | 48.8 | 34.2 | 33.6 |
| NLLB (D5) | 39.4 | 23.7 | 31.2 | 50.7 | 59.1 | 45.2 | 41.6 |
| SpeechLMM_v1.0_L | 29.4 | 22.3 | 20.1 | 31.9 | 35.5 | 32.8 | 28.7 |
| Speech LMM v1.0_L-FT (LoRA) | 20.0 | 16.0 | 11.6 | 21.8 | 24.9 | 20.7 | 19.2 |
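The AVG column appears to be the unweighted mean of the six BLEU scores in each row, rounded to one decimal place. This is an assumption inferred from the numbers, not something the card states. The sketch below recomputes it from the reported scores, using `Decimal` to avoid floating-point rounding surprises.

```python
from decimal import Decimal, ROUND_HALF_UP

# BLEU scores copied from the results table, in column order:
# FLORES en-de, en-es, en-it, en-fr, then ACL 60/60 en-fr, en-de.
scores = {
    "Llama3-instruct (D5)": ["28.1", "24.4", "25.0", "41.2", "48.8", "34.2"],
    "NLLB (D5)": ["39.4", "23.7", "31.2", "50.7", "59.1", "45.2"],
    "SpeechLMM_v1.0_L": ["29.4", "22.3", "20.1", "31.9", "35.5", "32.8"],
    "Speech LMM v1.0_L-FT (LoRA)": ["20.0", "16.0", "11.6", "21.8", "24.9", "20.7"],
}

# AVG values as reported in the table.
reported_avg = {
    "Llama3-instruct (D5)": "33.6",
    "NLLB (D5)": "41.6",
    "SpeechLMM_v1.0_L": "28.7",
    "Speech LMM v1.0_L-FT (LoRA)": "19.2",
}

def mean_bleu(values):
    """Unweighted mean of the scores, rounded half-up to one decimal place."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

computed_avg = {name: mean_bleu(vals) for name, vals in scores.items()}

for name, avg in computed_avg.items():
    print(f"{name}: computed {avg}, reported {reported_avg[name]}")
```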
Base model: meetween/Llama-speechlmm-1.0-l