SpeechLMM v1
Collection
1st generation of SpeechLMM models, capable of ingesting video, audio, and text and generating text as output. From the Meetween consortium (meetween.eu).
This is the version of meetween/Llama-speechlmm-1.0-l that was fine-tuned for Spoken Language Understanding (SLU).

The model was obtained by fine-tuning the speech adapter and a LoRA on the text decoder; this repository contains the weights with the LoRA already merged into the main weights. Apart from these fine-tuned components, the model is identical to the base model, and it does not include a video adapter.

The model was fine-tuned on the same datasets used for training the main model.
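The merging step mentioned above folds the trained low-rank update into the base weights, so the released checkpoint needs no separate adapter at inference time. A minimal NumPy sketch of the idea (the dimensions, rank, and scaling factor are illustrative assumptions, not the actual SpeechLMM configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a base weight W and a rank-r LoRA update B @ A.
d_out, d_in, r = 8, 8, 2
alpha = 16  # LoRA scaling hyperparameter (illustrative value)

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trained LoRA down-projection
B = rng.standard_normal((d_out, r))     # trained LoRA up-projection

# With the adapter attached, the layer computes W @ x plus the scaled update.
x = rng.standard_normal(d_in)
y_adapter = W @ x + (alpha / r) * B @ (A @ x)

# Merging folds the update into the base weight once, offline.
W_merged = W + (alpha / r) * B @ A
y_merged = W_merged @ x

# The merged weight reproduces the adapter's output exactly.
assert np.allclose(y_adapter, y_merged)
```

In practice this is what adapter libraries do when exporting a standalone checkpoint: the merged matrix has the same shape as the original, so downstream loading code is unchanged.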
Training data (hours): 40 (SLURP) + 25 (SpeechMassive) = 65 in total
| | SpeechMassive (de) | SpeechMassive (fr) | SLURP (en) |
|---|---|---|---|
| Base model | 84.6 | 86.6 | 78.1 |
| SpeechLMM_v1.0_L_FT | 81.3 | 82.1 | 74.6 |
Framework versions:
- Transformers 4.45.0
- PyTorch 2.3.1+cu124.post2
- Datasets 3.2.0
- Tokenizers 0.20.0