---
task_categories:
- audio-classification
language:
- fr
tags:
- intent
- intent-classification
- audio-classification
- audio
base_model:
- facebook/wav2vec2-xls-r-300m
datasets:
- FBK-MT/Speech-MASSIVE
library_name: transformers
license: apache-2.0
---

# wav2vec 2.0 XLS-R 128 (300m) fine-tuned on Speech-MASSIVE - fr-FR

Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSIVE textual corpus. It covers 12 languages, includes spoken and written utterances, and is annotated with 60 intents. The dataset is available on the [Hugging Face Hub](https://huggingface.co/datasets/FBK-MT/Speech-MASSIVE).

This is the [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) model fine-tuned for intent classification on the fr-FR subset of Speech-MASSIVE.

It achieves the following results on the test set:

- Accuracy: 0.543
- F1: 0.410
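
The card does not state how F1 is averaged over the 60 intent classes. The sketch below is for illustration only. It assumes macro-averaged F1, which is the unweighted mean of per-class F1 scores and a common choice for intent classification. The helper names `accuracy` and `macro_f1` are hypothetical, and this is not necessarily the evaluation code that produced the numbers above.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of utterances whose predicted intent matches the reference."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    tp: dict[int, int] = {}
    fp: dict[int, int] = {}
    fn: dict[int, int] = {}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] = tp.get(t, 0) + 1
        else:
            fp[p] = fp.get(p, 0) + 1  # predicted p, but it was wrong
            fn[t] = fn.get(t, 0) + 1  # missed the true class t
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        # Per-class F1 written as 2*TP / (2*TP + FP + FN)
        denom = 2 * tp.get(c, 0) + fp.get(c, 0) + fn.get(c, 0)
        scores.append(2 * tp.get(c, 0) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy intent ids (hypothetical), not real model predictions
y_true = [0, 0, 0, 1, 2]
y_pred = [0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 0.6
print(macro_f1(y_true, y_pred))  # about 0.444
```

In this toy example, accuracy is 0.6 and macro-F1 is about 0.444. Every class carries equal weight in macro-F1, so the never-recognized class 2 contributes an F1 of zero and pulls the average below accuracy.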

## Usage

You can use the model directly as follows:

```python
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file, resampled to the 16 kHz rate expected by wav2vec 2.0 XLS-R
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/xls-r-128-speechmassive-fr-FR")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-xls-r-300m")

# Extract features
inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

# Compute logits (no gradients are needed at inference time)
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class to its label; the names come from the
# id2label mapping stored in the model config
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```
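
To inspect more than the single best intent, you can turn the logits into probabilities with a softmax and keep the k most likely classes. The helper `top_k_intents` below is hypothetical and is not part of the model or of Transformers. It takes a `(1, num_labels)` logits tensor, such as `logits` from the snippet above, and an `id2label` mapping such as `model.config.id2label`. The demo uses dummy logits and a made-up index-to-label mapping so that it runs without downloading the model.

```python
import torch


def top_k_intents(logits: torch.Tensor, id2label: dict[int, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most probable (label, probability) pairs for one utterance.

    `logits` is expected to have shape (1, num_labels).
    """
    probs = torch.softmax(logits, dim=-1)[0]  # shape: (num_labels,)
    k = min(k, probs.numel())                 # guard against k > num_labels
    values, indices = torch.topk(probs, k)    # sorted, highest first
    return [(id2label[int(i)], float(v)) for v, i in zip(values, indices)]


# Demo with dummy logits and a hypothetical id2label mapping (no download needed)
dummy_id2label = {0: "alarm_set", 1: "weather_query", 2: "play_music", 3: "datetime_query"}
dummy_logits = torch.tensor([[0.1, 2.0, -1.0, 0.5]])
for label, prob in top_k_intents(dummy_logits, dummy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

The softmax is monotonic, so the top-1 result always matches `logits.argmax(dim=-1)`. Treat the probabilities as relative scores, since they are not necessarily calibrated.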

## Framework versions

- Datasets 3.2.0
- PyTorch 2.1.2
- Tokenizers 0.20.3
- Transformers 4.45.2

## BibTeX entry and citation info

```bibtex
@inproceedings{koudounas2025unlearning,
  title={"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding},
  author={Koudounas, Alkis and Savelli, Claudio and Giobergia, Flavio and Baralis, Elena},
  booktitle={Proc. Interspeech 2025},
  year={2025},
}
```