TamyΓ―z: Arabic Dialect Identifier

Hugging Face πŸ€— Space Β· arXiv πŸ“– paper

We present TamyΓ―z, an accurate and robust Transformer-based model for Arabic Dialect Identification (ADI) in speech. We adapt the pre-trained Massively Multilingual Speech (MMS) model and fine-tune it on diverse Arabic TV broadcast speech to identify the following Arabic language varieties:

  • Modern Standard Arabic (MSA)
  • Egyptian Arabic (Masri and Sudani)
  • Gulf Arabic (Khleeji, Iraqi, and Yemeni)
  • Levantine Arabic (Shami)
  • Maghrebi Arabic (Dialects of al-Maghreb al-Arabi in North Africa)

Model Use Cases βš™οΈ

The model can be used as a component in a large-scale speech data collection pipeline to create resources for different Arabic dialects. It can also be used to filter speech data for Modern Standard Arabic (MSA) for text-to-speech (TTS) systems.
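To illustrate the filtering use case, here is a minimal sketch that keeps only segments confidently classified as MSA. The file paths and the 0.9 confidence threshold are illustrative assumptions, not part of the model card:

from transformers import pipeline

adi5_classifier = pipeline(
    "audio-classification",
    model="badrex/mms-300m-arabic-dialect-identifier",
)

# Keep only segments confidently classified as MSA for a TTS corpus
candidate_files = ["clip_001.wav", "clip_002.wav"]  # hypothetical paths
msa_segments = []
for path in candidate_files:
    top = adi5_classifier(path)[0]  # predictions are sorted, best first
    if top["label"] == "MSA" and top["score"] >= 0.9:  # illustrative threshold
        msa_segments.append(path)

In practice, the threshold trades off corpus size against purity; a stricter cutoff yields cleaner MSA data for TTS at the cost of discarding more segments.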

Usage with the Hugging Face πŸ€— Transformers library

Consider this speech segment as an example (examples/Da7ee7.mp3 in the model repository):

Now we can use the model to identify the dialect of the speaker as follows:

from transformers import pipeline

# Load the model as an audio-classification pipeline
model_id = "badrex/mms-300m-arabic-dialect-identifier"
adi5_classifier = pipeline(
    "audio-classification",
    model=model_id,
    device="cpu",  # or device="cuda" if you are connected to a GPU
)

# Predict the dialect for an audio sample
# (note the /resolve/ path, which serves the raw file rather than the HTML page)
audio_path = "https://huggingface.co/badrex/mms-300m-arabic-dialect-identifier/resolve/main/examples/Da7ee7.mp3"

predictions = adi5_classifier(audio_path)

# Predictions are sorted by score, highest first
for pred in predictions:
    print(f"Dialect: {pred['label']:<10} Confidence: {pred['score']:.4f}")

For this example, you will get the following output:

Dialect: Egyptian   Confidence: 0.9926
Dialect: MSA        Confidence: 0.0040
Dialect: Levantine  Confidence: 0.0033
Dialect: Maghrebi   Confidence: 0.0001
Dialect: Gulf       Confidence: 0.0000

Here, the model predicts the dialect correctly πŸ₯³

The model was trained to handle variation in recording environments and should do reasonably well on noisy speech segments. Consider this noisy speech segment from an old theatre recording:

Using the model to make the prediction as above, we get the following output:

Dialect: MSA        Confidence: 0.9636
Dialect: Levantine  Confidence: 0.0319
Dialect: Egyptian   Confidence: 0.0023
Dialect: Gulf       Confidence: 0.0019
Dialect: Maghrebi   Confidence: 0.0003

Once again, the model makes the correct prediction πŸŽ‰

⚠️ Caution: Make sure your audio is sampled at 16 kHz. If not, resample it first with a library such as librosa or torchaudio, as in the sketch below.
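A minimal resampling sketch with librosa, reusing the adi5_classifier pipeline from above (the file name is a placeholder):

import librosa

# librosa loads the file and resamples it to 16 kHz in one step
waveform, sr = librosa.load("my_audio.wav", sr=16_000)

# the pipeline also accepts a raw 1-D numpy array sampled at 16 kHz
predictions = adi5_classifier(waveform)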

Info ℹ️

  • Developed by: Badr M. Abdullah and Matthew Baas
  • Model type: wav2vec 2.0 architecture
  • Language: Arabic (and its varieties)
  • License: Creative Commons Attribution 4.0 (CC BY 4.0)
  • Finetuned from model: MMS-300m [https://huggingface.co/facebook/mms-300m]

Training Data πŸ›’οΈ

Trained on the MGB-3 ADI-5 dataset, which consists of TV broadcast speech from Al Jazeera TV (news, interviews, discussions, TV shows, etc.).

Evaluation πŸ“ˆ

The model has been evaluated on the challenging multi-domain MADIS-5 benchmark. It performed very well in our evaluation, and we expect it to be robust to real-world speech samples.

Out-of-Scope Use β›”

The model should not be used to:

  • Assess fluency or nativeness of speech
  • Determine whether the speaker uses a formal or informal register
  • Make judgments about a speaker's origin, education level, or socioeconomic status
  • Filter or discriminate against speakers based on dialect

Bias, Risks, and Limitations ⚠️

Some Arabic varieties are not well-represented in the training data. The model may not work well for some dialects such as Yemeni Arabic, Iraqi Arabic, and Saharan Arabic.

Additional limitations include:

  • Very short audio samples (< 2 seconds) may not provide enough information for accurate classification
  • Code-switching between dialects (especially mixing with MSA) may result in less reliable classifications
  • Speakers who have lived in multiple dialect regions may exhibit mixed features
  • Speech from non-typical speakers such as children and people with speech disorders might be challenging for the model

Recommendations πŸ‘Œ

  • For optimal results, use audio segments of at least 5-10 seconds
  • Confidence scores may not always be informative (e.g., the model could make a wrong prediction while still being highly confident)
  • For critical applications, consider human verification of model predictions, as in the sketch below
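As a rough sketch of how these recommendations could be combined (the 5-second minimum and the 0.8 confidence threshold are illustrative assumptions), one could skip very short clips and route low-confidence predictions to human review:

import librosa

def classify_segment(path, min_duration=5.0, conf_threshold=0.8):
    # Load and resample to the 16 kHz rate the model expects
    waveform, sr = librosa.load(path, sr=16_000)
    # Skip clips that are likely too short to classify reliably
    if len(waveform) / sr < min_duration:
        return {"status": "too_short"}
    top = adi5_classifier(waveform)[0]  # highest-scoring prediction
    # Route low-confidence predictions to human verification
    status = "ok" if top["score"] >= conf_threshold else "needs_review"
    return {"status": status, "label": top["label"], "score": top["score"]}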

Citation βœ’οΈ

If you use this model in your research, please cite our paper:

BibTeX:

@inproceedings{abdullah2025voice,
  title={Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification},
  author={Badr M. Abdullah and Matthew Baas and Bernd MΓΆbius and Dietrich Klakow},
  booktitle={Interspeech},
  year={2025},
  url={https://arxiv.org/pdf/2505.24713}
}

Model Card Contact πŸ“§

If you have any questions, please do not hesitate to send an email to badr dot nlp at gmail dot com 😊
