🀫πŸ‡ͺπŸ‡¬πŸŒ Whisper Small – Code-Switched Egyptian Arabic-English ASR

Model ID: IbrahimAmin/code-switched-egyptian-arabic-whisper-small
Base Model: openai/whisper-small
Languages: Egyptian Arabic, English (code-switched)
Author: Ibrahim Amin


🧠 Model Description

This model is a fine-tuned version of OpenAI's Whisper Small, optimized for Automatic Speech Recognition (ASR) on code-switched Egyptian Arabic-English audio. It is designed to accurately transcribe speech that alternates between Egyptian Arabic and English, a common occurrence in informal conversations, media, and social platforms within Egypt.


πŸ“š Training Data

The model was trained on a diverse set of datasets to capture the nuances of code-switching (a rough data-loading sketch follows the list):

  • MohamedRashad/arabic-english-code-switching: A dataset comprising 12.5k audio samples featuring spontaneous code-switched speech.
  • google/fleurs: Specifically, the ar_eg train subset was utilized to incorporate Egyptian Arabic speech patterns.
  • Custom YouTube Dataset: A curated collection of Egyptian Arabic-English code-switched audio from YouTube, enhancing the model's ability to handle real-world conversational scenarios.
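As a rough illustration (not the original training script), the two public datasets above can be pulled from the Hugging Face Hub with the datasets library. Split names, column names, and the 16 kHz resampling step are assumptions, and the custom YouTube data is not publicly available.

from datasets import load_dataset, Audio

# Code-switched Egyptian Arabic-English speech (split name assumed)
cs_ds = load_dataset("MohamedRashad/arabic-english-code-switching", split="train")

# FLEURS Egyptian Arabic train split
fleurs_ds = load_dataset("google/fleurs", "ar_eg", split="train")

# Whisper is trained on 16 kHz audio, so resample the audio columns accordingly
cs_ds = cs_ds.cast_column("audio", Audio(sampling_rate=16_000))
fleurs_ds = fleurs_ds.cast_column("audio", Audio(sampling_rate=16_000))

print(cs_ds)
print(fleurs_ds)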

πŸš€ Usage

To use this model for transcription with the Hugging Face transformers pipeline:

import torch
from transformers import pipeline

# Config
model_name = "IbrahimAmin/code-switched-egyptian-arabic-whisper-small"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Use half precision on GPU; fall back to full precision on CPU
torch_dtype = torch.float16 if device == "cuda" else torch.float32

# Load the ASR pipeline
asr = pipeline(
    task="automatic-speech-recognition",
    model=model_name,
    torch_dtype=torch_dtype,
    device=device,
)

# Path to a local audio file (any format ffmpeg can decode)
path = "path_to_audio_file.wav"

# Inference: chunked long-form decoding with beam search, forcing Arabic transcription
result = asr(
    path,
    return_timestamps=False,
    chunk_length_s=30,
    generate_kwargs={"task": "transcribe", "language": "<|ar|>", "num_beams": 5},
)

print(result["text"])
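For lower-level control over generation, the model can also be loaded directly with WhisperProcessor and WhisperForConditionalGeneration. The following is a minimal sketch, not taken from the original card: it assumes librosa is installed for audio loading and reuses the model_name, device, and path variables defined above.

import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name).to(device)

# Whisper expects 16 kHz mono audio
audio, _ = librosa.load(path, sr=16_000, mono=True)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt").to(device)

# Beam-search decoding, forced to Arabic transcription (matches the pipeline call above)
# Note: this path handles up to 30 s of audio; use the pipeline above for longer files
generated_ids = model.generate(
    inputs.input_features,
    task="transcribe",
    language="ar",
    num_beams=5,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])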

πŸ“Š Evaluation Metrics

The model's performance was evaluated using Word Error Rate (WER) across multiple test sets:

Dataset | Baseline Whisper Small WER (%) | Fine-Tuned Model WER (%)
FLEURS ar_eg test set (transcription column) | 30.29 | 24.36
ESCWA | 98.15 | 45.12
MGB-3 (dev-test) | 72.67 – 79.84 | 44.29 – 49.00
Common Voice 17.0 Arabic Subset (Test Set) | 74.16 | 69.14

These results indicate the model's effectiveness in handling code-switched speech, particularly in the context of Egyptian Arabic.
  • Whisper models were decoded with beam search (beam_size = 5) and evaluated with BasicTextNormalizer (remove_diacritics=False, split_letters=False) applied to both predictions and reference text; a matching normalization-and-WER sketch follows these notes.
  • MGB-3 dev/test WER% scores are MR-WER% (Multi-Reference WER) scores calculated using this repo.
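As a rough illustration of that scoring setup (not the original evaluation code), WER can be computed with the evaluate library and the BasicTextNormalizer shipped with transformers; the example strings below are illustrative only.

import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

# Same normalizer settings as reported above
normalizer = BasicTextNormalizer(remove_diacritics=False, split_letters=False)
wer_metric = evaluate.load("wer")

# Illustrative predictions/references; in practice these come from model output
# and the test set's transcription column
predictions = ["Ψ§Ω†Ψ§ Ψ¬Ψ±Ψ¨Ψͺ Ψ§Ω„ model Ψ―Ω‡ Ψ§Ω…Ψ¨Ψ§Ψ±Ψ­"]
references = ["Ψ£Ω†Ψ§ Ψ¬Ψ±Ψ¨Ψͺ Ψ§Ω„ model Ψ―Ω‡ Ψ§Ω…Ψ¨Ψ§Ψ±Ψ­"]

wer = wer_metric.compute(
    predictions=[normalizer(p) for p in predictions],
    references=[normalizer(r) for r in references],
)
print(f"WER: {100 * wer:.2f}%")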

βœ… Intended Use

  • Primary: Transcription of code-switched Egyptian Arabic-English audio, including interviews, podcasts, and informal conversations.
  • Secondary: Research in sociolinguistics, code-switching phenomena, and development of multilingual ASR systems.

⚠️ Limitations

  • The model may exhibit reduced accuracy on monolingual speech or code-switching involving languages other than Egyptian Arabic and English.
  • Performance might vary with audio quality, speaker accents, and background noise.

πŸ“Ž Citation

If you use this model in your research or applications, please cite it as follows:

@misc{amin2025whispercodeswitch,
  author = {Ibrahim Amin},
  title = {Whisper Small – Code-Switched Egyptian Arabic-English ASR},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/IbrahimAmin/code-switched-egyptian-arabic-whisper-small}}
}