🤫🇪🇬🌐 Whisper Small – Code-Switched Egyptian Arabic-English ASR

Model ID: IbrahimAmin/code-switched-egyptian-arabic-whisper-small
Base Model: openai/whisper-small
Languages: Egyptian Arabic, English (code-switched)
Author: Ibrahim Amin

🧠 Model Description

This model is a fine-tuned version of OpenAI's Whisper Small, optimized for Automatic Speech Recognition (ASR) on code-switched Egyptian Arabic-English audio. It is designed to accurately transcribe speech that alternates between Egyptian Arabic and English, a common occurrence in informal conversations, media, and social platforms within Egypt.

📚 Training Data

The model was fine-tuned on three datasets chosen to cover both code-switched and Egyptian Arabic speech:

MohamedRashad/arabic-english-code-switching: A dataset comprising 12.5k audio samples featuring spontaneous code-switched speech.
google/fleurs: Specifically, the ar_eg train subset was utilized to incorporate Egyptian Arabic speech patterns.
Custom YouTube Dataset: A curated collection of Egyptian Arabic-English code-switched audio from YouTube, enhancing the model's ability to handle real-world conversational scenarios.

🚀 Usage

To transcribe an audio file with this model:

import torch
from transformers import pipeline

# Config
model_name = "IbrahimAmin/code-switched-egyptian-arabic-whisper-small"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Half precision is only reliable on GPU, so fall back to float32 on CPU
torch_dtype = torch.float16 if device == "cuda" else torch.float32

# Load the pipeline
asr = pipeline(
    task="automatic-speech-recognition",
    model=model_name,
    torch_dtype=torch_dtype,
    device=device,
)

path = "path_to_audio_file.wav"

# Inference: 30-second chunks, beam search with 5 beams, Arabic language token
result = asr(
    path,
    return_timestamps=False,
    chunk_length_s=30,
    generate_kwargs={"task": "transcribe", "language": "<|ar|>", "num_beams": 5},
)

print(result["text"])
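
To transcribe several files with the same decoding settings, the call above can be wrapped in a small helper. This is a minimal sketch, not part of the model's API: `transcribe_files` is a hypothetical name, and `asr` is assumed to be the pipeline object created above (a pipeline called with a list of inputs returns one result dictionary per input).

```python
from typing import Callable, Dict, List


def transcribe_files(asr: Callable, paths: List[str], num_beams: int = 5) -> Dict[str, str]:
    """Transcribe each path with the same settings and map path -> text.

    `asr` is assumed to behave like a transformers ASR pipeline: given a
    list of inputs, it returns a list of dicts with a "text" key.
    """
    generate_kwargs = {"task": "transcribe", "language": "<|ar|>", "num_beams": num_beams}
    outputs = asr(
        paths,
        return_timestamps=False,
        chunk_length_s=30,
        generate_kwargs=generate_kwargs,
    )
    # Pair each input path with its transcript, trimming surrounding whitespace
    return {path: out["text"].strip() for path, out in zip(paths, outputs)}
```

With the pipeline loaded as above, `transcribe_files(asr, ["first.wav", "second.wav"])` would return a dictionary keyed by file path; the file names here are placeholders.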

📊 Evaluation Metrics

The model was evaluated using Word Error Rate (WER) on four test sets and compared against the baseline Whisper Small:

Dataset	Baseline Whisper Small WER (%)	Fine-Tuned Model WER (%)
FLEURS ar_eg test set (transcription column)	30.29	24.36
ESCWA	98.15	45.12
MGB-3 (dev-test)	72.67 – 79.84	44.29 – 49.00
Common Voice 17.0 Arabic Subset (Test Set)	74.16	69.14

The fine-tuned model lowers WER on every test set, with the largest drop on ESCWA (98.15 to 45.12) and the smallest on Common Voice 17.0 (74.16 to 69.14). These results indicate its effectiveness in handling code-switched speech, particularly in the context of Egyptian Arabic.
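
The gains in the table can also be expressed as relative WER reductions, which makes the test sets easier to compare. The sketch below is plain arithmetic on the reported numbers; the dictionary keys are shortened labels chosen for illustration, and MGB-3 is left out because it is reported as a range.

```python
# WER values (%) copied from the table: (baseline, fine-tuned)
wer_scores = {
    "FLEURS ar_eg test": (30.29, 24.36),
    "ESCWA": (98.15, 45.12),
    "Common Voice 17.0 Arabic test": (74.16, 69.14),
}


def relative_wer_reduction(baseline: float, fine_tuned: float) -> float:
    """Return the relative WER reduction in percent."""
    return 100.0 * (baseline - fine_tuned) / baseline


reductions = {
    name: relative_wer_reduction(baseline, fine_tuned)
    for name, (baseline, fine_tuned) in wer_scores.items()
}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1f}% relative WER reduction")
```

This works out to roughly 19.6% on FLEURS, 54.0% on ESCWA, and 6.8% on Common Voice 17.0.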

Both Whisper models were decoded with beam search (beam_size = 5). Predictions and reference text were normalized with BasicTextNormalizer (remove_diacritics=False, split_letters=False) before scoring.
MGB-3 dev/test WER scores are multi-reference WER (MR-WER) scores, calculated using this repo.
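
To make the scoring procedure concrete, here is a minimal sketch of corpus-level WER with normalization applied to both references and predictions. It is not the author's evaluation script: `simple_normalize` is a hypothetical, simplified stand-in for BasicTextNormalizer (it lowercases and strips punctuation and symbols only), the edit distance is a plain-Python implementation, and MR-WER for MGB-3 is not covered.

```python
import re
import unicodedata
from typing import Callable, List, Sequence


def simple_normalize(text: str) -> str:
    """Simplified stand-in normalizer (assumption, not BasicTextNormalizer):
    lowercase, replace punctuation and symbols with spaces, collapse whitespace."""
    text = text.lower()
    text = "".join(" " if unicodedata.category(c)[0] in "PS" else c for c in text)
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref_words: Sequence[str], hyp_words: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(
    references: List[str],
    hypotheses: List[str],
    normalize: Callable[[str], str] = simple_normalize,
) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = normalize(ref).split()
        hyp_words = normalize(hyp).split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words after normalization")
    return 100.0 * total_errors / total_words
```

For example, a five-word code-switched reference whose prediction drops one word scores one deletion out of five reference words. Passing a different `normalize` callable, such as a BasicTextNormalizer instance, would swap in the normalization described above.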

✅ Intended Use

Primary: Transcription of code-switched Egyptian Arabic-English audio, including interviews, podcasts, and informal conversations.
Secondary: Research in sociolinguistics, code-switching phenomena, and development of multilingual ASR systems.

⚠️ Limitations

The model may exhibit reduced accuracy on monolingual speech or code-switching involving languages other than Egyptian Arabic and English.
Performance might vary with audio quality, speaker accents, and background noise.

📎 Citation

If you use this model in your research or applications, please cite it as follows:

@misc{amin2025whispercodeswitch,
  author = {Ibrahim Amin},
  title = {Whisper Small – Code-Switched Egyptian Arabic-English ASR},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/IbrahimAmin/code-switched-egyptian-arabic-whisper-small}}
}
