Whisper Small: Code-Switched Egyptian Arabic-English ASR
Model ID: IbrahimAmin/code-switched-egyptian-arabic-whisper-small
Base Model: openai/whisper-small
Languages: Egyptian Arabic, English (code-switched)
Author: Ibrahim Amin
Model Description
This model is a fine-tuned version of OpenAI's Whisper Small, optimized for Automatic Speech Recognition (ASR) on code-switched Egyptian Arabic-English audio. It is designed to accurately transcribe speech that alternates between Egyptian Arabic and English, a common occurrence in informal conversations, media, and social platforms within Egypt.
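A code-switched transcript mixes Arabic-script and Latin-script words in a single utterance. The sketch below is not part of the model or its training code. It assumes that Egyptian Arabic words are written in Arabic script and English words in Latin script, so romanized Arabic (Arabizi) would be mislabelled. Under that assumption, it counts script changes between consecutive words as a rough proxy for switch points. The function names and the sample sentence are made up for illustration.

```python
import unicodedata


def token_script(token: str) -> str:
    """Label a token 'ar' (Arabic script), 'en' (Latin script) or 'other'.

    Assumption: English words use Latin script and Egyptian Arabic words use
    Arabic script; Arabizi (romanized Arabic) would be mislabelled.
    """
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("ARABIC"):
                return "ar"
            if name.startswith("LATIN"):
                return "en"
    return "other"


def count_switch_points(transcript: str) -> int:
    """Count script changes between consecutive Arabic/Latin words."""
    labels = [token_script(tok) for tok in transcript.split()]
    labels = [lab for lab in labels if lab != "other"]  # skip digits, punctuation
    return sum(1 for prev, curr in zip(labels, labels[1:]) if prev != curr)


if __name__ == "__main__":
    # Hypothetical code-switched sentence, not taken from any dataset
    sample = "انا كنت في الـ meeting امبارح والـ deadline بكرة"
    print(count_switch_points(sample))  # 4
```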
Training Data
The model was fine-tuned on a combination of datasets chosen to capture the nuances of code-switching:
- MohamedRashad/arabic-english-code-switching: A dataset comprising 12.5k audio samples featuring spontaneous code-switched speech.
- google/fleurs: The `ar_eg` train subset was used to incorporate Egyptian Arabic speech patterns.
- Custom YouTube Dataset: A curated collection of Egyptian Arabic-English code-switched audio from YouTube, which improves the model's handling of real-world conversational speech.
Usage
To transcribe an audio file with the Transformers `pipeline`:
```python
import torch
from transformers import pipeline

# Config
model_name = "IbrahimAmin/code-switched-egyptian-arabic-whisper-small"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Half precision is only worthwhile on GPU; fall back to float32 on CPU
torch_dtype = torch.float16 if device == "cuda" else torch.float32

# Load the ASR pipeline
asr = pipeline(
    task="automatic-speech-recognition",
    model=model_name,
    torch_dtype=torch_dtype,
    device=device,
)

path = "path_to_audio_file.wav"

# Inference: split long audio into 30 s chunks and decode with 5-beam search
result = asr(
    path,
    return_timestamps=False,
    chunk_length_s=30,
    generate_kwargs={"task": "transcribe", "language": "<|ar|>", "num_beams": 5},
)
print(result["text"])
```
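`chunk_length_s=30` matches Whisper's 30-second input window. The sketch below is a simplified model of how a longer recording is split into overlapping chunks, with boundaries given in seconds. It is not the pipeline's own code. It assumes the pipeline's default stride of `chunk_length_s / 6` on each side when `stride_length_s` is not given, and the helper `chunk_windows` is hypothetical.

```python
from typing import List, Optional, Tuple


def chunk_windows(duration_s: float, chunk_length_s: float = 30.0,
                  stride_length_s: Optional[float] = None) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of overlapping chunks.

    Assumption: adjacent chunks overlap by stride_length_s on each side,
    defaulting to chunk_length_s / 6 (5 s for 30 s chunks).
    """
    if stride_length_s is None:
        stride_length_s = chunk_length_s / 6
    step = chunk_length_s - 2 * stride_length_s  # fresh audio per chunk
    windows = []
    start = 0.0
    while True:
        end = start + chunk_length_s
        windows.append((start, min(end, duration_s)))
        if end >= duration_s:  # this chunk reaches the end of the audio
            break
        start += step
    return windows


if __name__ == "__main__":
    # A 65 s recording -> [(0.0, 30.0), (20.0, 50.0), (40.0, 65.0)]
    print(chunk_windows(65.0))
```

The overlap gives each chunk some context on both sides, so words cut at a chunk boundary can be recovered from the neighbouring chunk when the pieces are merged.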
Evaluation Metrics
The model's performance was evaluated with Word Error Rate (WER, lower is better) on the following test sets:

Dataset | Baseline Whisper Small WER (%) | Fine-Tuned Model WER (%) |
---|---|---|
FLEURS ar_eg test set (transcription column) | 30.29 | 24.36 |
ESCWA | 98.15 | 45.12 |
MGB-3 (dev / test) | 72.67 / 79.84 | 44.29 / 49.00 |
Common Voice 17.0 Arabic subset (test set) | 74.16 | 69.14 |

The fine-tuned model has a lower WER than the baseline on every test set, with the largest reduction on ESCWA (98.15% to 45.12%). These results indicate the model's effectiveness on code-switched speech, particularly Egyptian Arabic.
- Whisper models were decoded with beam search (`beam_size = 5`) and evaluated using `BasicTextNormalizer` with `remove_diacritics=False` and `split_letters=False`, applied to both predictions and reference text.
- MGB-3 dev/test WER% scores are MR-WER% (multi-reference WER) scores calculated using this repo.
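The WER figures combine two steps: text normalization and word-level edit distance. The sketch below illustrates both. `basic_normalize` is a standard-library approximation modelled on Whisper's `BasicTextNormalizer` (the real class is in `transformers.models.whisper.english_normalizer`) and may differ from it in edge cases. `wer` is a plain Levenshtein distance over words divided by the reference length. MR-WER for MGB-3 is not covered, and the example sentences are made up.

```python
import re
import unicodedata


def basic_normalize(text: str) -> str:
    """Approximate BasicTextNormalizer(remove_diacritics=False, split_letters=False)."""
    s = text.lower()
    s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # drop <...> and [...] spans
    s = re.sub(r"\(([^)]+?)\)", "", s)  # drop (...) spans
    # Replace Unicode marks (M*), symbols (S*) and punctuation (P*) with spaces;
    # note that Arabic short-vowel diacritics are marks, so they become spaces too
    s = "".join(" " if unicodedata.category(c)[0] in "MSP" else c
                for c in unicodedata.normalize("NFKC", s))
    return re.sub(r"\s+", " ", s).strip()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = basic_normalize(reference).split()
    hyp = basic_normalize(hypothesis).split()
    # prev[j]: edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    ref = "انا رايح الـ office بكرة"
    hyp = "انا رايح office بكرة"
    print(wer(ref, hyp))  # 0.2 (one deleted word out of five)
```

Applying the same normalizer to both predictions and references, as the notes above describe, keeps punctuation and casing differences from being counted as errors.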
Intended Use
- Primary: Transcription of code-switched Egyptian Arabic-English audio, including interviews, podcasts, and informal conversations.
- Secondary: Research in sociolinguistics, code-switching phenomena, and development of multilingual ASR systems.
Limitations
- The model may exhibit reduced accuracy on monolingual speech or code-switching involving languages other than Egyptian Arabic and English.
- Performance might vary with audio quality, speaker accents, and background noise.
Citation
If you use this model in your research or applications, please cite it as follows:
@misc{amin2025whispercodeswitch,
author = {Ibrahim Amin},
title = {Whisper Small: Code-Switched Egyptian Arabic-English ASR},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/IbrahimAmin/code-switched-egyptian-arabic-whisper-small}}
}