---
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- whisper
- automatic-speech-recognition
- speech
- audio
- arabic
- egyptian-arabic
- pytorch
- lora
- peft
language:
- ar
datasets:
- MightyStudent/Egyptian-ASR-MGB-3
metrics:
- wer
model-index:
- name: AbdelrahmanHassan/whisper-large-v3-egyptian-arabic
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Egyptian-ASR-MGB-3
      type: MightyStudent/Egyptian-ASR-MGB-3
    metrics:
    - type: wer
      value: 0.4739
      name: Word Error Rate
---

# Whisper Large V3 Fine-tuned for Egyptian Arabic

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the [Egyptian-ASR-MGB-3](https://huggingface.co/datasets/MightyStudent/Egyptian-ASR-MGB-3) dataset.

## Model Description

This model was fine-tuned with LoRA (Low-Rank Adaptation) to improve automatic speech recognition performance on the Egyptian Arabic dialect.

### Training Details

- **Base Model**: openai/whisper-large-v3
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: Egyptian-ASR-MGB-3
- **Language**: Egyptian Arabic
- **Training Steps**: 100
- **Batch Size**: 1 (with gradient accumulation steps: 8)
- **Learning Rate**: 1e-4

### LoRA Configuration

- **Rank (r)**: 8
- **Alpha**: 32
- **Target Modules**: ["q_proj", "v_proj"]
- **Dropout**: 0.1

A code sketch that reproduces this configuration with PEFT, along with a WER evaluation sketch, is included in the appendix at the end of this card.

## Performance

- **Word Error Rate (WER)**: 0.4739 (47.39%) on Egyptian-ASR-MGB-3

## Usage

```python
import torch
import librosa
from transformers import WhisperProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

# Load the processor and the base model
processor = WhisperProcessor.from_pretrained("AbdelrahmanHassan/whisper-large-v3-egyptian-arabic")
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)

# Attach the LoRA adapter and move the model to the target device
model = PeftModel.from_pretrained(model, "AbdelrahmanHassan/whisper-large-v3-egyptian-arabic")
model.to(device)
model.eval()

# Load and preprocess audio at 16 kHz
audio, sr = librosa.load("path_to_audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
# Match the model's device and dtype to avoid a float32/float16 mismatch
input_features = input_features.to(device, dtype=torch_dtype)

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features, max_length=225)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

## Training Procedure

### Training Data

The model was trained on the Egyptian-ASR-MGB-3 dataset, which contains Egyptian Arabic speech samples.

### Training Hyperparameters

- **Learning Rate**: 1e-4
- **Training Steps**: 100
- **Warmup Steps**: 5
- **Per Device Train Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Generation Max Length**: 225
- **FP16/BF16**: Automatic detection based on hardware

### Framework Versions

Exact framework versions were not pinned; training used recent releases of Transformers, PyTorch, PEFT, and Datasets.

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-egyptian-arabic,
  title={Whisper Large V3 Fine-tuned for Egyptian Arabic},
  author={Abdelrahman Hassan},
  year={2025},
  howpublished={\url{https://huggingface.co/AbdelrahmanHassan/whisper-large-v3-egyptian-arabic}}
}
```

## Limitations and Bias

This model is fine-tuned specifically for the Egyptian Arabic dialect and may not perform well on other Arabic dialects or languages. Because the adapter was trained for only 100 steps on a single dataset, performance also depends heavily on the quality and diversity of the training data.
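
## Appendix: Training Setup Sketch

The training scripts are not released with this card. The following is a minimal sketch, assuming the standard PEFT + `Seq2SeqTrainingArguments` workflow; only the numeric values are taken from this card, while the output directory, the use of `Seq2SeqTrainer`, and the omitted data pipeline (preprocessing, data collator, Egyptian-ASR-MGB-3 splits) are assumptions.

```python
# A minimal sketch, assuming a PEFT + Seq2SeqTrainer workflow; only the
# numeric values below are taken from this model card. Dataset loading,
# preprocessing, and the data collator are omitted.
import torch
from transformers import AutoModelForSpeechSeq2Seq, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model

# Base model to which the LoRA adapter is attached.
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

# LoRA configuration as listed in this card: r=8, alpha=32,
# q_proj/v_proj targets, dropout 0.1.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Hyperparameters as listed in this card; the output directory is hypothetical,
# and fp16 is enabled only when a CUDA device is available, mirroring the
# "automatic detection" note above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-egyptian-arabic-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    warmup_steps=5,
    max_steps=100,
    generation_max_length=225,
    predict_with_generate=True,
    fp16=torch.cuda.is_available(),
)
```

From here, a `Seq2SeqTrainer` built with a speech-to-text data collator and the processed dataset splits would run the 100-step fine-tuning; those pieces depend on choices not documented in this card.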
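
## Appendix: WER Evaluation Sketch

The reported WER can be computed in the same form with the `evaluate` library; the evaluation split and any text normalization applied to the Arabic transcripts for the reported 0.4739 are not documented here, so this is only an illustrative sketch.

```python
# A minimal sketch of computing WER with the `evaluate` library.
# `predictions` would come from running the Usage snippet above over an
# evaluation split; `references` are the ground-truth transcripts.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["..."]  # model transcriptions (placeholders)
references = ["..."]   # ground-truth transcripts (placeholders)

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```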