---
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
  - whisper
  - automatic-speech-recognition
  - speech
  - audio
  - arabic
  - egyptian-arabic
  - pytorch
  - lora
  - peft
language:
  - ar
datasets:
  - MightyStudent/Egyptian-ASR-MGB-3
metrics:
  - wer
model-index:
  - name: AbdelrahmanHassan/whisper-large-v3-egyptian-arabic
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Egyptian-ASR-MGB-3
          type: MightyStudent/Egyptian-ASR-MGB-3
        metrics:
          - type: wer
            value: 0.4739
            name: Word Error Rate
---

# Whisper Large V3 Fine-tuned for Egyptian Arabic

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the [MightyStudent/Egyptian-ASR-MGB-3](https://huggingface.co/datasets/MightyStudent/Egyptian-ASR-MGB-3) dataset.

## Model Description

This model has been fine-tuned using LoRA (Low-Rank Adaptation) to improve automatic speech recognition performance on the Egyptian Arabic dialect.

## Training Details

- **Base Model:** openai/whisper-large-v3
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Dataset:** Egyptian-ASR-MGB-3
- **Language:** Egyptian Arabic
- **Training Steps:** 100
- **Batch Size:** 1 (with gradient accumulation steps: 8)
- **Learning Rate:** 1e-4

## LoRA Configuration

- **Rank (r):** 8
- **Alpha:** 32
- **Target Modules:** `["q_proj", "v_proj"]`
- **Dropout:** 0.1
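
For reference, a minimal sketch of how this configuration maps onto PEFT's `LoraConfig`; the variable names are illustrative, not the exact training script:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSpeechSeq2Seq

base_model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

lora_config = LoraConfig(
    r=8,                                  # rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.1,
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```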

## Performance

- **Word Error Rate (WER):** 0.4739 (47.39%)
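
For context, a minimal sketch of how a WER figure like this can be computed with the `evaluate` library; the example strings below are placeholders, not data from this evaluation:

```python
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["<model transcription>"]        # hypothesis text from the model
references = ["<ground-truth transcription>"]  # reference text from the dataset

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")  # lower is better; 0.0 is a perfect match
```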

## Usage

```python
import torch
import librosa
from transformers import WhisperProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

# Load the processor and the base model
processor = WhisperProcessor.from_pretrained("AbdelrahmanHassan/whisper-large-v3-egyptian-arabic")
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "AbdelrahmanHassan/whisper-large-v3-egyptian-arabic")
model.to(device)
model.eval()

# Load the audio and resample to the 16 kHz Whisper expects
audio, sr = librosa.load("path_to_audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
input_features = input_features.to(device, dtype=torch_dtype)

# Generate the transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features, max_length=225)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print(transcription)
```
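
For repeated inference, the LoRA weights can optionally be merged into the base model with PEFT's `merge_and_unload`, which removes the adapter indirection at generation time:

```python
# Optional: fold the adapter weights into the base model.
# After this call, `model` behaves like a plain Whisper model.
model = model.merge_and_unload()
```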

## Training Procedure

### Training Data

The model was trained on the [MightyStudent/Egyptian-ASR-MGB-3](https://huggingface.co/datasets/MightyStudent/Egyptian-ASR-MGB-3) dataset, which contains Egyptian Arabic speech samples.
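
A minimal sketch of loading the data with the `datasets` library; the `train` split name is an assumption about the dataset layout, so check the dataset card for the actual splits:

```python
from datasets import load_dataset, Audio

# Assumed split name; adjust to match the dataset card.
ds = load_dataset("MightyStudent/Egyptian-ASR-MGB-3", split="train")

# Whisper's feature extractor expects 16 kHz input.
ds = ds.cast_column("audio", Audio(sampling_rate=16000))
print(ds)
```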

### Training Hyperparameters

- **Learning Rate:** 1e-4
- **Training Steps:** 100
- **Warmup Steps:** 5
- **Per Device Train Batch Size:** 1
- **Gradient Accumulation Steps:** 8
- **Generation Max Length:** 225
- **FP16/BF16:** Automatic detection based on hardware (see the sketch below)
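
A hedged sketch of how these hyperparameters translate into `Seq2SeqTrainingArguments`; the output directory is a placeholder, and the FP16/BF16 toggle mirrors the hardware-based detection noted above:

```python
import torch
from transformers import Seq2SeqTrainingArguments

# Prefer bfloat16 where the GPU supports it, otherwise fall back to fp16.
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-egyptian-arabic",  # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8
    learning_rate=1e-4,
    warmup_steps=5,
    max_steps=100,
    fp16=not use_bf16,
    bf16=use_bf16,
    predict_with_generate=True,
    generation_max_length=225,
)
```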

## Framework Versions

Exact versions were not pinned for this release; the model was trained with recent releases of:

- Transformers
- PyTorch
- PEFT
- Datasets

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-egyptian-arabic,
  title={Whisper Large V3 Fine-tuned for Egyptian Arabic},
  author={Abdelrahman Hassan},
  year={2025},
  howpublished={\url{https://huggingface.co/AbdelrahmanHassan/whisper-large-v3-egyptian-arabic}}
}
```

## Limitations and Bias

This model is fine-tuned specifically for the Egyptian Arabic dialect and may not perform well on other Arabic dialects or languages. Its performance depends on the quality and diversity of the training data.