
Whisper Large V3 Fine-tuned for Egyptian Arabic

This model is a fine-tuned version of openai/whisper-large-v3 on the Egyptian-ASR-MGB-3 dataset.

Model Description

This model has been fine-tuned using LoRA (Low-Rank Adaptation) to improve automatic speech recognition performance on Egyptian Arabic dialect.

Training Details

  • Base Model: openai/whisper-large-v3
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: Egyptian-ASR-MGB-3
  • Language: Egyptian Arabic
  • Training Steps: 100
  • Batch Size: 1 (with gradient accumulation steps: 8)
  • Learning Rate: 1e-4

LoRA Configuration

  • Rank (r): 8
  • Alpha: 32
  • Target Modules: ["q_proj", "v_proj"]
  • Dropout: 0.1
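
These values correspond to a standard peft.LoraConfig. The sketch below is illustrative only, since the exact training-time configuration is only partially documented in this card; it shows how an equivalent adapter could be attached to the base model.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSpeechSeq2Seq

# Base Whisper model to attach the adapter to
base_model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

# LoRA configuration mirroring the values listed above
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.1,                     # dropout on the LoRA activations
)

# Wrap the base model; only the LoRA parameters remain trainable
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()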

Performance

  • Word Error Rate (WER): 0.4739
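
A WER of 0.4739 means the total number of word substitutions, deletions, and insertions amounts to about 47% of the reference word count. The evaluation script is not part of this card; the following is a minimal sketch of computing WER with the evaluate library, with placeholder data.

import evaluate

# WER = (substitutions + deletions + insertions) / reference word count
wer_metric = evaluate.load("wer")

references = ["example reference transcription"]   # ground-truth transcripts
predictions = ["example predicted transcription"]  # model outputs

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")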

Usage

import torch
from transformers import WhisperProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel
import librosa

# Load the model and processor
processor = WhisperProcessor.from_pretrained("AbdelrahmanHassan/whisper-large-v3-egyptian-arabic")
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    use_safetensors=True
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "AbdelrahmanHassan/whisper-large-v3-egyptian-arabic")
model.eval()

# Move the model to a GPU if one is available (the float16 weights are intended for GPU inference)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load and resample the audio to the 16 kHz rate Whisper expects
audio, sr = librosa.load("path_to_audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
# Match the model's device and dtype to avoid a float16/float32 mismatch
input_features = input_features.to(device, dtype=torch.float16)

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features, max_length=225)
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print(transcription)
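
For deployment it can be convenient to fold the adapter into the base weights so inference runs as a plain Whisper model without the PEFT wrapper. This optional step uses PEFT's standard merge utility; the output directory name below is illustrative.

# Merge the LoRA weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("whisper-large-v3-egyptian-arabic-merged")
processor.save_pretrained("whisper-large-v3-egyptian-arabic-merged")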

Training Procedure

Training Data

The model was trained on the Egyptian-ASR-MGB-3 dataset, which contains Egyptian Arabic speech samples.
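
The exact Hub identifier and column schema of the dataset are not given in this card, so the following preprocessing sketch uses placeholders for the dataset path and the audio/text column names; it shows the usual way of preparing speech data for Whisper fine-tuning.

from datasets import load_dataset, Audio
from transformers import WhisperProcessor

# Placeholder identifier; substitute the actual Hub id or local path of Egyptian-ASR-MGB-3
dataset = load_dataset("path/to/Egyptian-ASR-MGB-3")

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")

# Decode audio at the 16 kHz sampling rate Whisper expects
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

def prepare_example(batch):
    audio = batch["audio"]
    # Log-Mel input features for the encoder
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Tokenized target transcription for the decoder (the text column name may differ)
    batch["labels"] = processor.tokenizer(batch["text"]).input_ids
    return batch

dataset = dataset.map(prepare_example, remove_columns=dataset.column_names["train"])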

Training Hyperparameters

  • Learning Rate: 1e-4
  • Training Steps: 100
  • Warmup Steps: 5
  • Per Device Train Batch Size: 1
  • Gradient Accumulation Steps: 8
  • Generation Max Length: 225
  • FP16/BF16: Automatic detection based on hardware
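
The hyperparameters above map directly onto transformers' Seq2SeqTrainingArguments. The sketch below assumes the standard Seq2SeqTrainer setup was used; the original training script is not included in this card, and the output directory name is illustrative.

import torch
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-egyptian-arabic",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    learning_rate=1e-4,
    warmup_steps=5,
    max_steps=100,
    generation_max_length=225,
    fp16=torch.cuda.is_available(),  # simplified stand-in for hardware-based precision detection
    predict_with_generate=True,
)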

Framework Versions

  • Transformers: Latest
  • PyTorch: Latest
  • PEFT: Latest
  • Datasets: Latest
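
Exact versions are not pinned in this card. One way to record the environment actually used for inference or further training:

import datasets, peft, torch, transformers

# Print the installed versions so the run can be reproduced later
for lib in (transformers, torch, peft, datasets):
    print(lib.__name__, lib.__version__)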

Citation

If you use this model, please cite:

@misc{whisper-egyptian-arabic,
  title={Whisper Large V3 Fine-tuned for Egyptian Arabic},
  author={Your Name},
  year={2025},
  howpublished={\url{https://huggingface.co/AbdelrahmanHassan/whisper-large-v3-egyptian-arabic}}
}

Limitations and Bias

This model is specifically fine-tuned for Egyptian Arabic dialect and may not perform well on other Arabic dialects or languages. The performance is dependent on the quality and diversity of the training data.
