---
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- whisper
- automatic-speech-recognition
- speech
- audio
- arabic
- egyptian-arabic
- pytorch
- lora
- peft
language:
- ar
datasets:
- MightyStudent/Egyptian-ASR-MGB-3
metrics:
- wer
model-index:
- name: AbdelrahmanHassan/whisper-large-v3-egyptian-arabic
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Egyptian-ASR-MGB-3
      type: MightyStudent/Egyptian-ASR-MGB-3
    metrics:
    - type: wer
      value: 0.4739
      name: Word Error Rate
---

# Whisper Large V3 Fine-tuned for Egyptian Arabic

This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the [Egyptian-ASR-MGB-3](https://huggingface.co/datasets/MightyStudent/Egyptian-ASR-MGB-3) dataset.

## Model Description

This model was fine-tuned with LoRA (Low-Rank Adaptation) to improve automatic speech recognition performance on the Egyptian Arabic dialect.

### Training Details

- **Base Model**: openai/whisper-large-v3
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: Egyptian-ASR-MGB-3
- **Language**: Egyptian Arabic
- **Training Steps**: 100
- **Batch Size**: 1 (with gradient accumulation steps: 8)
- **Learning Rate**: 1e-4

### LoRA Configuration

- **Rank (r)**: 8
- **Alpha**: 32
- **Target Modules**: ["q_proj", "v_proj"]
- **Dropout**: 0.1

A code sketch that reproduces this configuration with PEFT, along with a WER evaluation sketch, is included in the appendix at the end of this card.

## Performance

- **Word Error Rate (WER)**: 0.4739 (47.39%) on Egyptian-ASR-MGB-3

## Usage

```python
import torch
import librosa
from transformers import WhisperProcessor, AutoModelForSpeechSeq2Seq
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device == "cuda" else torch.float32

# Load the processor and the base model
processor = WhisperProcessor.from_pretrained("AbdelrahmanHassan/whisper-large-v3-egyptian-arabic")
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)

# Attach the LoRA adapter and move the model to the target device
model = PeftModel.from_pretrained(model, "AbdelrahmanHassan/whisper-large-v3-egyptian-arabic")
model.to(device)
model.eval()

# Load and preprocess audio at 16 kHz
audio, sr = librosa.load("path_to_audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
# Match the model's device and dtype to avoid a float32/float16 mismatch
input_features = input_features.to(device, dtype=torch_dtype)

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features, max_length=225)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

## Training Procedure

### Training Data

The model was trained on the Egyptian-ASR-MGB-3 dataset, which contains Egyptian Arabic speech samples.

### Training Hyperparameters

- **Learning Rate**: 1e-4
- **Training Steps**: 100
- **Warmup Steps**: 5
- **Per Device Train Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Generation Max Length**: 225
- **FP16/BF16**: Automatic detection based on hardware

### Framework Versions

Exact framework versions were not pinned; training used recent releases of Transformers, PyTorch, PEFT, and Datasets.

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-egyptian-arabic,
  title={Whisper Large V3 Fine-tuned for Egyptian Arabic},
  author={Abdelrahman Hassan},
  year={2025},
  howpublished={\url{https://huggingface.co/AbdelrahmanHassan/whisper-large-v3-egyptian-arabic}}
}
```

## Limitations and Bias

This model is fine-tuned specifically for the Egyptian Arabic dialect and may not perform well on other Arabic dialects or languages. Because the adapter was trained for only 100 steps on a single dataset, performance also depends heavily on the quality and diversity of the training data.
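
## Appendix: Training Setup Sketch

The training scripts are not released with this card. The following is a minimal sketch, assuming the standard PEFT + `Seq2SeqTrainingArguments` workflow; only the numeric values are taken from this card, while the output directory, the use of `Seq2SeqTrainer`, and the omitted data pipeline (preprocessing, data collator, Egyptian-ASR-MGB-3 splits) are assumptions.

```python
# A minimal sketch, assuming a PEFT + Seq2SeqTrainer workflow; only the
# numeric values below are taken from this model card. Dataset loading,
# preprocessing, and the data collator are omitted.
import torch
from transformers import AutoModelForSpeechSeq2Seq, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model

# Base model to which the LoRA adapter is attached.
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

# LoRA configuration as listed in this card: r=8, alpha=32,
# q_proj/v_proj targets, dropout 0.1.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Hyperparameters as listed in this card; the output directory is hypothetical,
# and fp16 is enabled only when a CUDA device is available, mirroring the
# "automatic detection" note above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-egyptian-arabic-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    warmup_steps=5,
    max_steps=100,
    generation_max_length=225,
    predict_with_generate=True,
    fp16=torch.cuda.is_available(),
)
```

From here, a `Seq2SeqTrainer` built with a speech-to-text data collator and the processed dataset splits would run the 100-step fine-tuning; those pieces depend on choices not documented in this card.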
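
## Appendix: WER Evaluation Sketch

The reported WER can be computed in the same form with the `evaluate` library; the evaluation split and any text normalization applied to the Arabic transcripts for the reported 0.4739 are not documented here, so this is only an illustrative sketch.

```python
# A minimal sketch of computing WER with the `evaluate` library.
# `predictions` would come from running the Usage snippet above over an
# evaluation split; `references` are the ground-truth transcripts.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["..."]  # model transcriptions (placeholders)
references = ["..."]   # ground-truth transcripts (placeholders)

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```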