---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-base
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-base-urdu-full
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 39.124
      name: WER
    - type: cer
      value: 14.781
      name: CER
    - type: bleu
      value: 40.373
      name: BLEU
    - type: chrf
      value: 69.624
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---

# Whisper Base Urdu ASR Model

This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on the Urdu (`ur`) configuration of the Common Voice 17.0 dataset (`mozilla-foundation/common_voice_17_0`).

## Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into a speech-recognition pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-base-urdu-full"
)

# Clear any forced decoder IDs and set the decoding language to Urdu
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Transcribe a local audio file
transcription = transcriber("audio2.mp3")
print(transcription)
```

```sh
{'text': 'دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے'}
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- training_steps: 1500
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step | Validation Loss | WER     |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 0.7511        | 0.5085 | 300  | 0.7027          | 47.9462 |
| 0.6138        | 1.0169 | 600  | 0.6070          | 44.5482 |
| 0.4602        | 1.5254 | 900  | 0.5756          | 41.2621 |
| 0.3916        | 2.0339 | 1200 | 0.5551          | 40.0672 |
| 0.3003        | 2.5424 | 1500 | 0.5551          | 41.6169 |

### Framework versions

- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1

## Evaluation

Urdu ASR evaluation on the Common Voice 17.0 test split.

| Metric   | Value   | Description                                  |
|----------|---------|----------------------------------------------|
| **WER**  | 39.124% | Word Error Rate (lower is better)            |
| **CER**  | 14.781% | Character Error Rate (lower is better)       |
| **BLEU** | 40.373  | BLEU score (higher is better)                |
| **ChrF** | 69.624  | Character n-gram F-score (higher is better)  |

>👉 Review the testing script: [Testing Whisper Base Urdu Full](https://www.kaggle.com/code/kingabzpro/testing-whisper-base-urdu-full/notebook?scriptVersionId=249058345)

---

**Summary:** The Word Error Rate (WER) of 39.12% is the model's main weakness: it corresponds to roughly two word-level errors (substitutions, deletions, or insertions) for every five reference words. The model performs considerably better at the character level. The moderate Character Error Rate (CER) of 14.78% and the ChrF score of 69.62 suggest that most predicted character sequences are close to the reference, even when the model fails to produce a fully correct word.
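The gap between WER and CER described above follows from how the two metrics are defined: both are edit distances divided by the reference length, but WER counts whole words and CER counts single characters. The snippet below is a minimal pure-Python sketch of that idea, not the code used in the linked testing script (which may rely on a metrics library and apply its own text normalization). The helper names `edit_distance`, `wer`, and `cer` and the English sentence pair are hypothetical, chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical toy pair: two words are each wrong by a single character.
reference = "the fish swims in the water"
hypothesis = "the fish swim in the watar"

print(f"WER: {wer(reference, hypothesis):.3f}")  # WER: 0.333
print(f"CER: {cer(reference, hypothesis):.3f}")  # CER: 0.074
```

In this toy pair, two single-character slips make two of the six words wrong (WER 0.333) while changing only 2 of 27 characters (CER 0.074), which is the same pattern, on a smaller scale, as the model's 39.12% WER against its 14.78% CER.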