---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-base
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-base-urdu-full
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 39.124
      name: WER
    - type: cer
      value: 14.781
      name: CER
    - type: bleu
      value: 40.373
      name: BLEU
    - type: chrf
      value: 69.624
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---
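# Whisper Base Urdu ASR Model

This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) for Urdu speech recognition, trained on the Urdu (`ur`) configuration of the [Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) dataset.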
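## Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-base-urdu-full",
)

# Drop any forced decoder prompt and pin decoding to Urdu transcription
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"
transcriber.model.generation_config.task = "transcribe"

# Transcribe a local file (decoding an MP3 from a path requires ffmpeg)
transcription = transcriber("audio2.mp3")
print(transcription)
```

Output:

```
{'text': 'دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے'}
```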
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- training_steps: 1500
- mixed_precision_training: Native AMP
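
With `lr_scheduler_type: cosine` and 200 warmup steps, the learning rate ramps linearly from 0 to 2e-05 over the first 200 steps and then decays along a half cosine to 0 at step 1500. The sketch below reproduces that curve in plain Python, assuming the schedule matches the one built by `transformers`' `get_cosine_schedule_with_warmup` with its default half cycle; it is an illustration, not code from the training run.

```python
import math


def cosine_lr_with_warmup(step: int, base_lr: float = 2e-5,
                          warmup_steps: int = 200, total_steps: int = 1500) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase already completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine: factor 1.0 right after warmup, 0.0 at total_steps
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 100, 200, 850, 1200, 1500):
    print(f"step {s:>4}: lr = {cosine_lr_with_warmup(s):.2e}")
```

Step 850 sits halfway through the decay phase, so the learning rate there is half the peak value (1e-05).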
### Training results

Training Loss | Epoch | Step | Validation Loss | WER (%) |
---|---|---|---|---|
0.7511 | 0.5085 | 300 | 0.7027 | 47.9462 |
0.6138 | 1.0169 | 600 | 0.6070 | 44.5482 |
0.4602 | 1.5254 | 900 | 0.5756 | 41.2621 |
0.3916 | 2.0339 | 1200 | 0.5551 | 40.0672 |
0.3003 | 2.5424 | 1500 | 0.5551 | 41.6169 |
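
In the training-results table, the Epoch column equals the step divided by the number of optimizer steps per epoch, so the table itself implies how long one pass over the training data was. The sketch below recovers that figure and picks out the row with the lowest validation WER. It assumes a single device and no gradient accumulation (neither is stated in this card); under that assumption, 590 steps of 16 examples put the training set at roughly 9.4k clips.

```python
# (step, epoch, validation_loss, wer) copied from the training-results table
rows = [
    (300, 0.5085, 0.7027, 47.9462),
    (600, 1.0169, 0.6070, 44.5482),
    (900, 1.5254, 0.5756, 41.2621),
    (1200, 2.0339, 0.5551, 40.0672),
    (1500, 2.5424, 0.5551, 41.6169),
]

# epoch = step / steps_per_epoch, so every row yields an estimate of steps_per_epoch
estimates = [step / epoch for step, epoch, _, _ in rows]
steps_per_epoch = round(sum(estimates) / len(estimates))

# ASSUMPTION: one device, no gradient accumulation -> examples per step = batch size
batch_size = 16
approx_train_examples = steps_per_epoch * batch_size  # upper bound; last batch may be partial

# Row with the lowest validation WER
best_step, _, _, best_wer = min(rows, key=lambda r: r[3])

print(f"steps per epoch   ~ {steps_per_epoch}")
print(f"training examples <= {approx_train_examples}")
print(f"lowest validation WER: {best_wer} at step {best_step}")
```

Validation WER is lowest at step 1200 (40.07) and rises to 41.62 at step 1500, while validation loss stays at 0.5551 across those two evaluations.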
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
## Evaluation

Urdu ASR evaluation on the Common Voice 17.0 test split.

Metric | Value | Description |
---|---|---|
WER | 39.124% | Word Error Rate (lower is better) |
CER | 14.781% | Character Error Rate (lower is better) |
BLEU | 40.373% | BLEU score (higher is better) |
ChrF | 69.624 | Character n-gram F-score (higher is better) |
👉 Review the testing script: Testing Whisper Base Urdu Full
### Summary

The Word Error Rate (WER) of 39.12% is the model's main weakness: it amounts to roughly two word-level errors (substitutions, deletions, or insertions) for every five words in the reference transcripts.

The model does much better at the character level. The moderate Character Error Rate (CER) of 14.78% and the strong ChrF score of 69.62 show that it usually predicts the right sequence of characters even when it does not produce the exact word. Because a single wrong character makes the whole word count as an error, small spelling slips raise WER far more than CER.
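
To see why one wrong character hurts WER far more than CER, the sketch below computes both metrics as Levenshtein (edit) distance divided by reference length, at the word level and at the character level (spaces counted as characters). The hypothesis is the model output from the usage example. The reference is **hypothetical**: it assumes the speaker said کب ("when", as in کب تک, "how long") where the model wrote کپ ("cup"). This is a plain-Python illustration without text normalization; it is not the author's testing script, which may normalize text or use a library implementation.

```python
from collections.abc import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when tokens match)
            ))
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    return levenshtein(ref, hyp) / len(ref)


# Model output from the usage example
hyp = "دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے"

# HYPOTHETICAL reference: the same sentence with کب in place of each کپ
ref_words = hyp.split()
for i in (2, 7):  # positions of the two کپ tokens
    ref_words[i] = ref_words[i].replace("\u067e", "\u0628")  # پ (peh) -> ب (beh)
ref = " ".join(ref_words)

print(f"WER = {wer(ref, hyp):.1%}")  # 2 of 11 words wrong -> 18.2%
print(f"CER = {cer(ref, hyp):.1%}")  # 2 wrong characters over the whole string
```

The same two one-letter slips count as two full word errors out of 11, but only two character errors out of the entire sentence, which mirrors the gap between the reported WER and CER.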