Whisper large V3 Turbo Urdu ASR Model 🥇
This model is a fine-tuned version of openai/whisper-large-v3-turbo on the common_voice_17_0 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3534
- WER: 25.7842
Quick Usage
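```python
from transformers import pipeline

# Load the fine-tuned Urdu model as an ASR pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-large-v3-turbo-urdu",
)

# Clear any forced decoder prompt and decode in Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Decoding an audio file from a path requires ffmpeg on the system.
transcription = transcriber("audio2.mp3")
print(transcription)
```

Output:

```
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
```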
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
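With `lr_scheduler_type: cosine` and 100 warmup steps, the learning rate presumably ramps up linearly to its peak of 2e-05 and then decays along a half cosine to zero at step 1500. The sketch below reproduces that shape in plain Python so the schedule can be inspected without a GPU. It assumes the standard linear-warmup-plus-cosine-decay formula (the shape of the Transformers `get_cosine_schedule_with_warmup` helper with its default single half cycle); `cosine_lr_with_warmup` is a hypothetical helper, not part of the original training code.

```python
import math

# Values taken from the hyperparameter list above.
PEAK_LR = 2e-5
WARMUP_STEPS = 100
TRAINING_STEPS = 1500


def cosine_lr_with_warmup(step: int,
                          peak_lr: float = PEAK_LR,
                          warmup_steps: int = WARMUP_STEPS,
                          total_steps: int = TRAINING_STEPS) -> float:
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr, then a single half-cosine
    decay from peak_lr down to 0 at total_steps.
    """
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at the evaluation checkpoints used in training.
    for step in (0, 50, 100, 300, 600, 900, 1200, 1500):
        print(f"step {step:>4}: lr = {cosine_lr_with_warmup(step):.3e}")
```

Under this assumption the rate is back to half its peak (1e-05) at step 800, midway through the decay phase.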
Training results
Training Loss | Epoch | Step | Validation Loss | WER |
---|---|---|---|---|
0.6764 | 0.2545 | 300 | 0.6244 | 44.9776 |
0.5881 | 0.5089 | 600 | 0.5089 | 37.6214 |
0.4662 | 0.7634 | 900 | 0.4349 | 32.1322 |
0.3661 | 1.0178 | 1200 | 0.3634 | 26.5683 |
0.2293 | 1.2723 | 1500 | 0.3534 | 25.7842 |
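The Epoch column also makes it possible to back out the approximate size of the training set. Each logged epoch value is the step count divided by the number of steps per epoch, so inverting it gives about 1,179 steps per epoch. Assuming a single device and no gradient accumulation (the card lists only `train_batch_size: 8`), that corresponds to roughly 9,400 training clips, and the 1,500 training steps cover about 1.27 epochs. The snippet below does this arithmetic and also computes the relative WER reduction between the first and last checkpoints; the training-set size is an estimate under that assumption, not a figure reported by the author.

```python
# (step, epoch) pairs copied from the training-results table above.
LOGGED = [(300, 0.2545), (600, 0.5089), (900, 0.7634), (1200, 1.0178), (1500, 1.2723)]
TRAIN_BATCH_SIZE = 8  # from the hyperparameter list

# Each logged epoch value is step / steps_per_epoch, so invert it and average.
estimates = [step / epoch for step, epoch in LOGGED]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Assumption: single device, no gradient accumulation, so one step = one batch of 8.
# The last batch of an epoch may be partial, so this is an upper-end estimate.
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE
epochs_covered = 1500 / steps_per_epoch

# Relative WER reduction between the first (step 300) and last (step 1500) checkpoint.
first_wer, last_wer = 44.9776, 25.7842
relative_wer_reduction = (first_wer - last_wer) / first_wer

print(f"steps per epoch      ~ {steps_per_epoch}")
print(f"training examples    ~ {approx_train_examples}")
print(f"epochs in 1500 steps ~ {epochs_covered:.2f}")
print(f"relative WER drop    ~ {relative_wer_reduction:.1%}")
```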
Framework versions
- Transformers 4.53.1
- PyTorch 2.8.0.dev20250319+cu128
- Datasets 3.6.0
- Tokenizers 0.21.2
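To check whether a local environment matches the versions above, one option is to query installed package metadata with the standard library's `importlib.metadata`. This is a minimal sketch: `compare_versions` is a hypothetical helper, and it only reports installed versions rather than enforcing them. The PyTorch build listed above is a nightly (`dev`) CUDA 12.8 build, so a stable PyTorch release will normally show up as "differs".

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions" above, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.53.1",
    "torch": "2.8.0.dev20250319+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.2",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected_version, installed_version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


if __name__ == "__main__":
    for package, (wanted, installed) in compare_versions(EXPECTED).items():
        status = "match" if installed == wanted else "differs"
        print(f"{package:<12} expected {wanted:<26} installed {installed} ({status})")
```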
Evaluation
Urdu ASR evaluation on the Common Voice 17.0 test split:
Metric | Value | Description |
---|---|---|
WER | 26.234% | Word Error Rate (lower is better) |
CER | 8.795% | Character Error Rate (lower is better) |
BLEU | 58.032% | BLEU score (higher is better) |
ChrF | 81.636 | Character n-gram F-score (higher is better) |
👉 Review the testing script: Testing Whisper Large V3 Turbo Urdu
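The testing script is not reproduced here. As a rough illustration only, WER is the word-level edit distance divided by the number of reference words, WER = (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions and N is the reference word count; CER applies the same formula to characters. The sketch below computes corpus-level WER and CER from a plain Levenshtein distance. It is an assumption-laden stand-in, not the author's script: libraries such as `jiwer` or `evaluate` apply their own text normalization, so these numbers need not match the table exactly, and the helper names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER in percent: total word edits / total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return 100.0 * edits / words


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER in percent; spaces count as characters in this sketch."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return 100.0 * edits / chars


if __name__ == "__main__":
    # Reference: the Quick Usage output. Hypothesis: the same sentence with
    # one word dropped (a made-up error, for illustration only).
    ref = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے"
    words = ref.split()
    hyp = " ".join(words[:5] + words[6:])
    print(f"WER: {wer([ref], [hyp]):.2f}%")
    print(f"CER: {cer([ref], [hyp]):.2f}%")
```

Because insertions count as errors, WER can exceed 100% on very poor output, which is why it is only loosely the complement of word accuracy.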
Summary
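A Word Error Rate (WER) of 26.23% is respectable: the substitutions, deletions, and insertions needed to turn the model output into the reference add up to about 26% of the reference words, so, loosely speaking, about three out of every four words are transcribed correctly. There is room for improvement, but this is a functional level of accuracy.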
The model does considerably better at the character level, with a Character Error Rate (CER) of 8.80% and a ChrF score of 81.64. The gap between CER and WER suggests that many word errors involve only a few characters rather than entirely wrong words. The BLEU score of 58.03 indicates substantial word n-gram overlap between the generated transcriptions and the reference text.
In summary, the model produces largely accurate and intelligible Urdu transcriptions on Common Voice 17.0, though its word-level accuracy still leaves room for improvement.