Whisper large V3 Turbo Urdu ASR Model 🥇

This model is a fine-tuned version of openai/whisper-large-v3-turbo on the common_voice_17_0 dataset.

It achieves the following results on the evaluation set:

Loss: 0.3534
WER: 25.7842%

Quick Usage

from transformers import pipeline

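# Load the fine-tuned checkpoint from the Hugging Face Hub
transcriber = pipeline(
  "automatic-speech-recognition",
  model="kingabzpro/whisper-large-v3-turbo-urdu"
)

# Clear any forced decoder IDs so they do not conflict with the language setting
transcriber.model.generation_config.forced_decoder_ids = None
# Decode as Urdu rather than relying on automatic language detection
transcriber.model.generation_config.language = "ur"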

transcription = transcriber("audio2.mp3")
print(transcription)

{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 4
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1500
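
To make the scheduler settings concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the learning rate, warmup steps, and training steps listed above. It assumes a single half-cosine cycle that decays to zero after warmup, which is the usual default for a cosine scheduler; the card does not state the cycle count, and the function name `cosine_lr` is only illustrative.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 2e-05
WARMUP_STEPS = 100
TRAINING_STEPS = 1500


def cosine_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup followed by cosine decay.

    Assumption: one half-cosine cycle decaying to zero after warmup.
    """
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at a few points in the run.
    for step in (0, 50, 100, 800, 1500):
        print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 2e-05 at step 100 and reaches zero at step 1500.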

Training results

Training Loss	Epoch	Step	Validation Loss	WER (%)
0.6764	0.2545	300	0.6244	44.9776
0.5881	0.5089	600	0.5089	37.6214
0.4662	0.7634	900	0.4349	32.1322
0.3661	1.0178	1200	0.3634	26.5683
0.2293	1.2723	1500	0.3534	25.7842
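
The Epoch and Step columns together imply how many optimizer steps make up one pass over the training data. The sketch below estimates that figure from the logged rows and, from it, an approximate training-set size. The size estimate assumes a single device and no gradient accumulation, neither of which is stated in this card; the function names are illustrative.

```python
# (epoch, step) pairs copied from the training results table.
LOGGED = [(0.2545, 300), (0.5089, 600), (0.7634, 900), (1.0178, 1200), (1.2723, 1500)]
TRAIN_BATCH_SIZE = 8  # train_batch_size from the hyperparameter list


def steps_per_epoch(logged):
    """Average the per-row estimates of steps in one epoch."""
    estimates = [step / epoch for epoch, step in logged]
    return sum(estimates) / len(estimates)


def approx_train_examples(logged, batch_size):
    """Rough training-set size.

    Assumption: one device and no gradient accumulation, so each step
    consumes exactly `batch_size` examples.
    """
    return round(steps_per_epoch(logged)) * batch_size


if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch(LOGGED):.1f}")
    print(f"approx. training examples: {approx_train_examples(LOGGED, TRAIN_BATCH_SIZE)}")
```

The rows agree on roughly 1,179 steps per epoch, so the 1,500-step run covers a little over one and a quarter passes over the data.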

Framework versions

Transformers 4.53.1
PyTorch 2.8.0.dev20250319+cu128
Datasets 3.6.0
Tokenizers 0.21.2

Evaluation

Urdu ASR Evaluation on Common Voice 17.0 (Test Split).

Metric	Value	Description
WER	26.234%	Word Error Rate (lower is better)
CER	8.795%	Character Error Rate (lower is better)
BLEU	58.032%	BLEU Score (higher is better)
ChrF	81.636	Character n-gram F-score (higher is better)

👉 Review the testing script: Testing Whisper Large V3 Turbo Urdu
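
For readers unfamiliar with the metrics, the following is a minimal sketch of how WER and CER are defined: the edit distance between hypothesis and reference, divided by the reference length in words or characters. It is not the linked testing script. It assumes whitespace tokenization, no text normalization, and spaces counted as characters for CER; evaluation libraries may differ on these points.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Reference is the sample transcription from Quick Usage; the hypothesis
    # is a made-up output that drops the final word.
    reference = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے"
    hypothesis = reference.rsplit(" ", 1)[0]
    print(f"WER: {wer(reference, hypothesis):.4f}")
    print(f"CER: {cer(reference, hypothesis):.4f}")
```

Because a single wrong character makes the whole word count as an error, WER is normally higher than CER on the same output, as it is in the table above.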

Summary

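The Word Error Rate (WER) of 26.23% means that the word-level edits needed to turn the model's output into the reference (substitutions, deletions, and insertions) amount to about 26% of the reference word count, or roughly one error for every four reference words. This is a functional level of accuracy, with clear room for improvement.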

The model is considerably stronger at the character level, with a Character Error Rate (CER) of 8.80% and a ChrF score of 81.64. The gap between WER and CER suggests that many word-level errors differ from the reference by only a few characters rather than being entirely wrong words. The BLEU score of 58.03 is consistent with this: the transcriptions share much of their n-gram content with the reference text.

In summary, this is a capable Urdu ASR system that produces largely accurate and intelligible transcriptions on the Common Voice 17.0 test split.