Whisper Large V3 Turbo Urdu ASR Model 🥇

This model is a fine-tuned version of openai/whisper-large-v3-turbo on the common_voice_17_0 dataset.

It achieves the following results on the evaluation set:

  • Loss: 0.3534
  • WER: 25.7842

Quick Usage

from transformers import pipeline

transcriber = pipeline(
  "automatic-speech-recognition", 
  model="kingabzpro/whisper-large-v3-turbo-urdu"
)

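# Clear any forced decoder IDs stored in the generation config
# so they do not override the language set below.
transcriber.model.generation_config.forced_decoder_ids = None
# Decode as Urdu.
transcriber.model.generation_config.language = "ur"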

transcription = transcriber("audio2.mp3")
print(transcription)
# Output:
# {'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
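For recordings longer than Whisper's 30-second window, the same pipeline can be asked to chunk the audio and return timestamps. The following is a minimal sketch under assumptions rather than the author's recommended setup: the helper names `build_pipeline_kwargs` and `transcribe_long` are hypothetical, and the 30-second chunk length is an illustrative default, not a value taken from this card.

```python
def build_pipeline_kwargs(
    model_id="kingabzpro/whisper-large-v3-turbo-urdu",
    chunk_length_s=30,  # assumed window size; not specified in the model card
):
    """Collect the arguments passed to transformers.pipeline()."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        "chunk_length_s": chunk_length_s,
    }


def transcribe_long(audio_path, **overrides):
    """Transcribe a long Urdu recording in chunks, with timestamps."""
    # Imported lazily so build_pipeline_kwargs() can be used without loading the model.
    from transformers import pipeline

    transcriber = pipeline(**build_pipeline_kwargs(**overrides))
    # Same reset as in the quick-usage snippet, so the language below is honored.
    transcriber.model.generation_config.forced_decoder_ids = None
    return transcriber(
        audio_path,
        return_timestamps=True,
        generate_kwargs={"language": "ur", "task": "transcribe"},
    )
```

Calling `transcribe_long("audio2.mp3")` should return a dictionary with the full `text` plus a `chunks` list of timestamped segments.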

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1500
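To make the schedule concrete, the sketch below computes the learning rate at a given step for a cosine schedule with linear warmup, using the values listed above (peak 2e-05, 100 warmup steps, 1500 total steps). It assumes a half-cosine decay to zero after warmup, which is the usual shape for this scheduler type but is not stated in the card; the function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step, base_lr=2e-5, warmup_steps=100, total_steps=1500):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 50, 100, 800, 1500):
    print(s, cosine_lr(s))
```

Under these assumptions the rate peaks at step 100, falls to half the peak at step 800 (the midpoint of the decay phase), and reaches zero at step 1500.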

Training results

Training Loss | Epoch  | Step | Validation Loss | WER
0.6764        | 0.2545 | 300  | 0.6244          | 44.9776
0.5881        | 0.5089 | 600  | 0.5089          | 37.6214
0.4662        | 0.7634 | 900  | 0.4349          | 32.1322
0.3661        | 1.0178 | 1200 | 0.3634          | 26.5683
0.2293        | 1.2723 | 1500 | 0.3534          | 25.7842
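The Epoch and Step columns together imply how many optimizer steps make up one pass over the training data. The arithmetic below is a rough estimate only: the example count assumes a single device and no gradient accumulation, neither of which is stated in the card.

```python
# (epoch, step) pairs copied from the training results table.
checkpoints = [
    (0.2545, 300),
    (0.5089, 600),
    (0.7634, 900),
    (1.0178, 1200),
    (1.2723, 1500),
]

# Each row gives an independent estimate of steps per epoch.
estimates = [step / epoch for epoch, step in checkpoints]
steps_per_epoch = sum(estimates) / len(estimates)

train_batch_size = 8  # from the hyperparameter list
# Assumption: one device, no gradient accumulation.
approx_examples_per_epoch = round(steps_per_epoch) * train_batch_size

print(round(steps_per_epoch))      # 1179
print(approx_examples_per_epoch)   # 9432
```

On these assumptions, the 1500 training steps cover a little under 1.3 passes over roughly 9,400 training examples.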

Framework versions

  • Transformers 4.53.1
  • Pytorch 2.8.0.dev20250319+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.2

Evaluation

Urdu ASR Evaluation on Common Voice 17.0 (Test Split).

Metric | Value   | Description
WER    | 26.234% | Word Error Rate (lower is better)
CER    | 8.795%  | Character Error Rate (lower is better)
BLEU   | 58.032% | BLEU score (higher is better)
ChrF   | 81.636  | Character n-gram F-score (higher is better)

👉 Review the testing script: Testing Whisper Large V3 Turbo Urdu
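For readers unfamiliar with the two error-rate metrics, here is a from-scratch sketch of sentence-level WER and CER based on Levenshtein edit distance. It is not the linked testing script, which may use a library and aggregate over the whole test split; the function names and the toy strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Minimum substitutions, deletions, and insertions to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("a b c d", "a x c"))    # 0.5  (one substitution, one deletion)
print(cer("abcd", "abxd"))        # 0.25 (one substitution)
print(wer("a b", "a b c d e"))    # 1.5  (insertions can push WER above 1)
```

Because insertions count as errors, WER is not simply one minus the fraction of correct words, and it can exceed 100% on a poor hypothesis.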

Summary

The Word Error Rate (WER) of 26.23% means that roughly one word in four is affected by an error (a substitution, deletion, or insertion relative to the reference). This is a functional level of accuracy, with room for improvement.

The model does considerably better at the character level, with a Character Error Rate (CER) of 8.80% and a ChrF score of 81.64. The gap between CER and WER suggests that many of the word-level errors differ from the reference by only a few characters. The BLEU score of 58.03 indicates substantial n-gram overlap between the generated transcriptions and the reference text.

In summary, the model produces largely accurate and intelligible Urdu transcriptions on the Common Voice 17.0 test split.

Model size: 809M params (Safetensors, tensor type F32)