This model is a fine-tuned version of openai/whisper-base on the common_voice_17_0 dataset.
from transformers import pipeline

# Load the fine-tuned checkpoint as a speech-recognition pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-base-urdu-full",
)

# Clear any forced decoder prompt and set the decoding language to Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

transcription = transcriber("audio2.mp3")
print(transcription)
{'text': 'دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے'}
Training results:
Training Loss | Epoch | Step | Validation Loss | WER (%) |
---|---|---|---|---|
0.7511 | 0.5085 | 300 | 0.7027 | 47.9462 |
0.6138 | 1.0169 | 600 | 0.6070 | 44.5482 |
0.4602 | 1.5254 | 900 | 0.5756 | 41.2621 |
0.3916 | 2.0339 | 1200 | 0.5551 | 40.0672 |
0.3003 | 2.5424 | 1500 | 0.5551 | 41.6169 |
Urdu ASR Evaluation on Common Voice 17.0 (Test Split).
Metric | Value | Description |
---|---|---|
WER | 39.124% | Word Error Rate (lower is better) |
CER | 14.781% | Character Error Rate (lower is better) |
BLEU | 40.373% | BLEU Score (higher is better) |
ChrF | 69.624 | Character n-gram F-score (higher is better) |
👉 Review the testing script: Testing Whisper Base Urdu Full
Summary:
The Word Error Rate (WER) of 39.12% is the model's main weakness: it corresponds to roughly two word-level errors (substitutions, deletions, or insertions) for every five words in the reference transcripts.
However, the model performs much better at the character level. The Character Error Rate (CER) of 14.78% and the ChrF score of 69.62 indicate that most characters are predicted correctly, so many of the word errors are likely near misses in which only part of a word is wrong.
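To make the gap between WER and CER concrete, here is a minimal pure-Python sketch of both metrics based on Levenshtein edit distance. It is only an illustration and not the testing script used for the numbers above: the function names `edit_distance`, `wer`, and `cer` are hypothetical, no text normalization is applied, and spaces are counted as characters in the CER, which is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character-level edit distance divided by the reference length.

    Assumption: spaces count as characters and no normalization is applied.
    """
    return edit_distance(reference, hypothesis) / len(reference)


# Toy example: one missing letter makes a whole word count as wrong.
ref = "the fish swims"
hyp = "the fish swim"
print(f"WER = {wer(ref, hyp):.3f}")  # WER = 0.333
print(f"CER = {cer(ref, hyp):.3f}")  # CER = 0.071
```

In this toy case a single dropped character produces a WER of one in three but a CER of only one in fourteen, which is the same pattern as the reported scores: many word errors, comparatively few character errors.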