metadata
library_name: transformers
license: apache-2.0
base_model: openai/whisper-base
tags:
  - automatic-speech-recognition
  - whisper
  - urdu
  - mozilla-foundation/common_voice_17_0
  - hf-asr-leaderboard
datasets:
  - mozilla-foundation/common_voice_17_0
metrics:
  - wer
  - cer
  - bleu
  - chrf
model-index:
  - name: whisper-base-urdu-full
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: Common Voice 17.0 (Urdu)
          type: mozilla-foundation/common_voice_17_0
          config: ur
          split: test
          args: ur
        metrics:
          - type: wer
            value: 39.124
            name: WER
          - type: cer
            value: 14.781
            name: CER
          - type: bleu
            value: 40.373
            name: BLEU
          - type: chrf
            value: 69.624
            name: ChrF
language:
  - ur
pipeline_tag: automatic-speech-recognition

Whisper Base Urdu ASR Model

This model is a fine-tuned version of openai/whisper-base on the Urdu (ur) configuration of the mozilla-foundation/common_voice_17_0 dataset.

Usage

from transformers import pipeline

# Load the fine-tuned checkpoint into an ASR pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-base-urdu-full",
)

# Clear any forced decoder prompt and set the transcription language to Urdu
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Transcribe a local audio file
transcription = transcriber("audio2.mp3")
print(transcription)
# Output:
# {'text': 'دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے'}

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 200
  • training_steps: 1500
  • mixed_precision_training: Native AMP
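
To make the learning-rate settings above concrete, here is a minimal sketch of the schedule they imply. It assumes that the `cosine` scheduler means linear warmup followed by a half-cosine decay to zero, which is how the Transformers `get_cosine_schedule_with_warmup` helper behaves with its default settings. The function name `cosine_lr` is hypothetical and only used for illustration.

```python
import math

BASE_LR = 2e-5       # learning_rate from the list above
WARMUP_STEPS = 200   # lr_scheduler_warmup_steps
TOTAL_STEPS = 1500   # training_steps


def cosine_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero.

    Assumption: this mirrors the default cosine-with-warmup schedule;
    it is a sketch, not the training script used for this model.
    """
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1]
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 100, 200, 850, 1500):
        print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the rate peaks at 2e-05 when warmup ends at step 200 and falls to zero at step 1500.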

Training results

| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.7511 | 0.5085 | 300 | 0.7027 | 47.9462 |
| 0.6138 | 1.0169 | 600 | 0.6070 | 44.5482 |
| 0.4602 | 1.5254 | 900 | 0.5756 | 41.2621 |
| 0.3916 | 2.0339 | 1200 | 0.5551 | 40.0672 |
| 0.3003 | 2.5424 | 1500 | 0.5551 | 41.6169 |
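
The training results can be read programmatically to see which evaluation step had the lowest validation WER and roughly how many optimizer steps make up one epoch. The sketch below copies the rows from the results above; the helper names `best_by_wer` and `steps_per_epoch` are hypothetical. The card does not state which checkpoint the published weights correspond to, so this only describes the logged values.

```python
# Rows copied from the training results:
# (step, epoch, training_loss, validation_loss, wer)
ROWS = [
    (300, 0.5085, 0.7511, 0.7027, 47.9462),
    (600, 1.0169, 0.6138, 0.6070, 44.5482),
    (900, 1.5254, 0.4602, 0.5756, 41.2621),
    (1200, 2.0339, 0.3916, 0.5551, 40.0672),
    (1500, 2.5424, 0.3003, 0.5551, 41.6169),
]


def best_by_wer(rows):
    """Return the logged row with the lowest validation WER."""
    return min(rows, key=lambda row: row[4])


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the first logged row."""
    step, epoch = rows[0][0], rows[0][1]
    return step / epoch


if __name__ == "__main__":
    step, epoch, _, _, wer = best_by_wer(ROWS)
    print(f"lowest validation WER: {wer} at step {step} (epoch {epoch})")
    print(f"approx. steps per epoch: {steps_per_epoch(ROWS):.0f}")
```

In the logged values, validation WER is lowest at step 1200 and rises again at step 1500 while the training loss keeps falling, which may indicate the start of overfitting.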

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Evaluation

Urdu ASR Evaluation on Common Voice 17.0 (Test Split).

| Metric | Value | Description |
|---|---|---|
| WER | 39.124% | Word Error Rate (lower is better) |
| CER | 14.781% | Character Error Rate (lower is better) |
| BLEU | 40.373 | BLEU score (higher is better) |
| ChrF | 69.624 | Character n-gram F-score (higher is better) |

👉 Review the testing script: Testing Whisper Base Urdu Full


Summary:
The Word Error Rate (WER) of 39.12% is the model's main weakness: on the test split there are roughly 39 word-level errors (substitutions, deletions, and insertions) for every 100 reference words. The model does considerably better at the character level. The Character Error Rate (CER) of 14.78% and the ChrF score of 69.62 suggest that many of the wrong words are near misses that differ from the reference by only a few characters, rather than entirely different words.
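
The gap between WER and CER is easier to see with a small worked example. The sketch below is a plain-Python illustration of both metrics using Levenshtein edit distance; it is not the evaluation script behind the reported numbers. It applies no text normalization and counts spaces as characters, and the example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in one word
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on the mad"
    print(f"WER: {wer(reference, hypothesis):.3f}")
    print(f"CER: {cer(reference, hypothesis):.3f}")
```

A single wrong character makes the whole word count as an error, so one of six words is wrong at the word level while only one of 22 characters is wrong at the character level. The same effect can let a model with many near-miss words show a much higher WER than CER.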