---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-base
tags:
- automatic-speech-recognition
- whisper
- urdu
- mozilla-foundation/common_voice_17_0
- hf-asr-leaderboard
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-base-urdu-full
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - type: wer
      value: 39.124
      name: WER
    - type: cer
      value: 14.781
      name: CER
    - type: bleu
      value: 40.373
      name: BLEU
    - type: chrf
      value: 69.624
      name: ChrF
language:
- ur
pipeline_tag: automatic-speech-recognition
---


# Whisper Base Urdu ASR Model

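This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) for Urdu automatic speech recognition, trained on the Urdu (`ur`) configuration of the Common Voice 17.0 dataset (`mozilla-foundation/common_voice_17_0`).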


## Usage

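```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a speech-recognition pipeline
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-base-urdu-full",
)

# Clear any forced decoder prompt and set the transcription language to Urdu
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Path to a local audio file
transcription = transcriber("audio2.mp3")
print(transcription)
```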

```text
{'text': 'دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے'}
```


## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (PyTorch implementation, `adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- training_steps: 1500
- mixed_precision_training: Native AMP
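
To make the learning-rate settings above concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule using the listed values. It assumes the common half-cosine shape (a single decay from the peak to zero); the function name `cosine_lr` is illustrative and is not taken from the training script.

```python
import math

BASE_LR = 2e-5        # learning_rate from the list above
WARMUP_STEPS = 200    # lr_scheduler_warmup_steps
TOTAL_STEPS = 1500    # training_steps


def cosine_lr(step: int) -> float:
    """Learning rate after `step` updates: linear warmup, then half-cosine decay.

    Assumption: a single half-cosine cycle that decays from BASE_LR to zero.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Inspect the schedule at a few points: start, mid-warmup, peak, mid-decay, end
for step in (0, 100, 200, 850, 1500):
    print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the rate peaks at 2e-05 at step 200, is back to half of that at step 850, and reaches zero at step 1500.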

### Training results

| Training Loss | Epoch  | Step | Validation Loss | WER (%) |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 0.7511        | 0.5085 | 300  | 0.7027          | 47.9462 |
| 0.6138        | 1.0169 | 600  | 0.6070          | 44.5482 |
| 0.4602        | 1.5254 | 900  | 0.5756          | 41.2621 |
| 0.3916        | 2.0339 | 1200 | 0.5551          | 40.0672 |
| 0.3003        | 2.5424 | 1500 | 0.5551          | 41.6169 |
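
The table can be read off with a little arithmetic. The sketch below copies its rows and derives the implied number of steps per epoch and the checkpoint with the lowest validation WER. The training-set size estimate assumes a single device and no gradient accumulation, which this card does not state.

```python
# (step, epoch, validation WER) rows copied from the table above
rows = [
    (300, 0.5085, 47.9462),
    (600, 1.0169, 44.5482),
    (900, 1.5254, 41.2621),
    (1200, 2.0339, 40.0672),
    (1500, 2.5424, 41.6169),
]

# Steps per epoch implied by the last row
last_step, last_epoch, _ = rows[-1]
steps_per_epoch = last_step / last_epoch

# Assumption: one device and no gradient accumulation,
# so every optimizer step consumes train_batch_size = 16 examples
TRAIN_BATCH_SIZE = 16
approx_examples = steps_per_epoch * TRAIN_BATCH_SIZE

# Checkpoint with the lowest validation WER
best_step, _, best_wer = min(rows, key=lambda row: row[2])

print(f"~{steps_per_epoch:.0f} steps per epoch, ~{approx_examples:.0f} training examples")
print(f"lowest validation WER: {best_wer} at step {best_step}")
```

This gives about 590 steps per epoch, so the 1500-step run covers roughly two and a half epochs, with the lowest validation WER at step 1200.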


### Framework versions

- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1

## Evaluation


Results on the Urdu test split of Common Voice 17.0:

| Metric   | Value   | Description                                  |
|----------|---------|----------------------------------------------|
| **WER**  | 39.124% | Word Error Rate (lower is better)            |
| **CER**  | 14.781% | Character Error Rate (lower is better)       |
| **BLEU** | 40.373  | BLEU score (higher is better)                |
| **ChrF** | 69.624  | Character n-gram F-score (higher is better)  |

>👉 Review the testing script: [Testing Whisper Base Urdu Full](https://www.kaggle.com/code/kingabzpro/testing-whisper-base-urdu-full/notebook?scriptVersionId=249058345)
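
For readers unfamiliar with the two error rates, the following is a minimal, dependency-free sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length in words or characters. The linked testing script may normalize text or use a metrics library, so this is an illustration under stated assumptions, not the exact evaluation code; the example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the number of reference characters."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example pair: two words are each off by a single character
reference = "the fish swims in the water"
hypothesis = "the fish swim in the waters"

print(f"WER = {wer(reference, hypothesis):.1%}")  # WER = 33.3%
print(f"CER = {cer(reference, hypothesis):.1%}")  # CER = 7.4%
```

In this toy pair two of six words count as errors, yet only two of 27 characters differ, which is the same pattern of a high WER alongside a much lower CER.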

---

**Summary:**  
The Word Error Rate (WER) of 39.12% is high and is the model's main weakness: word-level errors (substitutions, deletions, and insertions) amount to roughly two for every five reference words.
The model does considerably better at the character level. The Character Error Rate (CER) of 14.78% and the ChrF score of 69.62 suggest that many misrecognized words differ from the reference by only a few characters, so the output is often close to the correct spelling even when the whole word is counted as an error.