	---
	library_name: transformers
	license: apache-2.0
	base_model: openai/whisper-base
	tags:
	- automatic-speech-recognition
	- whisper
	- urdu
	- mozilla-foundation/common_voice_17_0
	- hf-asr-leaderboard
	datasets:
	- mozilla-foundation/common_voice_17_0
	metrics:
	- wer
	- cer
	- bleu
	- chrf
	model-index:
	- name: whisper-base-urdu-full
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: Common Voice 17.0 (Urdu)
	      type: mozilla-foundation/common_voice_17_0
	      config: ur
	      split: test
	      args: ur
	    metrics:
	    - type: wer
	      value: 39.124
	      name: WER
	    - type: cer
	      value: 14.781
	      name: CER
	    - type: bleu
	      value: 40.373
	      name: BLEU
	    - type: chrf
	      value: 69.624
	      name: ChrF
	language:
	- ur
	pipeline_tag: automatic-speech-recognition
	---

	# Whisper Base Urdu ASR Model

	This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base), trained on the Urdu (`ur`) subset of the `mozilla-foundation/common_voice_17_0` dataset.


	## Usage

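	```python
	from transformers import pipeline

	# Load the fine-tuned checkpoint as a speech-recognition pipeline.
	transcriber = pipeline(
	    "automatic-speech-recognition",
	    model="kingabzpro/whisper-base-urdu-full",
	)

	# Clear any forced decoder prompt and set Urdu as the decoding language.
	transcriber.model.generation_config.forced_decoder_ids = None
	transcriber.model.generation_config.language = "ur"

	# Pass the path to a local audio file.
	transcription = transcriber("audio2.mp3")
	print(transcription)
	```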

	```text
	{'text': 'دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے'}
	```


	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 200
	- training_steps: 1500
	- mixed_precision_training: Native AMP
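
The learning-rate settings above (`learning_rate: 2e-05`, `lr_scheduler_type: cosine`, `lr_scheduler_warmup_steps: 200`, `training_steps: 1500`) imply a linear warmup followed by a cosine decay. The following is a minimal pure-Python sketch of that shape, assuming the common half-cycle cosine variant that decays to zero at the final step. The function name `cosine_lr_with_warmup` is illustrative and is not part of the training script.

```python
import math

# Values taken from the hyperparameter list above.
BASE_LR = 2e-05
WARMUP_STEPS = 200
TRAINING_STEPS = 1500


def cosine_lr_with_warmup(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay (assumed shape)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    # Half a cosine period: BASE_LR at progress=0, zero at progress=1.
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Print the rate at the warmup boundary and at each evaluation step.
    for step in (0, 100, 200, 300, 600, 900, 1200, 1500):
        print(f"step {step:>4}: lr = {cosine_lr_with_warmup(step):.3e}")
```

Under this assumption the rate peaks at 2e-05 at step 200 and is back to roughly half of that midway through the decay phase (step 850).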

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | WER     |
	|:-------------:|:------:|:----:|:---------------:|:-------:|
	| 0.7511        | 0.5085 | 300  | 0.7027          | 47.9462 |
	| 0.6138        | 1.0169 | 600  | 0.6070          | 44.5482 |
	| 0.4602        | 1.5254 | 900  | 0.5756          | 41.2621 |
	| 0.3916        | 2.0339 | 1200 | 0.5551          | 40.0672 |
	| 0.3003        | 2.5424 | 1500 | 0.5551          | 41.6169 |
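
The table can be read programmatically to see where validation WER bottomed out and how many optimizer steps make up one epoch. The sketch below copies the rows from the table as-is; the variable names are illustrative, and the card does not state which checkpoint was ultimately published.

```python
# (step, epoch, training_loss, validation_loss, wer) rows copied from the table above.
RESULTS = [
    (300, 0.5085, 0.7511, 0.7027, 47.9462),
    (600, 1.0169, 0.6138, 0.6070, 44.5482),
    (900, 1.5254, 0.4602, 0.5756, 41.2621),
    (1200, 2.0339, 0.3916, 0.5551, 40.0672),
    (1500, 2.5424, 0.3003, 0.5551, 41.6169),
]

# Evaluation row with the lowest validation WER.
best = min(RESULTS, key=lambda row: row[4])

# Steps per epoch implied by the first row (step / epoch fraction).
steps_per_epoch = round(RESULTS[0][0] / RESULTS[0][1])

print(f"lowest validation WER: {best[4]} at step {best[0]}")
print(f"approximate steps per epoch: {steps_per_epoch}")
```

Validation WER is lowest at step 1200 and rises again at step 1500 while the validation loss stays flat, so the 1500 training steps cover about 2.5 epochs of roughly 590 steps each.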


	### Framework versions

	- Transformers 4.51.3
	- Pytorch 2.6.0+cu124
	- Datasets 3.6.0
	- Tokenizers 0.21.1

	## Evaluation


	Urdu ASR Evaluation on Common Voice 17.0 (Test Split).

	| Metric | Value   | Description                                  |
	|--------|---------|----------------------------------------------|
	| WER    | 39.124% | Word Error Rate (lower is better)            |
	| CER    | 14.781% | Character Error Rate (lower is better)       |
	| BLEU   | 40.373% | BLEU Score (higher is better)                |
	| ChrF   | 69.624  | Character n-gram F-score (higher is better)  |

	>👉 Review the testing script: [Testing Whisper Base Urdu Full](https://www.kaggle.com/code/kingabzpro/testing-whisper-base-urdu-full/notebook?scriptVersionId=249058345)

	---

	Summary:
	The Word Error Rate (WER) of 39.12% is the model's main weakness: it corresponds to roughly two word-level errors (substitutions, deletions, or insertions) for every five reference words.
	The model is considerably more accurate at the character level. The Character Error Rate (CER) of 14.78% and the ChrF score of 69.62 indicate that most characters are predicted correctly, so many of the word errors likely come from words that are close to the reference but differ by a character or two.
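
The gap between WER and CER can be illustrated with a small edit-distance sketch. This is a minimal pure-Python implementation for illustration only. The linked testing script may compute the metrics differently (for example with a metrics library and text normalization), and the reference/hypothesis pair below is made up, not taken from Common Voice.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by reference length (spaces included)."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: two words are each off by a single character.
REFERENCE = "the fish swims in the water"
HYPOTHESIS = "the fish swins in the watar"

print(f"WER: {wer(REFERENCE, HYPOTHESIS):.3f}")
print(f"CER: {cer(REFERENCE, HYPOTHESIS):.3f}")
```

In this example two of six words count as wrong (WER of about 33%) even though only two of 27 characters differ (CER of about 7%), which is the same pattern as a high WER alongside a much lower CER.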