metadata

license: cc-by-4.0
language:
  - ur
base_model:
  - nvidia/parakeet-ctc-0.6b
pipeline_tag: automatic-speech-recognition

U-CTC

U-CTC is an Urdu automatic speech recognition (ASR) model fine-tuned from nvidia/parakeet-ctc-0.6b with the NVIDIA NeMo framework. The fine-tuning dataset contains about 21.3 hours of Urdu speech in total, of which 10.87 hours form the training split. The model transcribes spoken Urdu with a CTC decoder.

Model Summary

Model Name: U-CTC
Base Architecture: Parakeet-CTC-0.6B
Framework: NVIDIA NeMo
Language: Urdu
Model Type: FastConformer Encoder + CTC Decoder
Loss Function: CTC Loss
Training Hardware: NVIDIA RTX 3090

Training Configuration

Setting	Value
Epochs	69
Max Steps	14,800
Optimizer	AdamW
Learning Rate	0.001
Betas	(0.9, 0.98)
Weight Decay	0.001
Scheduler	CosineAnnealing
Warmup Steps	15,000
Min LR	0.0001
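
To make the scheduler settings above concrete, the following is a minimal sketch of a linear-warmup plus cosine-annealing learning-rate curve using the listed values. It is a simplified illustration, not NeMo's exact CosineAnnealing implementation; the function name `lr_at_step` and the linear ramp from zero are assumptions made for this example.

```python
import math


def lr_at_step(step, base_lr=1e-3, min_lr=1e-4,
               warmup_steps=15_000, max_steps=14_800):
    """Simplified warmup + cosine schedule (hypothetical helper, not NeMo code)."""
    if step < warmup_steps:
        # Linear ramp from near zero toward base_lr during warmup.
        return base_lr * (step + 1) / (warmup_steps + 1)
    if step >= max_steps:
        # Past the end of the schedule, hold the floor value.
        return min_lr
    # Cosine decay from base_lr down to min_lr over the post-warmup steps.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Learning rate at the first, middle, and final training step.
    for step in (0, 7_400, 14_799):
        print(step, lr_at_step(step))
```

With the values as listed, Warmup Steps exceeds Max Steps, so under this simplified schedule the rate is still ramping up at the final step and the cosine decay phase is never entered.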

Dataset

The model was trained and evaluated on a manually curated Urdu speech dataset:

Split	Files	Duration
Train	9,425	10.87 h
Validation	4,056	5.22 h
Test	4,056	5.22 h

Total audio hours: ~21.3 hours
Samples skipped due to CTC alignment failure: ~2.57%
Average acoustic model (AM) sequence length: 50.39
Average target sequence length: 30.51
AM-to-target length ratio: ~1.83
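
The card does not say how alignment failures were detected. As a hedged illustration, the sketch below applies the standard CTC length constraint: the acoustic sequence must have at least as many frames as the target has tokens, plus one extra frame for every pair of identical adjacent tokens (CTC needs a blank between them). The helper names `ctc_min_input_length` and `is_ctc_alignable` are hypothetical.

```python
def ctc_min_input_length(target):
    """Minimum number of acoustic frames CTC needs to emit `target`.

    `target` is a sequence of token ids. Each adjacent repeated token
    requires one additional blank frame to separate the two emissions.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def is_ctc_alignable(num_frames, target):
    """True if a sample with `num_frames` acoustic frames can be aligned."""
    return num_frames >= ctc_min_input_length(target)


if __name__ == "__main__":
    tokens = [5, 5, 9]  # made-up token ids with one adjacent repeat
    print(ctc_min_input_length(tokens))
    print(is_ctc_alignable(3, tokens), is_ctc_alignable(4, tokens))
```

A filter of this kind would drop samples whose transcripts are too long for their audio, which is one plausible source of the skipped samples reported above.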

Performance

Best Validation WER: 21.00%

Sample Predictions

Reference Sentence	U-CTC Output
پاکستان اور زمبابوے کے درمیان ون ڈے سیریز جمعہ سے شروع ہوگی	پاکستان اسزموبے تنا ونڈے سیریز جما کے رو ہوگی
بی بی سی نے بہت دیر کردی یہ چیز دکھانے میں	بیسینہ بات اغیر کردی ی ھی اس سکھایں
ٹھنڈی ٹھنڈی ہوا	ٹندی ٹھنڈی ہوہا
ایک اینڈ تو ولیمسن سنبھالے ہوئے تھے	یہ ک اندوسا تو محالے ہوئے تھے
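
To make the WER metric concrete, here is a minimal sketch that computes word error rate for a single sentence pair as word-level edit distance divided by the number of reference words. The function name `word_error_rate` is an assumption for this example; the card does not state which tool produced the reported validation WER, and a corpus-level figure is normally aggregated over all utterances rather than averaged per sentence.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Third row of the sample table: the first and last words differ.
    reference = "ٹھنڈی ٹھنڈی ہوا"
    hypothesis = "ٹندی ٹھنڈی ہوہا"
    print(word_error_rate(reference, hypothesis))
```

For that row, two of the three reference words are substituted, so the sentence-level WER is 2/3 even though the output is close at the character level.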
