Whisper Small fine-tuned for Kannada using IAST Romanization via Aksharamukha, addressing token limits in non-Roman scripts.
This is a Whisper Small model fine-tuned for Kannada automatic speech recognition (ASR). Whisper's decoder is limited to 448 tokens, and its byte-level tokenizer spends many tokens per word on non-Roman scripts such as Kannada, which makes generation inefficient. To address this, we fine-tuned the model on Romanized Kannada text in IAST (International Alphabet of Sanskrit Transliteration), generated with the Aksharamukha library. The shorter output sequences yield roughly 2x faster inference.
Example (raw byte-level token strings from the Whisper tokenizer):
Kannada Tokens: ['à²', '¹', 'à²', '¾', 'à²', '°', 'à³į', 'à²', '¦', 'à²', '¿', 'à²', 'ķ', 'Ġà²', '¶', 'à³', 'ģ', 'à²', 'Ń', 'à²', '¾', 'à²', '¶', 'à²', '¯', 'à²', 'Ĺ', 'à²', '³', 'à³', 'ģ']
Kannada Token Count: 31
IAST Tokens: ['h', 'Äģ', 'rd', 'ika', 'ĠÅĽ', 'ub', 'h', 'Äģ', 'ÅĽ', 'ay', 'ag', 'al', 'Ì', '¤', 'u']
IAST Token Count: 15
Romanized Kannada text uses fewer tokens (15) compared to the original Kannada text (31), resulting in faster processing.
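The comparison above can be reproduced with the Whisper tokenizer and Aksharamukha; the snippet below is a minimal sketch, and the sample phrase is only illustrative.

# Minimal sketch for reproducing the token-count comparison above.
# The sample phrase is illustrative; any Kannada text works.
from aksharamukha import transliterate
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small")

kannada_text = "ಹಾರ್ದಿಕ ಶುಭಾಶಯಗಳು"
iast_text = transliterate.process('Kannada', 'IAST', kannada_text)  # Kannada script -> IAST

print(len(tokenizer.tokenize(kannada_text)))  # token count for the Kannada script
print(len(tokenizer.tokenize(iast_text)))     # token count for the IAST romanization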
Performance
Evaluated on the Google FLEURS (google/fleurs) Kannada test set (self-reported):
- Test WER: 28.97%
- Test CER: 5.66%
- Test WER (with normalization): 23.12%
- Test CER (with normalization): 4.95%
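The normalized scores presumably apply a text normalizer to both references and predictions before scoring. A minimal sketch with jiwer, assuming something like Whisper's BasicTextNormalizer, looks as follows (the strings are illustrative):

# Minimal sketch of WER/CER computation; assumes the "with normalization" scores
# apply a normalizer such as Whisper's BasicTextNormalizer to both sides.
from jiwer import wer, cer
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

normalizer = BasicTextNormalizer()

references  = ["hārdika śubhāśayagaḷu"]   # illustrative IAST reference
predictions = ["hārdika śubhāsayagalu"]   # illustrative model output

print(wer(references, predictions), cer(references, predictions))  # raw scores

norm_refs  = [normalizer(r) for r in references]
norm_preds = [normalizer(p) for p in predictions]
print(wer(norm_refs, norm_preds), cer(norm_refs, norm_preds))      # normalized scores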
Usage
#!pip install whisper_transcriber aksharamukha
from whisper_transcriber import WhisperTranscriber
from aksharamukha import transliterate
# Initialize the transcriber
transcriber = WhisperTranscriber(model_name="coild/whisper_small_kannada_translit_IAST")
# Transcribe an audio file with automatic transcript printing
results = transcriber.transcribe(
    "audio_file.mp3",
    min_segment=25,
    max_segment=30,
    silence_duration=0.2,
    sample_rate=16000,
    batch_size=4,
    normalize=True,
    normalize_text=True,
    verbose=False
)
# Apply transliteration to all results
for segment in results:
    print(f"\n[{segment['start']} --> {segment['end']}]")
    print(transliterate.process('IAST', 'Kannada', segment['transcript']))
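If you would rather not depend on the whisper_transcriber helper, the same flow can be sketched with the standard transformers ASR pipeline; the chunking value below is an assumption, not a tuned setting.

# Alternative sketch using the transformers ASR pipeline instead of whisper_transcriber;
# chunk_length_s is an assumed value for long-form audio, not a recommendation.
from transformers import pipeline
from aksharamukha import transliterate

asr = pipeline(
    "automatic-speech-recognition",
    model="coild/whisper_small_kannada_translit_IAST",
    chunk_length_s=30,
)

iast_text = asr("audio_file.mp3")["text"]                    # model output is IAST
print(transliterate.process('IAST', 'Kannada', iast_text))   # map back to Kannada script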
Model Details
Model Description
- Developed by: Ranjan Shettigar
- Language(s) (NLP): kn
- Finetuned from model: openai/whisper-small
- Repository: [More Information Needed]
Training Details
Training and evaluation data
Training Data: [More Information Needed]
Evaluation Data: Google FLEURS (google/fleurs), Kannada test split
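For reference, the FLEURS test split can be loaded as sketched below; the kn_in config name is an assumption based on FLEURS' language-code naming.

# Sketch of loading the Kannada FLEURS test split used for evaluation;
# "kn_in" is assumed to be the Kannada config name in google/fleurs.
from datasets import load_dataset

fleurs_test = load_dataset("google/fleurs", "kn_in", split="test")
print(fleurs_test[0]["transcription"])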
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- optimizer: adamw
- epochs: 4
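As a rough sketch, these map onto transformers' Seq2SeqTrainingArguments as below; anything not listed above (output directory, precision, generation settings) is an assumption.

# Sketch of the listed hyperparameters as Seq2SeqTrainingArguments;
# values not given in the card (output_dir, fp16, predict_with_generate) are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-kn-iast",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    optim="adamw_torch",                   # AdamW, as listed
    fp16=True,                             # assumed
    predict_with_generate=True,            # assumed; needed to compute WER during eval
)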