---
language_bcp47:
- kn-IN
model-index:
- name: Whisper Small Kannada
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: google/fleurs
      type: google/fleurs
      config: kn_in
      split: test
    metrics:
    - type: wer
      value: 29.63
      name: WER
    - type: cer
      value: 7.12
      name: CER
    - type: wer
      value: 23.61
      name: WER (with normalization)
    - type: cer
      value: 6.21
      name: CER (with normalization)
pipeline_tag: automatic-speech-recognition
language:
- kn
metrics:
- wer
base_model:
- openai/whisper-small
library_name: transformers
---
# Whisper Small fine-tuned for Kannada

This is a Whisper Small model fine-tuned for the Kannada language on ~300 hours of labeled data.
## Performance
- Test WER: 29.63%
- Test CER: 7.12%
- Test WER (with normalization): 23.61%
- Test CER (with normalization): 6.21%
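The card does not publish its evaluation script, so as a minimal sketch (libraries such as `jiwer` are more commonly used in practice), WER and CER reduce to an edit-distance computation over words and characters respectively:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, single-row DP."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

The "with normalization" rows are the same metrics computed after both reference and hypothesis have been text-normalized, which is why they are lower.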
## Usage
```python
#!pip install whisper_transcriber
from whisper_transcriber import WhisperTranscriber

# Initialize the transcriber
transcriber = WhisperTranscriber(model_name="coild/whisper_small_kannada")

# Transcribe an audio file with automatic transcript printing
results = transcriber.transcribe(
    "audio_file.mp3",
    min_segment=5,
    max_segment=15,
    silence_duration=0.2,
    sample_rate=16000,
    batch_size=4,
    normalize=True,
    normalize_text=True,
    verbose=False,
)

# Access the transcription results manually
for segment in results:
    print(f"\n[{segment['start']} --> {segment['end']}]")
    print(f"{segment['transcript']}")
```
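The `normalize_text=True` flag above applies text normalization to the output. The exact normalizer inside `whisper_transcriber` is not documented here, but a minimal, hypothetical sketch of the kind of normalization that typically lowers Kannada WER (Unicode NFC normalization, punctuation stripping, whitespace collapsing) could look like this:

```python
import re
import unicodedata

def normalize_kannada(text: str) -> str:
    """Hypothetical normalizer: NFC-normalize, replace punctuation and
    symbols with spaces, collapse whitespace. The actual normalizer used
    by whisper_transcriber may differ."""
    text = unicodedata.normalize("NFC", text)
    # Drop punctuation (P*) and symbol (S*) characters; keep letters,
    # digits, and combining marks (needed for Kannada vowel signs).
    text = "".join(
        " " if unicodedata.category(ch).startswith(("P", "S")) else ch
        for ch in text
    )
    return re.sub(r"\s+", " ", text).strip()
```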
## Model Details

### Model Description
- Developed by: Ranjan Shettigar
- Language(s) (NLP): kn
- Finetuned from model: [openai/whisper-small](https://huggingface.co/openai/whisper-small)
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
## Training Details

### Training and evaluation data

Training Data: [More Information Needed]

Evaluation Data: [More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- optimizer: adamw
- epochs: 8
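As a configuration sketch only, the listed hyperparameters map onto `transformers` `Seq2SeqTrainingArguments` roughly as follows; the output path and every argument not listed above are placeholders, not the values actually used:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch: reproduces only the hyperparameters listed above; output_dir
# and all omitted settings (warmup, fp16, eval strategy, ...) are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-kannada",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    optim="adamw_torch",
    num_train_epochs=8,
)
```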
## Citation

**BibTeX:**
[More Information Needed]