---
library_name: transformers
language:
- si
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
datasets:
- Lingalingeswaran/asr-sinhala-dataset_json_v1
metrics:
- wer
model-index:
- name: Whisper Small sinhala v3 - Lingalingeswaran
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Lingalingeswaran/asr-sinhala-dataset_json_v1
      type: Lingalingeswaran/asr-sinhala-dataset_json_v1
      args: 'config: si, split: test'
    metrics:
    - name: Wer
      type: wer
      value: 46.457654723127035
---

# Whisper Small sinhala v3 - Lingalingeswaran

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Lingalingeswaran/asr-sinhala-dataset_json_v1 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2086
- Wer: 46.4577

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 3000
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer     |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 0.1852        | 1.7606 | 1000 | 0.1875          | 50.9772 |
| 0.0602        | 3.5211 | 2000 | 0.1886          | 47.5774 |
| 0.0238        | 5.2817 | 3000 | 0.2086          | 46.4577 |

### Framework versions

- Transformers 4.48.1
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## Example Usage

Here is an example of how to use the model for Sinhala speech recognition with Gradio:

```python
import gradio as gr
from transformers import pipeline

# Initialize the pipeline with the specified model
pipe = pipeline(model="Lingalingeswaran/whisper-small-sinhala_v3")

def transcribe(audio):
    # Transcribe the audio file to text
    text = pipe(audio)["text"]
    return text

# Create the Gradio interface
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["microphone", "upload"], type="filepath"),
    outputs="text",
    title="Whisper Small Sinhala",
    description="Realtime demo for Sinhala speech recognition using a fine-tuned Whisper small model.",
)

# Launch the interface
if __name__ == "__main__":
    iface.launch()
```
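
The WER figures reported in this card are percentages. To sanity-check a transcript from the demo against a reference text, a word error rate can be computed along the following lines. This is a minimal sketch, not the evaluation code used for this model (which this card does not include): `word_error_rate` is a hypothetical helper, and it assumes whitespace tokenization with no text normalization, so its numbers may differ from the reported ones.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumption: words are whitespace-separated and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Placeholder tokens stand in for real Sinhala words:
# one substitution (b -> x) and one deletion (d) over 4 reference words.
print(word_error_rate("a b c d", "a x c"))  # prints 50.0
```

In practice the `hypothesis` argument would be the string returned by `transcribe`, and `reference` a manually verified transcript of the same audio.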