whisper-lang-id

This model is a fine-tuned version of openai/whisper-tiny on the mozilla-foundation/common_voice_11_0 dataset. It is used as an audio classifier for language identification.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

mozilla-foundation/common_voice_11_0

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 1
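
To make the lr_scheduler_type: linear setting concrete, here is a minimal sketch of how the learning rate would decay over the run. It assumes no warmup (the card lists none) and 175 optimizer steps in total (the step count reported under Training results). The function name linear_lr and the warmup_steps parameter are illustrative, not taken from the training script.

```python
def linear_lr(step, base_lr=2e-5, total_steps=175, warmup_steps=0):
    """Learning rate at a given optimizer step under a linear schedule.

    Assumption: zero warmup steps and 175 total steps; neither is stated
    explicitly as a schedule parameter in the model card.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at the final step
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, roughly halfway, and at the end of the epoch
for step in (0, 87, 175):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, falls steadily, and reaches zero at step 175.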

Training results

Training Loss   Epoch   Step   Validation Loss   Accuracy   F1
No log          1.0     175    0.0148            0.995      0.9950
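
For readers who want to reproduce the two metrics on their own predictions, the following is a rough sketch of accuracy and F1 for a two-class problem. The card does not say how F1 was averaged, so macro averaging is an assumption here, and the function name accuracy_and_macro_f1 and the toy labels are purely illustrative.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return accuracy, sum(f1_scores) / len(f1_scores)


# Toy labels (0 = Tamil, 1 = English); these are NOT the model's real predictions
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
acc, f1 = accuracy_and_macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}, macro_f1={f1:.4f}")
```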

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.

Example Usage

Here is an example of how to use the model for language identification with Gradio:

from transformers import pipeline
import gradio as gr

# Use a pipeline as a high-level helper
pipe = pipeline("audio-classification", model="Lingalingeswaran/whisper-lang-id")

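# Map the model's generic class ids to language names
LABEL_NAMES = {"LABEL_0": "Tamil", "LABEL_1": "English"}

def identify_language(audio_file):
    """Identifies the language of an audio file."""
    # Gradio passes None when nothing has been recorded or uploaded
    if audio_file is None:
        return "Please record or upload an audio file first."
    try:
        # The pipeline returns predictions sorted by score, highest first
        result = pipe(audio_file)
        predicted_label = result[0]['label']
        score = result[0]['score']

        # Fall back to the raw label if it is not in the mapping
        predicted_label = LABEL_NAMES.get(predicted_label, predicted_label)

        return f"Predicted Language: {predicted_label}, Score: {score:.4f}"
    except Exception as e:
        return f"Error during language identification: {e}"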

# Gradio interface
def create_gradio_interface():
    with gr.Blocks() as demo:
        gr.Markdown("### Language Identification from Audio File")
        gr.Markdown("Upload an audio file or use your microphone to detect the language spoken.")

        # Accept audio from the microphone or an uploaded file
        audio_input = gr.Audio(sources=["microphone", "upload"], type="filepath", label="Record or Upload Audio")
        result_output = gr.Textbox(label="Language Identification Result", interactive=False)

        # Submit button
        submit_btn = gr.Button("Submit")
        submit_btn.click(identify_language, inputs=audio_input, outputs=result_output)

        # Clear button
        clear_btn = gr.Button("Clear")
        clear_btn.click(lambda: (None, None), outputs=[audio_input, result_output])  # Clear audio and result

    demo.launch()

# Run the Gradio interface
create_gradio_interface()
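
For scripted use without a Gradio interface, the same label mapping can be applied to the full list of predictions rather than only the top one. The sketch below is one possible way to do that; the helper names readable_predictions and classify_file are hypothetical, the label names are taken from the Gradio example above, and the sample scores are made up for illustration.

```python
# Label names taken from the Gradio example above
LABEL_NAMES = {"LABEL_0": "Tamil", "LABEL_1": "English"}


def readable_predictions(predictions):
    """Convert pipeline-style output into (language, score) pairs, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(LABEL_NAMES.get(p["label"], p["label"]), p["score"]) for p in ranked]


def classify_file(path):
    """Run the model on one audio file and return readable predictions."""
    # Imported lazily so readable_predictions works without transformers installed
    from transformers import pipeline

    pipe = pipeline("audio-classification", model="Lingalingeswaran/whisper-lang-id")
    return readable_predictions(pipe(path))


# Hand-written result in the pipeline's output format (scores are made up)
sample = [{"label": "LABEL_1", "score": 0.02}, {"label": "LABEL_0", "score": 0.98}]
print(readable_predictions(sample))  # [('Tamil', 0.98), ('English', 0.02)]
```

Calling classify_file with a path to a local audio file downloads the model on first use, so it needs network access and the transformers package.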