Shunya Labs Hinglish ASR Model

The only Hinglish code-switching speech-to-text (STT) model that generates transcripts in mixed Hindi-English tokens.

Model Details

Model Description

This is the first speech recognition model designed natively for Hinglish—the natural mix of Hindi and English commonly spoken across India. Unlike conventional approaches that force transcription into a single language, this model generates mixed-language tokens directly, preserving how people actually speak.

  • Base Model: OpenAI Whisper Medium
  • Post-trained by: Shunya Labs
  • Language: Hinglish (Hindi-English code-switching)

Why This Model?

Standard ASR models treat Hindi and English as separate languages, forcing transcription into one or the other. This creates errors when speakers naturally switch between languages mid-sentence—which is how millions of people actually talk. This model was trained specifically on code-switched speech, so it:

  • Transcribes Hindi and English tokens as they naturally occur
  • Handles mid-sentence language switches accurately
  • Runs inference faster by skipping the language detection step
  • Delivers higher accuracy on real-world Hinglish speech
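
To make "mixed tokens" concrete, the sketch below counts how often a transcript switches between Hindi and English. It assumes that Hindi words appear in Devanagari script and English words in Latin script, which this card does not state explicitly. The helper names script_of and count_switches are hypothetical, and the sample sentence is hand-written for illustration, not actual model output.

```python
from typing import List


def script_of(token: str) -> str:
    """Label a token by script: 'hi' (Devanagari), 'en' (Latin letters), or 'other'."""
    # Assumption: Hindi is written in Devanagari (Unicode block U+0900 to U+097F).
    if any("\u0900" <= ch <= "\u097f" for ch in token):
        return "hi"
    if any(ch.isascii() and ch.isalpha() for ch in token):
        return "en"
    return "other"


def count_switches(transcript: str) -> int:
    """Count Hindi/English switch points between adjacent whitespace-separated tokens."""
    labels: List[str] = [script_of(tok) for tok in transcript.split()]
    # Ignore numbers and punctuation-only tokens so they do not count as switches.
    labels = [label for label in labels if label != "other"]
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


# Hand-written illustrative sentence (not model output).
sample = "मुझे meeting के लिए late हो रहा है"
print(count_switches(sample))
```

A count above zero indicates that the transcript kept both languages in place instead of collapsing the sentence into one of them.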

Use Cases

  • Transcription of Hinglish conversations, podcasts, and videos
  • Voice assistants serving Indian users
  • Meeting transcription for Indian workplaces
  • Content creation and subtitling

How to Get Started with the Model

Use the code below to get started with the model.

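from transformers import pipeline

# Load the ASR pipeline using this model's Hub repository id.
transcriber = pipeline("automatic-speech-recognition", model="shunyalabs/zero-stt-hinglish")

# Transcribe a local audio file; the result is a dict with a "text" key.
result = transcriber("audio.mp3")
print(result["text"])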
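
For recordings longer than Whisper's 30-second window, such as podcasts or meetings, the Transformers ASR pipeline accepts chunk_length_s and return_timestamps arguments. The following is a minimal sketch of that pattern, not a configuration recommended by the model authors. The helper names build_transcriber, format_chunks, and transcribe_long are hypothetical, and the 30-second chunk length is an assumed default.

```python
from typing import Any, Callable, Dict, List


def build_transcriber(model_id: str) -> Callable[..., Dict[str, Any]]:
    """Create the ASR pipeline for the given Hub repository id."""
    # Imported lazily so the formatting helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def format_chunks(result: Dict[str, Any]) -> List[str]:
    """Turn the pipeline's timestamped chunks into '[start-end] text' lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        # The final chunk's end time can be None when the audio is cut off mid-segment.
        start_label = f"{start:.2f}" if start is not None else "start"
        end_label = f"{end:.2f}" if end is not None else "end"
        lines.append(f"[{start_label}-{end_label}] {chunk['text'].strip()}")
    return lines


def transcribe_long(
    transcriber: Callable[..., Dict[str, Any]],
    audio_path: str,
    chunk_length_s: float = 30.0,  # assumed default, matching Whisper's input window
) -> List[str]:
    """Transcribe long audio in chunks and return one timestamped line per segment."""
    result = transcriber(
        audio_path,
        chunk_length_s=chunk_length_s,
        return_timestamps=True,
    )
    return format_chunks(result)
```

To use it, pass this model's repository id to build_transcriber and then call transcribe_long(transcriber, "audio.mp3"). The timestamped lines can serve as a starting point for subtitles.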

Training Details

Training Data

The base model, openai/whisper-medium, was post-trained on the Google Vaani dataset and on proprietary datasets.

Model size: 0.8B params (Safetensors, tensor type F16)