Shunya Labs Hinglish ASR Model
The only Hinglish code-switching STT model that generates transcripts in mixed Hindi-English tokens.
Model Details
Model Description
This is the first speech recognition model designed natively for Hinglish—the natural mix of Hindi and English commonly spoken across India. Unlike conventional approaches that force transcription into a single language, this model generates mixed-language tokens directly, preserving how people actually speak.
- Base Model: OpenAI Whisper Medium
- Post-trained by: Shunya Labs
- Language: Hinglish (Hindi-English code-switching)
Why This Model?
Standard ASR models treat Hindi and English as separate languages, forcing transcription into one or the other. This creates errors when speakers naturally switch between languages mid-sentence—which is how millions of people actually talk. This model was trained specifically on code-switched speech, so it:
- Transcribes Hindi and English tokens as they naturally occur
- Handles mid-sentence language switches accurately
- Runs inference faster by avoiding language-detection overhead
- Delivers higher accuracy on real-world Hinglish speech (see the sketch after this list)
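As a rough illustration of the difference, the sketch below contrasts a vanilla Whisper pipeline, which is typically pinned to a single language at generation time, with this model called without any language hint. The repo id matches the Getting Started snippet further down, and audio.mp3 is a placeholder path; the outputs themselves are not shown.

from transformers import pipeline

# Vanilla Whisper: transcription is usually forced into a single language.
baseline = pipeline("automatic-speech-recognition", model="openai/whisper-medium")
hindi_only = baseline("audio.mp3", generate_kwargs={"language": "hindi"})["text"]

# This model: no language hint needed; Hindi and English tokens are
# emitted in the order they occur in the speech.
hinglish = pipeline("automatic-speech-recognition", model="shunya-labs/hinglish-whisper-medium")
mixed = hinglish("audio.mp3")["text"]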
Demo
- Try the model at: https://huggingface.co/spaces/shunyalabs/Zero_STT_Hinglish_Shunya_Labs
Use Cases
- Transcription of Hinglish conversations, podcasts, and videos
- Voice assistants serving Indian users
- Meeting transcription for Indian workplaces
- Content creation and subtitling
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import pipeline
transcriber = pipeline("automatic-speech-recognition", model="shunya-labs/hinglish-whisper-medium")
result = transcriber("audio.mp3")
print(result["text"])
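For longer recordings such as podcasts, videos, or meetings, the same pipeline can run in chunked mode. This is a minimal sketch assuming the stock transformers long-form options work with this checkpoint; chunk_length_s and return_timestamps are standard pipeline parameters, the repo id is the one used above, and meeting.mp3 is a placeholder path.

import torch
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="shunya-labs/hinglish-whisper-medium",
    chunk_length_s=30,  # process long audio in 30-second windows
    device=0 if torch.cuda.is_available() else -1,  # use a GPU when one is available
)

result = transcriber("meeting.mp3", return_timestamps=True)
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])

Each chunk's timestamp comes back as a (start, end) tuple in seconds, which maps directly onto subtitle segments.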
Training Details
Training Data
The openai/whisper-medium base model was post-trained on the Google Vaani dataset as well as proprietary datasets.