
whisper-conv-large-v3-turbo

Adds a convolution layer with stride 2 to bring the output rate down to 25 tokens per second (TPS). This model is intended as a base for introducing vector quantization (VQ) at the projection layer later.
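As a rough check of the 25 TPS figure, the sketch below applies the standard 1-D convolution output-length formula to the 1500 encoder positions that stock Whisper produces for a 30-second window (50 per second). The kernel size of 3 and padding of 1 are assumptions chosen for illustration; this card only states the stride. The helper name `conv1d_out_len` is hypothetical.

```python
def conv1d_out_len(length: int, kernel_size: int, stride: int,
                   padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


WINDOW_SECONDS = 30    # Whisper processes audio in 30-second windows
ENCODER_FRAMES = 1500  # stock Whisper encoder positions per window (50 per second)

# kernel_size=3 and padding=1 are assumed values, not taken from this card.
frames_after_conv = conv1d_out_len(ENCODER_FRAMES, kernel_size=3, stride=2, padding=1)
tokens_per_second = frames_after_conv / WINDOW_SECONDS

print(frames_after_conv, tokens_per_second)
```

Under these assumptions the stride-2 layer halves the sequence from 1500 to 750 positions, which corresponds to 25 positions per second of audio.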

WandB logs at https://wandb.ai/huseinzol05/whisperconv?nw=nwuserhuseinzol05

Training dataset

Evaluation

Evaluated on malaysia-ai/common_voice_17_0/test under the following conditions:

Lower case.
Remove punctuation.
Provide the language tag in the decoder input IDs: <|startoftranscript|><|{lang}|><|transcribe|><|notimestamps|>.
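The three conditions above can be sketched as follows. This is a minimal illustration, not the evaluation script from the repository: the function names `normalize` and `decoder_prompt` are hypothetical, and punctuation removal here covers only ASCII punctuation via `string.punctuation`, which is an assumption.

```python
import string

# Translation table that deletes ASCII punctuation (assumed scope).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lower-case, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    return " ".join(text.split())


def decoder_prompt(lang: str) -> str:
    """Build the decoder prefix with an explicit language tag."""
    return f"<|startoftranscript|><|{lang}|><|transcribe|><|notimestamps|>"


print(normalize("Hello, World!"))
print(decoder_prompt("ms"))
```

The same normalization would be applied to both the reference and the predicted transcript before scoring, so that casing and punctuation differences do not count as errors.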

Source code

Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/whisper-conv


Model size: 809M params (Safetensors, tensor type F32)
