whisper-conv-large-v3-turbo

Adds a convolution layer with stride 2 to reduce the output rate to 25 tokens per second (TPS). This model will be used later to introduce vector quantization (VQ) for the projection layer.
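The 25 TPS figure can be checked with simple frame arithmetic. The sketch below assumes the extra convolution is applied to the Whisper encoder output (1500 frames per 30-second window, i.e. 50 frames per second) and that it uses kernel size 3 with padding 1. Only the stride of 2 comes from the description above; the kernel size, padding, and placement are illustrative assumptions, and `conv1d_out_len` is a hypothetical helper.

```python
# Illustrative frame-rate arithmetic, not the model's actual code.
# Only stride=2 is stated in this card; kernel size and padding are assumed.

def conv1d_out_len(length: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output length of a 1-D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1

AUDIO_SECONDS = 30   # Whisper's fixed input window
MEL_FRAMES = 3000    # 10 ms hop, so 100 mel frames per second

# Whisper encoder stem: conv(k=3, s=1, p=1) followed by conv(k=3, s=2, p=1).
stem_frames = conv1d_out_len(MEL_FRAMES, kernel_size=3, stride=1, padding=1)
encoder_frames = conv1d_out_len(stem_frames, kernel_size=3, stride=2, padding=1)

# Extra stride-2 convolution (kernel 3 / padding 1 assumed for illustration).
pooled_frames = conv1d_out_len(encoder_frames, kernel_size=3, stride=2, padding=1)

encoder_rate = encoder_frames / AUDIO_SECONDS
pooled_rate = pooled_frames / AUDIO_SECONDS

print(encoder_frames, encoder_rate)  # 1500 50.0
print(pooled_frames, pooled_rate)    # 750 25.0
```

Under these assumptions the sequence passed downstream is half as long as the standard encoder output, which is where the 25 TPS comes from.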

WandB at https://wandb.ai/huseinzol05/whisperconv?nw=nwuserhuseinzol05

Training dataset

  1. malaysia-ai/common_voice_17_0
  2. mesolitica/Malaysian-STT-Whisper-Stage2/malaysian_multiturn_chat_assistants_segments
  3. mesolitica/Malaysian-STT-Whisper-Stage2/malaysian_multiturn_chat_assistants_manglish_segments

Evaluation

Evaluated on malaysia-ai/common_voice_17_0/test under the following conditions:

  1. Lower case.
  2. Remove punctuation.
  3. Provide the language tag in the decoder input IDs: <|startoftranscript|><|{lang}|><|transcribe|><|notimestamps|>.
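The three conditions above can be sketched as follows. This is a minimal illustration, not the evaluation script: the helper names `normalize` and `decoder_prompt` are hypothetical, and dropping every Unicode punctuation character (including apostrophes) is an assumption about what "remove punctuation" means here.

```python
import unicodedata

def normalize(text: str) -> str:
    """Lower-case, drop punctuation, and collapse whitespace (assumed rules)."""
    text = text.lower()
    # Remove characters whose Unicode category starts with "P" (punctuation).
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())

def decoder_prompt(lang: str) -> str:
    """Build the decoder prefix with an explicit language tag."""
    return f"<|startoftranscript|><|{lang}|><|transcribe|><|notimestamps|>"

print(normalize("Hello, World!"))  # hello world
print(decoder_prompt("ms"))        # <|startoftranscript|><|ms|><|transcribe|><|notimestamps|>
```

The same normalization would be applied to both the reference and the predicted transcript before scoring, so that casing and punctuation differences are not counted as errors.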

Source code

Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/whisper-conv

Model size: 809M params (Safetensors, tensor type F32)