This model was trained on two A100 40 GB GPUs, with 128 GB of RAM and 2 × Xeon 48-core 2.4 GHz CPUs.

  • Training time: ~7 hours
  • Training set: 118k audio samples from Mozilla Common Voice 17

Usage example

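from transformers import pipeline
import gradio as gr
import time

# Load the fine-tuned Russian Whisper model. device='cpu' runs inference on the CPU;
# pass a GPU index instead (e.g. device=0) to run on a CUDA GPU.
pipe = pipeline(
    model="dvislobokov/whisper-large-v3-turbo-russian",
    tokenizer="dvislobokov/whisper-large-v3-turbo-russian",
    task='automatic-speech-recognition',
    device='cpu'
)

def transcribe(audio):
    # audio is the path to the recorded or uploaded file (type='filepath' below)
    start = time.perf_counter()
    # return_timestamps=True enables Whisper's long-form mode, so recordings
    # longer than 30 seconds can be transcribed
    text = pipe(audio, return_timestamps=True)['text']
    print(f"Transcription took {time.perf_counter() - start:.2f} s")
    return text

# Simple web UI: record from the microphone or upload an audio file
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=['microphone', 'upload'], type='filepath'),
    outputs='text'
)

# share=True also creates a temporary public link to the demo
iface.launch(share=True)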
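
To transcribe a batch of files without the Gradio UI, the same pipeline call can be wrapped in a loop. The sketch below is a minimal illustration, not part of the original card: the function name transcribe_files is hypothetical, and it assumes only that pipe behaves like the transformers ASR pipeline above (called with a file path, it returns a dict with a "text" key).

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def transcribe_files(
    pipe: Callable[..., Any], paths: Iterable[str]
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: transcribe each file and time each call.

    Assumes `pipe` behaves like a transformers ASR pipeline, i.e. calling it
    with a file path returns a dict containing a "text" key.
    """
    results = []
    for path in paths:
        # perf_counter is a monotonic clock, better suited to measuring
        # elapsed time than time.time()
        start = time.perf_counter()
        # Same call as in the Gradio example; return_timestamps=True keeps
        # recordings longer than 30 seconds working
        text = pipe(path, return_timestamps=True)["text"]
        elapsed = time.perf_counter() - start
        results.append((path, text, elapsed))
    return results
```

With the pipe object from the usage example, calling transcribe_files(pipe, ["sample1.wav", "sample2.wav"]) (file names are placeholders) returns one (path, text, seconds) tuple per file. Passing the pipeline in as an argument keeps the helper independent of device or model settings.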

Model size: 809M parameters (Safetensors, tensor type F32)