Only English transcriptions on Dutch transcribe task?
When performing the transcribe task on the Dutch Common Voice data (locally downloaded), I only seem to obtain English transcriptions for the tiny, small, and base models, which are the ones I have tested so far. I therefore assume there is a mistake in my code or in the way I use the pipeline. Could anyone help me? I posted the code below.

    pipe_whisper = pipeline(model="openai/whisper-tiny", device=device, tokenizer=WhisperTokenizer.from_pretrained("openai/whisper-tiny", language="Dutch", task="transcribe"))
    df["transcription_whisper"] = df["path"].progress_apply(lambda path: pipe_whisper(DATA_COMMON_VOICE_PATH/path))
Hey! This means one of three things:
- the model translates instead of transcribing
- the model is bad at transcribing Dutch
- the task is not passed to the model properly
You should try forwarding the task and language to Whisper through the pipeline's generate_kwargs argument: pipe = pipeline(....., generate_kwargs={"task": "transcribe", "language": "Dutch"})
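For reference, here is a minimal sketch of what that suggestion could look like in full. The helper names (whisper_generate_kwargs, build_pipe, transcribe_file) are made up for illustration and are not part of transformers. The sketch assumes a reasonably recent transformers release in which the ASR pipeline forwards generate_kwargs to model.generate(), and it uses the lowercase language name "dutch", since some versions may not accept the capitalized form. Converting the path to str is a precaution, not something the original question requires.

```python
from pathlib import Path


def whisper_generate_kwargs(language: str = "dutch", task: str = "transcribe") -> dict:
    """Build the kwargs the pipeline forwards to model.generate() (hypothetical helper)."""
    if task not in {"transcribe", "translate"}:
        raise ValueError(f"Whisper supports 'transcribe' or 'translate', got {task!r}")
    return {"language": language, "task": task}


def build_pipe(model_id: str = "openai/whisper-tiny", device: int = -1, language: str = "dutch"):
    """Create an ASR pipeline that asks Whisper to transcribe in the given language."""
    # Imported lazily so the small helpers in this sketch work without transformers installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=device,  # -1 selects the CPU
        generate_kwargs=whisper_generate_kwargs(language),
    )


def transcribe_file(pipe, audio_path) -> str:
    """Run the pipeline on one audio file and return the stripped text."""
    # Pass a plain string: the pipeline documents str, bytes, ndarray and dict inputs.
    result = pipe(str(audio_path))
    return result["text"].strip()


# Example usage (the path is a placeholder, not from the original post):
# pipe = build_pipe()
# print(transcribe_file(pipe, Path("clips") / "sample.mp3"))
```

If the output is still English with these settings, the remaining explanations from the list above (the model translating anyway, or being weak at Dutch at this size) become more likely.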