Whisper-small OpenVINO IR

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.

Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. The original code repository can be found here.

Disclaimer: Content for this model card has partly been copied and pasted from this model card.

Model details

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model.

Model Type	Parameters	n_audio_ctx	n_audio_state	n_audio_head	n_audio_layer	n_text_ctx	n_text_state	n_text_head	n_text_layer	n_mels	n_vocab
whisper-tiny	39 M	1500	384	6	4	224	384	6	4	80	51865
whisper-base	74 M	1500	512	8	6	224	512	8	6	80	51865
whisper-small	244 M	1500	768	12	12	224	768	12	12	80	51865
whisper-medium	769 M	1500	1024	16	24	224	1024	16	16	80	51865
whisper-large-v1	1550 M	1500	1280	20	32	224	1280	20	20	80	51865
whisper-large-v2	1550 M	1500	1280	20	32	224	1280	20	20	80	51865
distil-whisper-large-v2	756 M	1500	1280	20	32	224	1280	20	2	80	51865
whisper-large-v3	1550 M	1500	1280	20	32	224	1280	20	20	128	51866
distil-whisper-large-v3	756 M	1500	1280	20	32	224	1280	20	2	128	51866
whisper-large-v3-turbo	809 M	1500	1280	20	32	224	1280	20	4	128	51866
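
The columns above can be related to tensor shapes with a little arithmetic. The sketch below is illustrative only: the whisper-small values are copied from the table, while the 16 kHz sample rate, 30-second window, hop length of 160 samples, and stride-2 convolution in the encoder front end are assumptions taken from the upstream Whisper implementation, not from this model card. The helper names are hypothetical.

```python
# Values copied from the whisper-small row of the table above.
WHISPER_SMALL = {
    "n_audio_ctx": 1500,
    "n_audio_state": 768,
    "n_audio_head": 12,
    "n_text_state": 768,
    "n_text_head": 12,
    "n_mels": 80,
}

# Assumptions from the upstream Whisper code, not stated in this card.
SAMPLE_RATE = 16_000   # audio samples per second
CHUNK_SECONDS = 30     # length of one padded or trimmed input window
HOP_LENGTH = 160       # samples between consecutive mel frames
CONV_STRIDE = 2        # downsampling factor of the encoder's conv front end


def mel_frames() -> int:
    """Number of log-mel frames in one input window."""
    return SAMPLE_RATE * CHUNK_SECONDS // HOP_LENGTH


def encoder_positions() -> int:
    """Sequence length seen by the encoder after the strided convolution."""
    return mel_frames() // CONV_STRIDE


def head_dim(state: int, heads: int) -> int:
    """Width of a single attention head."""
    if state % heads != 0:
        raise ValueError("state width must be divisible by the head count")
    return state // heads


def encoder_shapes(cfg: dict, batch: int = 1) -> tuple:
    """Return (input_features shape, encoder output shape) for one batch."""
    input_shape = (batch, cfg["n_mels"], mel_frames())
    output_shape = (batch, encoder_positions(), cfg["n_audio_state"])
    return input_shape, output_shape


if __name__ == "__main__":
    inp, out = encoder_shapes(WHISPER_SMALL)
    print("encoder input :", inp)
    print("encoder output:", out)
    print("audio head dim:", head_dim(WHISPER_SMALL["n_audio_state"],
                                      WHISPER_SMALL["n_audio_head"]))
```

Under these assumptions, the encoder sequence length matches the n_audio_ctx column, and every row of the table gives the same per-head width when n_audio_state is divided by n_audio_head.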
