Inference Endpoint - Multilingual Audio Transcription with Whisper models

Deploy OpenAI's Whisper as an Inference Endpoint to transcribe audio files to text in many languages.

The resulting deployment exposes an HTTP endpoint compatible with the OpenAI Platform Transcription API, which you can query with the OpenAI libraries or directly with cURL, for instance.

Available Routes

path                           description
/api/v1/audio/transcriptions   Transcription endpoint to interact with the model
/docs                          Visual documentation

Getting started

  • Getting text output from audio file
curl http://localhost:8000/api/v1/audio/transcriptions \
  --request POST \
  --header 'Content-Type: multipart/form-data' \
  -F "file=@</path/to/audio/file>" \
  -F "response_format=text"
  • Getting JSON output from audio file
curl http://localhost:8000/api/v1/audio/transcriptions \
  --request POST \
  --header 'Content-Type: multipart/form-data' \
  -F "file=@</path/to/audio/file>" \
  -F "response_format=json"
  • Getting segmented JSON output from audio file
curl http://localhost:8000/api/v1/audio/transcriptions \
  --request POST \
  --header 'Content-Type: multipart/form-data' \
  -F "file=@</path/to/audio/file>" \
  -F "response_format=verbose_json"
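
The same requests can be sent from Python. The sketch below uses only the standard library and mirrors the two form fields from the cURL calls (`file` and `response_format`). The helper names `build_transcription_request` and `transcribe` are illustrative and not part of the endpoint, and the URL assumes the same local address as the examples above.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumption: the endpoint is reachable at the address used in the cURL examples.
ENDPOINT_URL = "http://localhost:8000/api/v1/audio/transcriptions"


def build_transcription_request(audio_path, response_format="json", url=ENDPOINT_URL):
    """Build a multipart/form-data POST carrying `file` and `response_format`."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"

    # Plain form field: response_format is one of text, json, verbose_json.
    field = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n\r\n'
        f"{response_format}\r\n"
    ).encode()
    # File field: headers first, then the raw audio bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()

    body = field + file_header + path.read_bytes() + closing
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(audio_path, response_format="json", url=ENDPOINT_URL):
    """Send the request; return a str for "text", otherwise the decoded JSON."""
    request = build_transcription_request(audio_path, response_format, url)
    with urllib.request.urlopen(request) as response:
        payload = response.read().decode("utf-8")
    return payload if response_format == "text" else json.loads(payload)


# Example (requires a running endpoint and a real audio file):
# print(transcribe("</path/to/audio/file>", "text"))
```

Building the request separately from sending it makes it possible to inspect the payload without a running endpoint.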

Specifications

spec                 value                description
Engine               vLLM (v0.8.3)        The underlying inference engine is vLLM
Hardware             GPU (Ada Lovelace)   Requires the target endpoint to run on NVIDIA GPUs with compute capability 8.9 (Ada Lovelace) or higher
Compute data type    bfloat16             Computations (matmuls, norms, etc.) are done in bfloat16 precision
KV cache data type   float8 (e4m3)        The key-value cache is stored on the GPU in float8 (float8_e4m3) precision to save space
PyTorch Compile      ✅                   Uses torch.compile to further optimize the model's execution
CUDA Graphs          ✅                   Uses CUDA Graphs to reduce the overhead of executing GPU computations
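
Before deploying, it can be useful to confirm that a GPU meets the compute capability requirement listed in the Hardware row. The sketch below is a hypothetical pre-flight check and not part of the endpoint image. The helper names are illustrative, and the local probe assumes PyTorch may or may not be installed.

```python
# Minimum (major, minor) compute capability, taken from the Hardware row: 8.9.
MIN_COMPUTE_CAPABILITY = (8, 9)


def meets_hardware_requirement(capability, minimum=MIN_COMPUTE_CAPABILITY):
    """Return True when a (major, minor) capability is at least the minimum."""
    return tuple(capability) >= tuple(minimum)


def local_gpu_capability():
    """Return the first visible GPU's (major, minor) capability, or None."""
    try:
        import torch  # optional dependency, only needed for the local probe
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


capability = local_gpu_capability()
if capability is None:
    print("No CUDA GPU detected (or PyTorch is not installed).")
elif meets_hardware_requirement(capability):
    print(f"Compute capability {capability} meets the 8.9 requirement.")
else:
    print(f"Compute capability {capability} is below the 8.9 requirement.")
```

Tuples compare element by element, so (9, 0) passes and (8, 6) does not.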