Inference API error with Whisper, return_timestamps parameter
Bug description.
Hi team, I've been using the inference endpoint for Whisper for months at https://api-inference.huggingface.co/models/openai/whisper-large-v3-turbo. Today, all of a sudden, the API started throwing this error:
```
You have passed more than 3000 mel input features (> 30 seconds) which automatically enables long-form generation which requires the model to predict timestamp tokens. Please either pass `return_timestamps=True` or make sure to pass no more than 3000 mel input features.
```

The same message is repeated in the response's `warnings` field, prefixed with "There was an inference error:".
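For context, this matches the ValueError raised by transformers' Whisper long-form generation path, and it reproduces locally. Here is a minimal sketch (assuming a local transformers install and a hypothetical `sample.wav` longer than 30 seconds) showing that `return_timestamps=True` is accepted when calling the pipeline directly:

```python
# Minimal local sketch (not the endpoint's actual serving code): transformers
# raises the same ValueError for > 30 s audio unless return_timestamps=True.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")

# asr("sample.wav")                                  # > 30 s input -> the ValueError above
result = asr("sample.wav", return_timestamps=True)   # long-form generation proceeds
print(result["text"])
```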
This of course only happens when passing in samples longer than 30 seconds, and it is reproducible through the UI.
Passing a `return_timestamps` parameter in the HTTP request does not solve the issue, in either boolean or string form (`True` / `"true"`):

```python
parameters = {"language": "en", "temperature": "0.0", "return_timestamps": True}
```
Using `generation_params` also fails here.
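For completeness, here is a sketch of the kind of request that fails (the token and file name are placeholders, and I'm assuming the JSON payload form with base64-encoded audio, since a raw-bytes body cannot carry parameters):

```python
import base64
import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3-turbo"
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token

# Base64-encode the audio so parameters can ride along in a JSON body.
with open("sample.wav", "rb") as f:  # hypothetical clip longer than 30 seconds
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "inputs": audio_b64,
    "parameters": {
        "language": "en",
        "temperature": 0.0,
        "return_timestamps": True,  # has no effect: the endpoint still errors
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # -> {'error': 'You have passed more than 3000 mel input features ...'}
```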
Describe the expected behaviour
The endpoint should run inference as it did previously: samples exceeding 30 seconds were supported without issue, and no extra parameter had to be provided.
I'm having this problem too. Before 2025/04/11 or 04/12, WAV files longer than 2 minutes worked fine with whisper-large-v3-turbo. Then this error suddenly appeared and my whole application stopped. Could somebody please explain why this error occurs?