Timestamps
#1 opened by valbarriere
Hi!
I have a naive question: does this model allow getting word-level timestamps?
I get the following error:
>>> result = pipe(sample, return_timestamps="word")
WhisperModel is using WhisperSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True` or `layer_head_mask` not None. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.
From v4.47 onwards, when a model cache is to be returned, `generate` will return a `Cache` instance instead by default (as opposed to the legacy tuple of tuples format). If you want to keep returning the legacy format, please set `return_legacy_cache=True`.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1294, in __call__
    return next(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 684, in generate
    ) = self.generate_with_fallback(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 862, in generate_with_fallback
    seek_sequences, seek_outputs = self._postprocess_outputs(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 963, in _postprocess_outputs
    seek_outputs["token_timestamps"] = self._extract_token_timestamps(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 195, in _extract_token_timestamps
    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 195, in <listcomp>
    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
IndexError: list index out of range
Thanks in advance,
Valentin
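A likely cause of that IndexError: `return_timestamps="word"` depends on `generation_config.alignment_heads`, the list of `(decoder_layer, head)` pairs whose cross-attentions Whisper uses to align tokens to audio frames. If a fine-tuned checkpoint ships without them, or inherits them from a differently sized base model, the layer index `l` can run past the model's actual decoder layers, which is exactly where the traceback fails. Below is a minimal sketch of a possible workaround, assuming the fine-tune kept the architecture of `openai/whisper-small`; both checkpoint ids are placeholders:

```python
from transformers import GenerationConfig, pipeline

# Placeholder id; the fine-tuned model discussed in this thread is not named here.
pipe = pipeline(
    "automatic-speech-recognition",
    model="your-org/your-whisper-finetune",
)

# Assumption: the fine-tune has the same decoder depth and head count as
# whisper-small, so the base checkpoint's alignment heads are valid for it.
base_cfg = GenerationConfig.from_pretrained("openai/whisper-small")
pipe.model.generation_config.alignment_heads = base_cfg.alignment_heads

result = pipe("audio.wav", return_timestamps="word")
print(result["chunks"])  # one entry per word, each with a (start, end) tuple
```

If the architectures do not match, copying the heads will not help; in that case only segment-level timestamps (`return_timestamps=True`) are available out of the box.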
Hi!
Did you try what this issue suggests?
I haven't used it for timestamps myself, so let me know if that solution works!
Yes, that worked, but it only gives you timestamps at the "chunk" level, not at the word level.
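For reference, a minimal sketch of the chunk-level behavior described above (the model id is again a placeholder): with `return_timestamps=True`, each entry in `result["chunks"]` spans a whole decoded segment rather than a single word.

```python
from transformers import pipeline

# Placeholder id; substitute the checkpoint discussed in this thread.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Segment-level timestamps: each chunk covers several words,
# e.g. {"timestamp": (0.0, 5.44), "text": " ..."}.
result = pipe("audio.wav", return_timestamps=True)
for chunk in result["chunks"]:
    start, end = chunk["timestamp"]
    print(f"[{start:.2f} - {end:.2f}] {chunk['text']}")
```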