Timestamps

#1
by valbarriere - opened

Hi!

I have a naive question: does this model support word-level timestamps?

I get the following error:

```python
>>> result = pipe(sample, return_timestamps="word")
WhisperModel is using WhisperSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True` or `layer_head_mask` not None. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.
From v4.47 onwards, when a model cache is to be returned, `generate` will return a `Cache` instance instead by default (as opposed to the legacy tuple of tuples format). If you want to keep returning the legacy format, please set `return_legacy_cache=True`.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1294, in __call__
    return next(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 684, in generate
    ) = self.generate_with_fallback(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 862, in generate_with_fallback
    seek_sequences, seek_outputs = self._postprocess_outputs(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 963, in _postprocess_outputs
    seek_outputs["token_timestamps"] = self._extract_token_timestamps(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 195, in _extract_token_timestamps
    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 195, in <listcomp>
    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
IndexError: list index out of range
```
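From the traceback, the failure happens when `_extract_token_timestamps` indexes the decoder's cross-attentions with the `(layer, head)` pairs in the model's `generation_config.alignment_heads`, so it looks like a pair references a decoder layer this checkpoint doesn't have. A minimal sketch of that indexing (pure Python with placeholder values, not the real model config):

```python
# Sketch of how Whisper selects alignment heads for word-level timestamps.
# The numbers here are illustrative assumptions, not this model's config.
num_decoder_layers = 4  # e.g. a small fine-tuned model
cross_attentions = [f"attn_layer_{i}" for i in range(num_decoder_layers)]

def select_heads(cross_attentions, alignment_heads):
    # Mirrors the failing list comprehension: index by layer for each (layer, head) pair.
    return [cross_attentions[l] for l, h in alignment_heads]

# A pair whose layer index exceeds the decoder depth reproduces the IndexError:
alignment_heads = [(2, 3), (5, 1)]  # layer 5 does not exist in a 4-layer decoder
try:
    select_heads(cross_attentions, alignment_heads)
except IndexError:
    print("alignment_heads references a missing decoder layer")
```

If that is the cause here, the checkpoint would need `alignment_heads` entries that match its own decoder size before `return_timestamps="word"` can work.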

Thanks in advance,
Valentin

Universidad Nacional de Rio Negro org

Hi!
Did you try what this issue suggests?
I haven't used it for timestamps myself, let me know if that solution works!

Yes, that worked, but it only gives you timestamps at the chunk level, not at the word level.
