transcription problem

#3
by JL42 - opened

The file is 19:50 min long.

Transcription of part of the file went well, but after that the text was glued together into one block and the correct time codes are no longer visible.

zoomit.png

NVIDIA org

Noted, thanks for feedback.

As a temporary workaround, you could split the audio into 10-minute segments and transcribe each one while we work on a fix.
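A minimal sketch of that workaround, assuming the input is an uncompressed WAV file. The helper names `split_wav` and `shift_timestamps` are hypothetical and are not part of the Space or of NeMo. The first writes consecutive 10-minute segments and records where each one starts; the second moves segment-relative time codes back onto the original file's timeline.

```python
import math
import wave
from pathlib import Path


def split_wav(src, out_dir, segment_seconds=600):
    """Split a WAV file into consecutive segments.

    Returns a list of (segment_path, start_offset_seconds) pairs.
    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    segments = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_segment = int(segment_seconds * reader.getframerate())
        count = math.ceil(reader.getnframes() / frames_per_segment)
        for index in range(count):
            # The last read may return fewer frames than requested.
            frames = reader.readframes(frames_per_segment)
            path = out_dir / f"{Path(src).stem}_{index:03d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Frame count in the header is corrected when the file is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            segments.append((path, index * segment_seconds))
    return segments


def shift_timestamps(entries, offset_seconds):
    """Shift segment-relative (start, end, text) entries by the segment's offset."""
    return [
        (start + offset_seconds, end + offset_seconds, text)
        for start, end, text in entries
    ]
```

Each segment can then be transcribed on its own, and its time codes passed through `shift_timestamps` with the offset returned by `split_wav`. Cutting at fixed positions can split a word in two, so cutting at a pause may give cleaner results.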

NVIDIA org

This should now be fixed in the Space. Could you re-run and check? We have also updated the Space to support transcription of audio up to 3 hours long.

A little better, but there is still a problem:

zoomit.png
zoomit2.png

It doesn’t seem to occur on all long-duration segments, and in our samples the issue appears resolved. It may be unlikely, but would you be able to share a sample for us to test?

I gave you a link to the sample that has the problem (glued text), but you have not fixed anything since then. We are waiting.

NVIDIA org

This issue occurs with only a very small number of files. While we understand the cause, I recommend using the chunking method for audio longer than 10 minutes with this script: speech_to_text_buffered_infer_rnnt.py. This should resolve the attention problem. Use a large chunk_len and buffer_length to minimize overlap. We also identified a merging issue; the fix is here: PR #13500.
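To get a feel for the chunk_len and buffer_length trade-off, here is a rough back-of-the-envelope sketch. It assumes that each decoding step processes a buffer of buffer_length seconds, of which chunk_len seconds are new audio and the rest is surrounding context that gets reprocessed. That model and the function name `buffered_plan` are assumptions for illustration, not the script's actual interface.

```python
import math


def buffered_plan(duration_s, chunk_len_s, buffer_len_s):
    """Estimate step count and reprocessed context for buffered inference.

    Assumption: every step sees buffer_len_s seconds, chunk_len_s of them new.
    """
    if not 0 < chunk_len_s <= buffer_len_s:
        raise ValueError("expected 0 < chunk_len_s <= buffer_len_s")
    steps = math.ceil(duration_s / chunk_len_s)
    context_s = buffer_len_s - chunk_len_s
    return {
        "steps": steps,  # number of buffers decoded, so steps - 1 merge points
        "context_per_step_s": context_s,
        "reprocessed_fraction": context_s / buffer_len_s,
    }


# The 19:50 file from this thread is 1190 seconds long.
print(buffered_plan(1190, 300, 320))
```

Under this assumption, longer chunks mean fewer merge points, and a buffer only slightly longer than the chunk keeps the reprocessed share small.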
