parakeet-tdt-0.6b-v2

transcription problem

by JL42 - opened May 2

JL42

May 2

The file is 19:50 minutes long.

Transcription of the first part of the file went well, but then the text was glued into one single block, so the correct time codes can no longer be seen.

nithinraok

NVIDIA org May 2

Noted, thanks for the feedback.

As a temporary workaround, you could split the audio into 10-minute segments and transcribe each one while we work on a fix for this.
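For anyone who wants to try the 10-minute workaround locally, here is a minimal sketch using only Python's standard `wave` module. It assumes the input is an uncompressed PCM WAV file; the function name `split_wav` and the output naming scheme are hypothetical choices for illustration, not part of the Space or of NeMo.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, segment_seconds=600):
    """Split a PCM WAV file into consecutive segments of at most segment_seconds.

    Hypothetical helper: returns the list of segment paths in playback order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_segment = int(segment_seconds * reader.getframerate())
        index = 0
        while True:
            frames = reader.readframes(frames_per_segment)
            if not frames:
                break  # end of input reached
            path = out_dir / f"{Path(src).stem}_{index:03d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Copy channels, sample width, and rate; the frame count in the
                # header is corrected automatically when the file is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

Timestamps returned for segment `i` are relative to that segment, so `i * segment_seconds` has to be added to map them back to the original file. Cutting at fixed positions can also split a word in half, which is one reason this is only a stopgap.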

nithinraok

NVIDIA org May 4

This should now be fixed in the Space. Could you re-run and check? We also updated the Space to support transcription of audio up to 3 hours long.

JL42

May 4

A little better, but there is still a problem:

nithinraok

NVIDIA org May 5

edited May 5

It doesn’t seem to occur on all long-duration segments, and in our samples the issue appears resolved. It may be a long shot, but could you share a sample for us to test?

JL42

May 5

sample link: https://drive.google.com/file/d/1_gG_G7Auq5VxlpMilU14tfJLRhsFzI3v/view?usp=sharing

JL42

about 1 month ago

I gave you a link to a sample that shows the problem (glued text), but nothing has been corrected since then. We are waiting.

nithinraok

NVIDIA org about 1 month ago

This issue occurs with only a very few files. While we understand the cause, I recommend using the chunking method for audio longer than 10 minutes with this script: speech_to_text_buffered_infer_rnnt.py. This should resolve the attention problem. Use a large chunk_len and buffer_length to minimize overlap. We also identified a merging issue; the fix is here: PR #13500.
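To make the "large chunk_len and buffer_length" advice concrete, the sketch below illustrates the general idea of buffered chunking in plain Python. It does not reproduce speech_to_text_buffered_infer_rnnt.py: the function names are hypothetical, and the assumption that the extra buffer is split evenly into left and right context around each chunk is mine, made only for illustration.

```python
def buffered_windows(duration, chunk_len, buffer_len):
    """Return (buffer_start, buffer_end, chunk_start, chunk_end) tuples in seconds.

    Each chunk is the span whose text is kept; the surrounding buffer is the
    audio actually fed to the model. Assumes symmetric context (illustrative).
    """
    if chunk_len <= 0 or buffer_len < chunk_len:
        raise ValueError("need 0 < chunk_len <= buffer_len")
    context = (buffer_len - chunk_len) / 2.0
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        # Clamp the buffer to the bounds of the file.
        windows.append((max(0.0, start - context), min(duration, end + context), start, end))
        start = end
    return windows


def redundancy(duration, chunk_len, buffer_len):
    """Seconds of audio fed to the model divided by the real duration."""
    total = sum(b_end - b_start
                for b_start, b_end, _, _ in buffered_windows(duration, chunk_len, buffer_len))
    return total / duration
```

Under these assumptions, a 100-second file with 40-second chunks in a 60-second buffer is processed 1.4 times over, while 10-second chunks in a 30-second buffer process it 2.8 times over. Larger chunks also mean fewer boundaries at which text has to be merged.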

JL42

24 days ago

edited 24 days ago

I extracted the audio from this YouTube video: https://www.youtube.com/watch?v=_x07BqvRT74
It is 8:12 minutes long, and the glued-text problem still occurs.
So the problem is not caused by the length exceeding 10 minutes.

And the problem occurs more often than you think.

Please ADD SRT output.

JL42

22 days ago

I've been running various tests, and the length is not what causes the resulting text to be glued together.

Even when I cut the first problematic YouTube file into pieces of 2.5 minutes each, some of the text is still glued together after processing.
I don't know why this is happening.

Thanks for adding SRT export.
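How the Space builds its SRT file is not shown in this thread. For local runs, a minimal sketch of the conversion could look like the following, assuming segment-level timestamps are already available as `(start, end, text)` tuples in seconds; the function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT time code, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as SRT cues numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

A glued segment shows up in this output as a single cue with a very long time range and a wall of text, which makes the problem easy to spot in the exported file.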

KeepKool

21 days ago

edited 21 days ago

I have the same issue, and it is not related to the length of the audio. It happens very often. It also occurs with my 9-minute segments (I cut the audio into 9-minute segments because SDRAM use is very high: 30 GB for a 9-minute segment). I think it tends to happen at the end. I cannot upload MP3 samples for privacy reasons.

nithinraok

NVIDIA org 20 days ago

Thanks for the feedback—appreciated! I’ll look into it and share an update here once I have a fix.

VAWA

17 days ago

Some transcripts are wrong. How can I edit them before downloading?
