Information (#7, opened by Sarvjeet001)
Hi, thanks for your model. I am using it to explore TTS, and I have some questions that need your guidance:
- Why can your model generate at most 30 seconds of audio? Does this depend on the length of the audio clips in the training data? If so, can we increase a TTS model's output length from 30 seconds to 3 minutes by training on clips that are minutes long instead of seconds long?
- Are there any issues with the audio output if we generate more than 30 seconds by some other means?
- Why does your model have a 20-word limit for best results? What does that limit depend on, and how can we increase it?
Hi,
I ran some experiments to get past the word limit, using chunking plus batch generation.
Usage in a server
https://github.com/slabstech/parler-tts-server/blob/c83a89d7468610744eecab9f38fbaef691641efc/parler_tts_server/main.py#L191
Function call
https://github.com/slabstech/llm-recipes/blob/ed08fa0301ceb8b936edf49f271975734f869438/python/notebooklm/tts_generator.py#L183
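For anyone who wants the idea without reading the linked code, here is a minimal sketch of chunking plus batch generation. It is not the code from the links above. `chunk_text`, `synthesize_long`, and the `synthesize_batch` callable are hypothetical names I made up for illustration, and the 20-word default just mirrors the limit mentioned in the question. `synthesize_batch` stands in for whatever batched TTS call you use; it is assumed to take a list of strings and return one sequence of audio samples per string.

```python
import re
from typing import Callable, List, Sequence


def chunk_text(text: str, max_words: int = 20) -> List[str]:
    """Split text into chunks of at most max_words words.

    Sentence boundaries are preferred; a single sentence longer than
    max_words is hard-split by word count.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) > max_words:
            # Flush what we have, then cut the long sentence into pieces.
            if current:
                chunks.append(" ".join(current))
                current = []
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
            continue
        if len(current) + len(words) > max_words:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def synthesize_long(
    text: str,
    synthesize_batch: Callable[[List[str]], Sequence[Sequence[float]]],
    max_words: int = 20,
) -> List[float]:
    """Generate audio for every chunk in one batch and concatenate in order.

    synthesize_batch is a hypothetical placeholder for a batched TTS call.
    """
    chunks = chunk_text(text, max_words)
    if not chunks:
        return []
    audio: List[float] = []
    for samples in synthesize_batch(chunks):
        audio.extend(samples)
    return audio
```

Plain concatenation can leave audible seams or abrupt prosody changes between chunks, so a short silence or crossfade at the joins may be worth trying.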