Speed. Seed?

#30
by CireRetsal - opened

It would be nice if the audio were time-stretched when the speed is decreased, so that the pitch stays the same. Also, I didn't see anywhere I can set the seed for speaker consistency; the voice is very random. I like it though. Thanks. Looking forward to updates.

Did you figure out the seed?

I figured out the speed of speech: you lower cfg_scale in generate. It currently defaults to 3.0, so it talks really fast. Can't figure out the seed though.

UPDATE: Never mind, I don't think that's what it does.

For speaker consistency, I have been able to set a custom fixed seed in the model.py file (inside the dia subfolder), in the generate function. However, this only ensures the same speaker if the input text is identical. For a consistent voice across different texts, voice cloning is the recommended approach. You can check out my full implementation here:
https://github.com/devnen/Dia-TTS-Server
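
As a rough idea of what fixing the seed can look like, here is a minimal sketch. The helper name set_seed is hypothetical and is not taken from model.py or from the linked repo; it only uses the standard seeding calls from Python, NumPy, and PyTorch, and skips whichever library is not installed. You would call something like it once, right before generation.

```python
import random


def set_seed(seed):
    """Seed the common random number generators (hypothetical helper)."""
    # Python's built-in generator.
    random.seed(seed)

    # NumPy's global generator, if NumPy is installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch's CPU and (if present) CUDA generators.
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Assumption: this runs before every call to the model's generate function.
set_seed(42)
```

Even with a fixed seed, some GPU operations can be nondeterministic, so identical output is not guaranteed on every setup.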

For adjusting speech speed, cfg_scale doesn’t directly control it. Instead, use the dedicated speed parameter in the API/UI. This applies postprocessing to resample the generated audio while maintaining quality.
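
To illustrate what resampling as a postprocessing step means, here is a minimal pure-Python sketch using linear interpolation. The function name change_speed is hypothetical and this is not the code from the linked server. It assumes mono audio as a list of floats.

```python
def change_speed(samples, speed):
    """Resample a mono signal by linear interpolation (illustrative sketch).

    speed > 1 gives fewer samples (faster playback),
    speed < 1 gives more samples (slower playback).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n = len(samples)
    if n == 0:
        return []

    out_len = max(1, round(n / speed))
    out = []
    for i in range(out_len):
        pos = i * speed                 # fractional position in the input
        lo = min(int(pos), n - 1)       # sample at or before that position
        hi = min(lo + 1, n - 1)         # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


ramp = [float(i) for i in range(100)]
faster = change_speed(ramp, 2.0)   # half as many samples
slower = change_speed(ramp, 0.5)   # twice as many samples
```

Note that plain resampling like this, played back at the original sample rate, shifts the pitch along with the speed. Keeping the pitch, as asked in the first post, needs a time-stretching algorithm instead.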

Oh dang!!!! I just peeked at it. I'm gonna try this tonight. Thank you.

CireRetsal changed discussion status to closed