Can I use IPA with this model?
Is there an option for IPA input for any of the supported languages?
Bro, forget about IPA. These doomed devs create what’s called LLM-based TTS: a TTS that uses the LLM’s tokenizer to represent text, instead of turning text into phonetic representations (WHICH IS OF COURSE BAAAAD)!
You can look into VITS-2, which is my favorite model and, in my experience, has not been beaten so far (I built one at 48 kHz with a custom dataset and listen to audiobooks with it every day, so I'm speaking from experience)!
Hey yukiarimo, will you be able to share some details regarding your dataset?
And if possible, please share links to the dataset/sources.
Thanks in advance
The dataset consists of the following languages, approximately:
- English -> 60 hours
- Japanese -> 24 hours
- Russian -> 12 hours (read as transliteration)
It was recorded by me and my voice actress Fukai. In total, combining every speaker and language, it comes to around 200 hours of recordings.
Note: everything is phonemized with eSpeak (and Cutlet for Japanese).
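For anyone wondering what the eSpeak step can look like, here is a minimal sketch that shells out to the `espeak-ng` command-line tool and asks for IPA output. This is an assumption about one way to do it, not the pipeline actually used for this dataset; the function names `espeak_command` and `phonemize` are hypothetical, and the Japanese step with Cutlet is not covered.

```python
import shutil
import subprocess
from typing import List, Optional


def espeak_command(text: str, voice: str = "en-us") -> List[str]:
    """Build an espeak-ng command that prints IPA instead of playing audio."""
    # -q: no audio output, --ipa: print IPA phonemes, -v: voice/language code
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize(text: str, voice: str = "en-us") -> Optional[str]:
    """Return the IPA string from espeak-ng, or None if it is not installed."""
    if shutil.which("espeak-ng") is None:
        return None  # hypothetical fallback: caller decides what to do
    result = subprocess.run(
        espeak_command(text, voice),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(phonemize("Hello world"))
```

The exact phoneme string depends on the installed espeak-ng version and voice, so it is worth checking a few sentences by ear before phonemizing a whole dataset.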
Insights: After roughly 40 hours, the only improvement you get is averaging out the errors. It’s probably better to record sentence by sentence than to read the whole novel and then segment it with ASR.
Smaller languages are fine with 24 hours, though with less than that the result won’t be that good. But because there is so much data overall, it helps the model as a whole.
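If you record sentence by sentence, each clip already has its transcript, so building a training manifest is trivial. Below is a small sketch that writes pipe-delimited lines in an LJSpeech-style layout (`clip_id|text|phonemes`). The third column holding phonemes and the helper name `manifest_line` are assumptions for illustration; check what the training code you use actually expects.

```python
from typing import Iterable, List, Tuple


def manifest_line(clip_id: str, text: str, phonemes: str) -> str:
    """Format one clip as 'clip_id|text|phonemes' (LJSpeech-style layout)."""
    fields = (clip_id.strip(), text.strip(), phonemes.strip())
    for field in fields:
        # The pipe is the column separator, so it must not appear in a field.
        if "|" in field or "\n" in field:
            raise ValueError(f"field contains a separator: {field!r}")
    return "|".join(fields)


def build_manifest(rows: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Turn (clip_id, text, phonemes) tuples into manifest lines."""
    return [manifest_line(*row) for row in rows]


if __name__ == "__main__":
    # Hypothetical clip IDs and placeholder phoneme strings.
    rows = [
        ("en_0001", "First sentence.", "PHONEMES_1"),
        ("en_0002", "Second sentence.", "PHONEMES_2"),
    ]
    print("\n".join(build_manifest(rows)))
```

With whole-novel recordings you would first need ASR or forced alignment to cut and label the clips, which is where the extra errors tend to come from.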
And no, as a matter of Yuki Story policy and our contract, I will not share the dataset or the model (the architecture and training code are open in the aiflow package, so you can train it, too).
If you want, you can find an LJSpeech version and use it for free!
Code: https://github.com/yukiarimo/aiflow/tree/main/aiflow/models/hanasu