Can I use IPA with this model?

#22

by m-conrad-202 - opened 15 days ago

Discussion

m-conrad-202

15 days ago

Is there an option for IPA input for any of the supported languages?

yukiarimo

15 days ago

Bro, forget about IPA. These doomed devs build what’s called LLM-based TTS: a TTS that uses an LLM’s tokenizer to represent the text instead of turning it into phonetic representations (WHICH IS OF COURSE BAAAAD)!

You can look into VITS-2, which is my favorite model and, in my experience, has not been beaten so far (I built one at 48 kHz with a custom dataset and listen to audiobooks with it every day, so I’m speaking from experience)!

Srijan-Capsitech

12 days ago

Hey yukiarimo, could you share some details about your dataset?
If possible, please also share links to the dataset or its sources.
Thanks in advance.

yukiarimo

11 days ago

edited 11 days ago

The dataset covers the following languages, approximately:

English -> 60 hours
Japanese -> 24 hours
Russian -> 12 hours (read as transliteration)

It was recorded by me and my voice actress Fukai. In total, combining every speaker and language, it comes to around 200 hours of recordings.

Note: everything is phonemized with eSpeak (and Cutlet for Japanese).
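For anyone who wants to try the eSpeak step themselves, here is a minimal sketch of turning text into IPA from Python. It assumes the `espeak-ng` binary is installed and on your PATH, and it is not the pipeline used for the dataset above. The helper names `build_espeak_ipa_command` and `phonemize` are made up for illustration.

```python
import shutil
import subprocess


def build_espeak_ipa_command(text, voice="en-us"):
    """Return the espeak-ng argument list that prints IPA for `text`."""
    # -q: do not play audio; --ipa: print phonemes as IPA; -v: pick the voice.
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize(text, voice="en-us"):
    """Return the IPA string from espeak-ng, or None if it is not installed."""
    if shutil.which("espeak-ng") is None:
        # Assumption: the binary is on PATH; skip gracefully when it is not.
        return None
    result = subprocess.run(
        build_espeak_ipa_command(text, voice),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(phonemize("Hello world"))
```

The command is built in its own function so it can be inspected without eSpeak installed. The exact IPA output depends on the eSpeak version and voice, so treat it as something to check by ear rather than as ground truth.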

Insights: After roughly 40 hours, the only improvement you get from more data is averaging out the errors. It is also probably better to record sentence by sentence than to read a whole novel and then run ASR on it.

Smaller languages are fine with 24 hours, though with less than that the results won’t be that good. But because there is so much data overall, it helps the model as a whole.

And no, under Yuki Story policy and our contract, I will not share the dataset or the model (the architecture and training code are open in the aiflow package, so you can train it yourself, too).

If you want, you can find the LJSpeech version and use it for free!

Code: https://github.com/yukiarimo/aiflow/tree/main/aiflow/models/hanasu
