The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...
Around 10 GB, and around 300 chars is the sweet spot. You can chunk longer text and run it piece by piece, though.
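If it helps, here's a rough sketch of the chunking part, assuming you split on sentence boundaries and pack sentences up to the ~300 char limit mentioned above. `chunk_text` and `max_chars` are names I made up for illustration, not part of any TTS library, and the synthesis call itself is left out since the model's API isn't shown in this thread.

```python
import re


def chunk_text(text, max_chars=300):
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper for illustration; not part of any TTS library.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit gets cut at the last space
        # that fits (or hard-cut if there is no space at all).
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not sentence:
            continue
        # Pack sentences together while they still fit in one chunk.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

You'd then run each chunk through the model separately and concatenate the audio. Splitting on sentences rather than at a fixed character offset should keep the joins at natural pauses, but that's a guess on my part, not something I've measured.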
I had a look at both, and it seems doable. I'll try to follow the repeng example, but it's a bit confusing how they generate the dataset.