Feedback: Italian voice
The Italian male voice is very, very good!! Amazing! The female voice is not as good (especially with numbers), but acceptable.
I work for a publisher that also publishes podcasts; if needed, we can collaborate on training for the Italian language using our audio files.
Bye to all!
Seriously? To me it sounds quite emotionless and flat when speaking, with too deep a tone, although that can be changed with post-processing. How did you get your results?
Obviously, compared to the best ElevenLabs voices it is inferior, but IMHO the male voice is already much better than many other models I have tried. It's a very solid starting point, and with this base I believe that in 6 months, if there are contributions, we can think about having a perfect voice (at least I hope so). If you know a better open-source model for Italian, let me know.
Personally, I worked with Coqui TTS and used a cloned voice that made it very expressive and pleasant. Of course, in terms of performance, Kokoro can do it with far fewer resources than Coqui. GiuseppeMultilingual from Microsoft is also not bad at all, certainly better than this.
But yes, considering that this model runs locally, is super fast, and has only 82M parameters, it is clearly superior to Piper and other small models of this size.
As noted in
Support for non-English languages may be absent or thin due to weak G2P and/or lack of training data. Some languages are only represented by a small handful or even just one voice (French).
For Italian in particular, there are two potential issues with training:
- The model has only seen a handful (a single-digit number) of hours of Italian in its training lifetime. This is probably not enough for truly good performance.
- It is unclear how accurate
is at Italian G2P.
This actually applies to any language and any voice in Kokoro models. Inference quality is likely to be roughly proportional to both the quality and the quantity of the training data.
Closing this with one more thought.
GiuseppeMultilingual from Microsoft is also not bad at all, certainly better than this.
I'm assuming EdgeTTS? This is good to know, since it is very easy for people to farm training data from that endpoint using edge-tts. But as mentioned in #92 under "Better Training Data", edge-tts probably does not constitute "A-tier or S-tier training data", since it is probably traditional TTS, not an MLLM trained with reinforcement learning.