Fine-tuning Process

#2
by Aweheid - opened

Hello @Thomcles ,

I'm really happy for you on successfully fine-tuning Chatterbox TTS for French! I listened to the voice, and to me it sounds natural, even though I don't speak or understand French : )

I'd like to do the same for Arabic, but I thought it would be good to first learn from others who have already done it well, so that we can improve the fine-tuning process and prepare more data for our languages to enrich the community.

Cannot wait to get a notification and comment from you!

Thanks again!

Owner

Glad people are interested in my project!

I'm currently working on a model for several European languages (fr, es, it, de, en).
I'm also looking for royalty-free data in Arabic, Chinese, Japanese, Hindi, Russian, Turkish, and Dutch in order to train a multilingual model like xttsv2.
It's going to cost me a lot of money, so you'll probably be able to take advantage of it if the resulting model turns out well after fine-tuning.

Can you be more specific about what you'd like to know? How far along are you? I'm asking to find out whether I'm talking to someone who has never used a cloud provider or someone who is already familiar with one.

Hi @Thomcles. I just wanted to thank you. This fine-tune is really cool. In a quick test on my Mac mini M1 with my own voice, it works really quite well.
It remains to be seen how we could handle creating specific voices without redoing the whole process each time.

Cheers :)

Owner

Glad you like it, @MrCasquette!
I think I may be able to find a solution to your problem, but for that it would be better if you started a new discussion to explain what you would like to do (in English, so the open-source community can benefit!). And if you found this useful, don't hesitate to give the project a like so that more people see it!

Thomcles, I would like to help you with a Russian dataset.
"And I'm in the process of researching Arabic, Chinese, Japanese, Hindu, Russian, Turk, Dutch royalty-free data in order to train a multilingual model like xttsv2."
4000+ hours of high-quality speech:
https://huggingface.co/ESpeech
Can you write instructions on how to prepare the training data, and how to train (the number of epochs and everything else)? How long would it take to train on 4,000 hours of Russian speech on an RTX 4090? I could probably help with Russian by training the model myself, if you explained to me how to do it. Or you can use the speech data from my link, which was recently made publicly available. You will be pleased with its quality.

This is no longer necessary, as Resemble AI has released a multilingual model supporting 23 languages (including Russian): https://huggingface.co/ResembleAI/chatterbox

Zero-shot cloning of Russian speech doesn't sound good with it. To create audiobooks, I would like to fine-tune on my favorite voices by adding recordings of them. Can you explain how to prepare the data and train the model? I also want to add high-quality speech to the dataset. And I wonder whether it is possible to train a single voice on Kaggle with 2 GPUs. How many hours would it take to train on 20 hours of speech?

The time it would take does not depend only on the amount of data; it also depends on batch size, number of epochs, number of workers, GPU TFLOPS...
I won't be able to give you an accurate prediction.
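
For anyone who wants a rough order of magnitude anyway, here is a back-of-the-envelope sketch in Python. It is not a prediction: the clip length, batch size, epoch count, and seconds per step below are all made-up placeholders, and the function name `estimate_training_hours` is hypothetical. It also assumes data-parallel training scales linearly with the number of GPUs, which ignores communication and data-loading overhead. The only input worth trusting is a seconds-per-step value you measure yourself on your own hardware.

```python
import math


def estimate_training_hours(num_clips, batch_size, epochs, seconds_per_step, num_gpus=1):
    """Rough wall-clock estimate. Every argument is an assumption to be measured, not a known value."""
    # Assumption: each GPU processes `batch_size` clips per step (simple data parallelism).
    effective_batch = batch_size * num_gpus
    # One optimizer step consumes one effective batch; the last partial batch still costs a step.
    steps_per_epoch = math.ceil(num_clips / effective_batch)
    total_steps = steps_per_epoch * epochs
    return total_steps * seconds_per_step / 3600


# Hypothetical scenario: 20 hours of speech cut into clips of about 10 seconds each.
num_clips = 20 * 3600 // 10

hours = estimate_training_hours(
    num_clips,
    batch_size=8,          # placeholder, limited by GPU memory in practice
    epochs=10,             # placeholder
    seconds_per_step=1.5,  # placeholder, time a few real steps to get this
    num_gpus=2,
)
print(f"Estimated training time: {hours:.2f} h")
```

Changing any one of these inputs moves the result proportionally, which is why the amount of data alone does not determine the training time.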

Data processing depends on how the training works.

I can do it for you if you want.

Contact me at [email protected] if you want to discuss it.
