How many hours of speech data was this model trained on?

#3
by stefanr123 - opened

Just out of interest, how many hours do the 60,000 utterances amount to, and was that the sole training data, or was the base model already trained on other speech? I'm trying to understand how much data it would take to train a model of this quality in another language.

The 60k utterances are merely an adaptor fix. The base model was trained on much more data, but we had to remove some speaker embeddings and focus on these speakers. Hence the 60k voice samples were used to retrain the open-source adaptor. The base model has more knowledge.

Thank you! Can you share a rough ballpark for the amount of data used for the base model?
