A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models
Abstract
Balalaika, a large Russian speech dataset with detailed annotations, improves performance in speech synthesis and enhancement tasks.
Russian speech synthesis presents distinctive challenges, including vowel reduction, consonant devoicing, variable stress patterns, homograph ambiguity, and unnatural intonation. This paper introduces Balalaika, a novel dataset comprising more than 2,000 hours of studio-quality Russian speech with comprehensive textual annotations, including punctuation and stress markings. Experimental results show that models trained on Balalaika significantly outperform those trained on existing datasets in both speech synthesis and enhancement tasks. We detail the dataset construction pipeline, annotation methodology, and results of comparative evaluations.
Community
Official repo: https://github.com/mtuciru/balalaika
Official HF collection: https://huggingface.co/collections/MTUCI/balalaika-68630b399254bf151885427e
Kirill's Telegram channel: https://t.me/korallll_ai
arXiv Explained breakdown of this paper: https://arxivexplained.com/papers/a-data-centric-framework-for-addressing-phonetic-and-prosodic-challenges-in-russian-speech-generative-models