vosk-models / tts /vosk-model-tts-ru-0.9-multi /README.md

Russian Vosk TTS model

Version 0.9

Metrics:

CER 0.6
FAD 0.810
UTMOS 3.290
Speaker Similarity 0.875
xRT CPU 0.35
xRT GPU 0.06
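
The card does not say how these figures were measured. As a rough guide to two of them, the sketch below assumes that xRT is the usual real-time factor (synthesis time divided by the duration of the audio produced, so values below 1 are faster than real time) and that CER is a character-level edit distance between a reference text and a transcript, normalized by the reference length. The function names `real_time_factor` and `char_error_rate` are hypothetical and not part of any Vosk API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Assumed definition of xRT: processing time / duration of generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Character-level Levenshtein distance divided by the reference length.

    Returns a fraction; multiply by 100 for a percentage.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(reference)


# Illustrative numbers only, not measurements from this model:
# 3.5 s of compute for 10 s of audio gives a real-time factor of about 0.35.
rtf = real_time_factor(3.5, 10.0)
# One substituted character out of six.
cer = char_error_rate("привет", "превет")
```

Under this reading, the reported xRT of 0.35 on CPU would mean that ten seconds of speech take roughly 3.5 seconds to synthesize.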

License: Apache 2.0

Changelog:

ASR alignment
No encoder, only a duration predictor
Slightly narrower predictor (width 160) to fit the DiT hidden vector
Scaling factor on the diffusion loss (so that it does not dominate the duration loss)
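
The last changelog item can be illustrated with a minimal sketch of a weighted two-term objective. This is an assumption about the general form only: the card does not give the actual scale value or the training code, so `combined_loss` and the default `diffusion_scale` of 0.1 are hypothetical.

```python
def combined_loss(duration_loss: float,
                  diffusion_loss: float,
                  diffusion_scale: float = 0.1) -> float:
    """Sum of the two terms, with the diffusion term scaled by a constant.

    diffusion_scale is a hypothetical value chosen for illustration.
    """
    return duration_loss + diffusion_scale * diffusion_loss


# Illustrative magnitudes only: the diffusion term is much larger unscaled.
duration_loss, diffusion_loss = 1.0, 20.0

unscaled = combined_loss(duration_loss, diffusion_loss, diffusion_scale=1.0)
scaled = combined_loss(duration_loss, diffusion_loss, diffusion_scale=0.1)

# Share of the total contributed by the duration term in each case.
duration_share_unscaled = duration_loss / unscaled
duration_share_scaled = duration_loss / scaled
```

With these made-up numbers, the duration term accounts for under 5% of the unscaled total but about a third of the scaled one, which is the kind of rebalancing the changelog entry describes.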