Russian Vosk TTS model

Version 0.9

Metrics:

  • CER: 0.6
  • FAD: 0.810
  • UTMOS: 3.290
  • Speaker Similarity: 0.875
  • xRT CPU: 0.35
  • xRT GPU: 0.06
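For readers unfamiliar with the xRT figures, the sketch below assumes the usual real-time-factor reading: synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. The function name and the timings are hypothetical and only illustrate the arithmetic; they are not taken from the model's benchmark.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical timing: 3.5 s of compute to produce 10 s of speech.
rtf = real_time_factor(3.5, 10.0)
print(f"xRT = {rtf:.2f}")
```

In practice the synthesis time could be measured with `time.perf_counter()` around the synthesis call, and the audio duration computed as the number of samples divided by the sample rate.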

License: Apache 2.0

Changelog:

  • ASR alignment
  • Encoder removed; only the duration predictor remains
  • Predictor width reduced slightly (to 160) to fit the DiT hidden vector
  • Diffusion loss is scaled so that it does not dominate the duration loss
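The loss scaling mentioned in the last item can be illustrated with a minimal sketch. It assumes the total training loss is a weighted sum of the two terms; the function name, the weight of 0.1, and the loss values are hypothetical, since the model card does not state how the scale is applied or what value it takes.

```python
def combined_loss(duration_loss: float,
                  diffusion_loss: float,
                  diffusion_scale: float = 0.1) -> float:
    """Weighted sum of the two terms; diffusion_scale is a hypothetical weight."""
    return duration_loss + diffusion_scale * diffusion_loss


# Hypothetical magnitudes: the diffusion term is much larger than the duration term.
duration_loss, diffusion_loss = 0.5, 20.0

unscaled = combined_loss(duration_loss, diffusion_loss, diffusion_scale=1.0)
scaled = combined_loss(duration_loss, diffusion_loss, diffusion_scale=0.1)

# Share of the total that comes from the duration term in each case.
print(f"unscaled: duration share = {duration_loss / unscaled:.1%}")
print(f"scaled:   duration share = {duration_loss / scaled:.1%}")
```

With a weight below 1, the duration term contributes a larger share of the total, so its gradient is less likely to be swamped by the diffusion term.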