Speech Detokenizer

Our detokenizer is built on the F5-TTS framework and adds two improvements:

The DiT module is replaced with a DiT variant that uses cross-attention, similar to the detokenizer of GLM-4-Voice.
A chunk-based streaming inference algorithm allows the model to generate speech of arbitrary length.
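The cross-attention substitution can be sketched in plain Python. This is an illustration under assumptions, not the released model: it assumes the DiT's per-frame hidden states act as queries and the semantic-token embeddings act as keys and values, so the number of audio frames does not have to match the number of tokens. Learned projections, multiple heads, self-attention, the feed-forward layer, and timestep conditioning are all omitted, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention.

    queries: one vector per audio frame (assumed DiT hidden states).
    keys/values: one vector per semantic token (assumed token embeddings).
    The number of frames and tokens may differ.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Each frame receives a weighted average of the token value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


def dit_block_with_cross_attention(frames: List[Vector],
                                   token_emb: List[Vector]) -> List[Vector]:
    # Hypothetical residual update: frames attend to the semantic tokens.
    attended = cross_attention(frames, token_emb, token_emb)
    return [[f + a for f, a in zip(fr, at)] for fr, at in zip(frames, attended)]


if __name__ == "__main__":
    frames = [[0.1, 0.2], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.3, -0.2]]  # 5 frames
    tokens = [[1.0, 0.0], [0.0, 1.0]]  # 2 semantic tokens
    out = dit_block_with_cross_attention(frames, tokens)
    print(len(out), len(out[0]))  # 5 2
```

The output keeps one vector per audio frame regardless of how many tokens there are, which is the property that lets cross-attention condition the frames on a token sequence of a different length.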
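The README does not describe the chunking scheme in detail. As an illustration only, the sketch below assumes a simple scheme: split the token sequence into fixed-size chunks, give each model call a few preceding tokens as left context, and emit the audio for each chunk as soon as it is produced. `generate_chunk`, `chunk_size`, and `context` are hypothetical names; the actual DSTK algorithm may differ (for example, by reusing previously generated audio as a prompt or by crossfading chunk boundaries).

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical model interface: (left-context tokens, new tokens) -> audio for the new tokens only.
GenerateFn = Callable[[List[int], List[int]], List[float]]


def chunk_tokens(tokens: Sequence[int], chunk_size: int,
                 context: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Split `tokens` into consecutive chunks, each paired with up to `context` preceding tokens."""
    if chunk_size <= 0 or context < 0:
        raise ValueError("chunk_size must be > 0 and context must be >= 0")
    for start in range(0, len(tokens), chunk_size):
        left = list(tokens[max(0, start - context):start])
        yield left, list(tokens[start:start + chunk_size])


def stream_detokenize(tokens: Sequence[int], generate_chunk: GenerateFn,
                      chunk_size: int = 50, context: int = 10) -> Iterator[List[float]]:
    """Yield audio chunk by chunk.

    Each model call sees at most `context + chunk_size` tokens, so the cost of a
    single call does not grow with the total input length, and playback can start
    as soon as the first chunk is ready.
    """
    for left, chunk in chunk_tokens(tokens, chunk_size, context):
        yield generate_chunk(left, chunk)


if __name__ == "__main__":
    # Toy stand-in for the detokenizer: 4 "samples" per token, context ignored.
    def toy_generate(left: List[int], chunk: List[int]) -> List[float]:
        return [float(t) for t in chunk for _ in range(4)]

    pieces = list(stream_detokenize(list(range(7)), toy_generate, chunk_size=3, context=2))
    print([len(p) for p in pieces])  # [12, 12, 4]
```

Because every call sees a bounded number of tokens, the per-call cost stays constant however long the input is, which is what lets a chunked detokenizer handle speech of arbitrary length.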

The released detokenizer was trained on approximately 6,000 hours of Chinese and English speech, drawn from WenetSpeech4TTS (Premium and Standard subsets), LibriTTS, and other datasets.