
	## Speech Detokenizer
	#### Our detokenizer is built on the [F5-TTS](https://github.com/SWivid/F5-TTS) framework, with two improvements:

	1. The DiT module is replaced with a DiT variant that uses cross-attention, similar to the detokenizer of [GLM-4-Voice](https://github.com/zai-org/GLM-4-Voice).
	<p align="center"><img src="../figs/CADiT.jpg" width="500"></p>

	2. A chunk-based streaming inference algorithm lets the model generate speech of arbitrary length.
	<p align="center"><img src="../figs/F5-streaming.jpg" width="1200"></p>
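
The figure above is the authoritative description of the streaming algorithm; this README does not spell out its details. As a rough illustration of the general idea only, the sketch below splits a token sequence into fixed-size chunks, feeds each chunk together with a few preceding tokens as context, and drops the frames that belong to the context. All names (`chunk_spans`, `stream_detokenize`, `fake_synthesize`), the chunk and context sizes, and the fixed two-frames-per-token ratio are assumptions made for this example, not values taken from the released model.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def chunk_spans(n_tokens: int, chunk_size: int, context: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (ctx_start, start, end) triples that together cover range(n_tokens)."""
    if chunk_size <= 0 or context < 0:
        raise ValueError("chunk_size must be positive and context non-negative")
    for start in range(0, n_tokens, chunk_size):
        end = min(start + chunk_size, n_tokens)
        yield max(0, start - context), start, end


def stream_detokenize(
    tokens: Sequence[int],
    synthesize: Callable[[Sequence[int]], List[int]],
    chunk_size: int = 50,          # assumed value
    context: int = 10,             # assumed value
    frames_per_token: int = 2,     # assumed fixed ratio
) -> Iterator[List[int]]:
    """Yield the frames of one chunk at a time.

    `synthesize` is a stand-in for the detokenizer: it must return
    `frames_per_token` frames for every token in the window it receives.
    """
    for ctx_start, start, end in chunk_spans(len(tokens), chunk_size, context):
        frames = synthesize(tokens[ctx_start:end])
        # Frames for the context tokens were already emitted with the previous chunk.
        skip = (start - ctx_start) * frames_per_token
        yield list(frames[skip:])


def fake_synthesize(window: Sequence[int]) -> List[int]:
    """Toy model: repeat each token twice so the output is easy to check."""
    return [tok for tok in window for _ in range(2)]


tokens = list(range(123))
streamed = [f for chunk in stream_detokenize(tokens, fake_synthesize) for f in chunk]
```

Because each call sees at most `chunk_size + context` tokens, the cost per step stays bounded regardless of the total sequence length, which is what makes arbitrary-length generation possible in a scheme like this. With the toy model, the concatenated chunks equal the output of a single pass over all tokens.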

	#### The detokenizer in this release was trained on approximately 6,000 hours of Chinese and English data, including WenetSpeech4TTS (Premium and Standard subsets), LibriTTS, and others.
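
Returning to the first improvement, the following is a minimal sketch of what cross-attention means in this setting: the acoustic frames act as queries and the semantic tokens act as keys and values, so the two sequences do not need to have the same length. It is a single-head version without learned projections, written in plain Python for readability; the function names and toy inputs are illustrative assumptions and do not reflect the actual CADiT implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(frames: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention (single head, no projections).

    frames: T query vectors (hidden states of the acoustic frames).
    tokens: N key/value vectors (embeddings of the semantic tokens).
    Returns T vectors, each a weighted average of the token vectors.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for q in frames:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        mixed.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return mixed


# Four frames attend over three tokens: the sequence lengths differ.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mixed = cross_attention(frames, tokens)
```

The output has one vector per frame, regardless of how many tokens were supplied. A real DiT block would add learned query, key, and value projections, multiple heads, and residual connections around this core operation.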