## Speech Detokenizer

Our detokenizer is built on the [F5-TTS](https://github.com/SWivid/F5-TTS) framework, with two specific improvements.

1. The DiT module is replaced with a DiT variant that uses cross-attention, similar to the detokenizer of [GLM-4-Voice](https://github.com/zai-org/GLM-4-Voice).
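
To illustrate the cross-attention idea from item 1, here is a minimal, dependency-free sketch. It is not this project's implementation: the names `cross_attention` and `cross_attention_block`, the single-head layout, and the identity projection matrices are all assumptions chosen for illustration. It assumes acoustic frames act as queries and speech-token embeddings supply keys and values. A real DiT block would also include self-attention, timestep conditioning, and a feed-forward layer, and would normally be written in a tensor library such as PyTorch.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(frames, tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    frames: queries, one row per acoustic frame.
    tokens: conditioning sequence, one row per speech token.
    Returns the attended values and the attention weights.
    """
    q = matmul(frames, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    # Scaled dot product between every frame query and every token key.
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights


def cross_attention_block(frames, tokens, w_q, w_k, w_v):
    """Residual wrapper: each frame is updated with what it read from the tokens."""
    attended, weights = cross_attention(frames, tokens, w_q, w_k, w_v)
    out = [[x + a for x, a in zip(f, att)] for f, att in zip(frames, attended)]
    return out, weights


# Toy inputs: 3 frames and 2 tokens, both 2-dimensional (assumed sizes).
identity = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
out, weights = cross_attention_block(frames, tokens, identity, identity, identity)
```

Because the token sequence enters only through keys and values, its length does not have to match the number of frames, which is one reason cross-attention is a convenient way to condition frame-level generation on a shorter token sequence.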