Commit c061431 · 1 parent: bdecca1
fix detak readme
semantic_detokenizer/README.md
CHANGED
@@ -2,7 +2,7 @@
 #### Our detokenizer is developed based on the [F5-TTS](https://github.com/SWivid/F5-TTS) framework and features two specific improvements.
 
 1. The DiT module has been substituted by a DiT variant with cross-attention. It is similar to the detokenizer of [GLM-4-Voice](https://github.com/zai-org/GLM-4-Voice).
-<p align="center"><img src="../figs/CADiT.jpg"
+<p align="center"><img src="../figs/CADiT.jpg" width="500"></p>
 
 2. A chunk-based streaming inference algorithm is developed, it allows the model to generate speech of any length.
 <p align="center"><img src="../figs/F5-streaming.jpg" width="1200"></p>