Text-to-Speech
ONNX
Safetensors
English
Chinese
Commit 094a5a0 (verified) by zhu-han · 1 parent: 4806fda

Update README.md

Files changed (1)
  1. README.md +15 -109
README.md CHANGED
@@ -9,120 +9,26 @@ tags:
  - text-to-speech
  ---
 
- # ZipVoice
 
- ## Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching</center>
 
- [![arXiv](https://img.shields.io/badge/arXiv-Paper-COLOR.svg)](http://arxiv.org/abs/2506.13053)
- [![demo](https://img.shields.io/badge/GitHub-Demo%20page-orange.svg)](https://zipvoice.github.io/)
 
- ## Overview
 
- ZipVoice is a high-quality zero-shot TTS model with a small model size and fast inference speed.
 
- ### 1. Key features
 
- - Small and fast: only 123M parameters.
-
- - High-quality: state-of-the-art voice cloning performance in speaker similarity, intelligibility, and naturalness.
-
- - Multi-lingual: support Chinese and English.
-
- ### 2. Architecture
-
- <div align="center">
-
- <img src="https://zipvoice.github.io/pics/zipvoice.png" width="500" >
-
- </div>
-
- ## News
-
- **2025/06/16**: 🔥 ZipVoice is released.
-
- ## Installation
-
- ### 1. Clone the ZipVoice repository
-
- ```bash
- git clone https://github.com/k2-fsa/ZipVoice.git
- ```
-
- ### 2. (Optional) Create a Python virtual environment
-
- ```bash
- python3 -m venv zipvoice
- source zipvoice/bin/activate
- ```
-
- ### 3. Install the required packages
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ### 4. (Optional) Install k2 for training or efficient inference:
-
- k2 is necessary for training and can speed up inference. Nevertheless, you can still use the inference mode of ZipVoice without installing k2.
-
- > **Note:** Make sure to install the k2 version that matches your PyTorch and CUDA version. For example, if you are using pytorch 2.5.1 and CUDA 12.1, you can install k2 as follows:
-
- ```bash
- pip install k2==1.24.4.dev20250208+cuda12.1.torch2.5.1 -f https://k2-fsa.github.io/k2/cuda.html
- ```
-
- Please refer to https://k2-fsa.org/get-started/k2/ for details.
- Users in China mainland can refer to https://k2-fsa.org/zh-CN/get-started/k2/.
-
- ## Usage
-
- To generate speech with our pre-trained ZipVoice or ZipVoice-Distill models, use the following commands (Required models will be downloaded from HuggingFace):
-
- ### 1. Inference of a single sentence
-
- ```bash
- python3 zipvoice/zipvoice_infer.py \
- --model-name "zipvoice" \
- --prompt-wav prompt.wav \
- --prompt-text "I am the transcription of the prompt wav." \
- --text "I am the text to be synthesized." \
- --res-wav-path result.wav
- ```
-
- - `--model-name` can be `zipvoice` or `zipvoice_distill`, which are models before and after distillation, respectively.
- - If `<>` or `[]` appear in the text, strings enclosed by them will be treated as special tokens. `<>` denotes Chinese pinyin and `[]` denotes other special tags.
-
- ### 2. Inference of a list of sentences
-
- ```bash
- python3 zipvoice/zipvoice_infer.py \
- --model-name "zipvoice" \
- --test-list test.tsv \
- --res-dir results/test
- ```
-
- - Each line of `test.tsv` is in the format of `{wav_name}\t{prompt_transcription}\t{prompt_wav}\t{text}`.
-
- > **Note:** If you have trouble connecting to HuggingFace, try:
- > ```bash
- > export HF_ENDPOINT=https://hf-mirror.com
- > ```
-
- ### 3. Correcting mispronounced chinese polyphone characters
-
- We use [pypinyin](https://github.com/mozillazg/python-pinyin) to convert Chinese characters to pinyin. However, it can occasionally mispronounce **polyphone characters** (多音字).
-
- To manually correct these mispronunciations, enclose the **corrected pinyin** in angle brackets `< >` and include the **tone mark**.
-
- **Example:**
-
- - Original text: `这把剑长三十公分`
- - Correct the pinyin of `长`: `这把剑<chang2>三十公分`
-
- > **Note:** If you want to manually assign multiple pinyins, enclose each pinyin with `<>`, e.g., `这把<jian4><chang2><san1>十公分`
-
-
- ## Discussion & Communication
 
  You can directly discuss on [Github Issues](https://github.com/k2-fsa/ZipVoice/issues).
 
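The removed batch-inference section specifies that each line of `test.tsv` holds four tab-separated fields, `{wav_name}\t{prompt_transcription}\t{prompt_wav}\t{text}`. A minimal Python sketch of writing and reading such a list follows. The names `ListEntry`, `write_test_list`, and `read_test_list` are hypothetical and not part of ZipVoice, and the sketch assumes no field contains a tab or newline, since this line format cannot represent them.

```python
import tempfile
from pathlib import Path
from typing import List, NamedTuple


class ListEntry(NamedTuple):
    """Hypothetical record for one test-list line, in the documented field order:
    wav_name, prompt_transcription, prompt_wav, text (tab-separated on disk)."""
    wav_name: str
    prompt_transcription: str
    prompt_wav: str
    text: str


def write_test_list(path: Path, entries: List[ListEntry]) -> None:
    """Write one tab-separated line per entry."""
    with path.open("w", encoding="utf-8") as f:
        for entry in entries:
            # A tab or newline inside a field would break the one-line, four-field layout.
            if any("\t" in field or "\n" in field for field in entry):
                raise ValueError(f"field contains a tab or newline: {entry!r}")
            f.write("\t".join(entry) + "\n")


def read_test_list(path: Path) -> List[ListEntry]:
    """Parse a test list, skipping blank lines and rejecting lines without exactly 4 fields."""
    entries = []
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            fields = line.split("\t")
            if len(fields) != 4:
                raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
            entries.append(ListEntry(*fields))
    return entries


if __name__ == "__main__":
    # Example values reuse the strings from the single-sentence command.
    entries = [
        ListEntry(
            "utt1",
            "I am the transcription of the prompt wav.",
            "prompt.wav",
            "I am the text to be synthesized.",
        )
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "test.tsv"
        write_test_list(path, entries)
        assert read_test_list(path) == entries
```

Validating on write keeps a bad field from silently shifting the remaining columns of a line.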
@@ -132,7 +38,7 @@ You can also scan the QR code to join our wechat group or follow our wechat offi
  | ------------ | ----------------------- |
  |![wechat](https://k2-fsa.org/zh-CN/assets/pic/wechat_group.jpg) |![wechat](https://k2-fsa.org/zh-CN/assets/pic/wechat_account.jpg) |
 
- ## Citation
 
  ```bibtex
  @article{zhu2025zipvoice,
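The removed README states that strings inside `<>` are treated as Chinese pinyin and strings inside `[]` as other special tags, and that a corrected pinyin is written with its tone, e.g. `这把剑<chang2>三十公分`. The Python sketch here splits text under that convention. It is not ZipVoice's tokenizer: the function name `split_special_tokens`, the one-token-per-character handling of ordinary text, and the rule that a tone is a trailing digit 1 to 5 are assumptions for illustration.

```python
import re
from typing import List, Tuple

# <...> marks a pinyin override and [...] marks another special tag.
_SPECIAL = re.compile(r"<([^<>]+)>|\[([^\[\]]+)\]")
# Assumed pinyin shape: lowercase letters (ü allowed) followed by a tone digit 1-5.
_PINYIN = re.compile(r"[a-zü]+[1-5]")


def split_special_tokens(text: str) -> List[Tuple[str, str]]:
    """Hypothetical splitter returning ("char", c), ("pinyin", p), and ("tag", t) tokens."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for m in _SPECIAL.finditer(text):
        # Ordinary text before the match is emitted one character at a time.
        tokens.extend(("char", c) for c in text[pos:m.start()])
        pinyin, tag = m.group(1), m.group(2)
        if pinyin is not None:
            if not _PINYIN.fullmatch(pinyin):
                raise ValueError(f"pinyin needs a trailing tone digit: <{pinyin}>")
            tokens.append(("pinyin", pinyin))
        else:
            tokens.append(("tag", tag))
        pos = m.end()
    tokens.extend(("char", c) for c in text[pos:])
    return tokens


if __name__ == "__main__":
    # Polyphone override from the README: 长 is read as chang2 here.
    print(split_special_tokens("这把剑<chang2>三十公分"))
```

Rejecting a bracketed syllable without a tone digit mirrors the README's instruction to always include the tone.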
 
  - text-to-speech
  ---
 
+ # ZipVoice⚡: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching</center>
 
+ ## 1. Explanation of each directory
 
+ | Directory | Model Type | Training Data | Initialized from |
+ | :---------------------------- | :-----------------------: | :-------------------------------: | :------------------------: |
+ | zipvoice | ZipVoice | Emilia | - |
+ | zipvoice_libritts | ZipVoice | LibriTTS | - |
+ | zipvoice_distill | ZipVoice-Distill | Emilia | zipvoice/model.pt |
+ | zipvoice_distill_libritts | ZipVoice-Distill | LibriTTS | zipvoice_libritts/model.pt |
+ | zipvoice_dialog | ZipVoice-Dialog | OpenDialog + in-house dataset | zipvoice/model.pt |
+ | zipvoice_dialog_opendialog | ZipVoice-Dialog | OpenDialog | zipvoice/model.pt |
+ | zipvoice_dialog_stereo | ZipVoice-Dialog-Stereo | in-house dataset | zipvoice_dialog/model.pt |
 
+ ## 2. Github
 
+ See our Github repository [ZipVoice](https://github.com/k2-fsa/ZipVoice) for details
 
 
+ ## 3. Discussion & Communication
 
  You can directly discuss on [Github Issues](https://github.com/k2-fsa/ZipVoice/issues).
 
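In the new README's directory table, the "Initialized from" column links each fine-tuned model to the checkpoint it starts from, so some models inherit through a chain: `zipvoice_dialog_stereo` starts from `zipvoice_dialog/model.pt`, which itself starts from `zipvoice/model.pt`. This Python sketch encodes that column as a dictionary and follows the chain. `INIT_FROM` and `init_chain` are hypothetical names, and `None` stands for the table's `-` (no initialization checkpoint).

```python
from typing import Dict, List, Optional

# Hypothetical encoding of the "Initialized from" column: directory -> parent directory.
# None corresponds to "-" in the table.
INIT_FROM: Dict[str, Optional[str]] = {
    "zipvoice": None,
    "zipvoice_libritts": None,
    "zipvoice_distill": "zipvoice",
    "zipvoice_distill_libritts": "zipvoice_libritts",
    "zipvoice_dialog": "zipvoice",
    "zipvoice_dialog_opendialog": "zipvoice",
    "zipvoice_dialog_stereo": "zipvoice_dialog",
}


def init_chain(directory: str) -> List[str]:
    """Return [directory, parent, grandparent, ...] by following INIT_FROM."""
    chain = [directory]
    parent = INIT_FROM[directory]  # raises KeyError for directories not in the table
    while parent is not None:
        if parent in chain:  # guard against a malformed, cyclic mapping
            raise ValueError(f"initialization cycle at {parent!r}")
        chain.append(parent)
        parent = INIT_FROM[parent]
    return chain


if __name__ == "__main__":
    # "A <- B" reads as "A was initialized from B".
    for name in INIT_FROM:
        print(" <- ".join(f"{d}/model.pt" for d in init_chain(name)))
```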
 
  | ------------ | ----------------------- |
  |![wechat](https://k2-fsa.org/zh-CN/assets/pic/wechat_group.jpg) |![wechat](https://k2-fsa.org/zh-CN/assets/pic/wechat_account.jpg) |
 
+ ## 4. Citation
 
  ```bibtex
  @article{zhu2025zipvoice,