Update README.md
README.md
CHANGED
@@ -32,9 +32,10 @@ datasets:
|
|
32 |
|
33 |
## Model description
|
34 |
|
35 |
-
**Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
|
36 |
-
The
|
37 |
-
|
38 |
|
39 |
**Matcha-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
|
40 |
This yields an ODE-based decoder capable of generating high-quality output in fewer synthesis steps than models trained using score matching.
|
@@ -52,10 +53,20 @@ This may be due to the sensitivity of the model in learning specific frequencies
|
|
52 |
|
53 |
### Installation
|
54 |
|
55 |
```bash
|
56 |
-
|
57 |
```
|
58 |
-
|
|
|
59 |
|
60 |
```bash
|
61 |
git clone https://github.com/projecte-aina/espeak-ng.git
|
@@ -72,6 +83,13 @@ pip install mecab-python3
|
|
72 |
pip install unidic-lite
|
73 |
|
74 |
```
|
75 |
|
76 |
### Generate
|
77 |
|
|
|
32 |
|
33 |
## Model description
|
34 |
|
35 |
+
**Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
|
36 |
+
The encoder consists of a text encoder and a phoneme duration predictor that together predict averaged acoustic features.
|
37 |
+
The decoder is essentially a U-Net backbone inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf) and built on the Transformer architecture.
|
38 |
+
In the decoder, replacing 2D CNNs with 1D CNNs greatly reduces memory consumption and speeds up synthesis.
|
39 |
|
40 |
**Matcha-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
|
41 |
This yields an ODE-based decoder capable of generating high-quality output in fewer synthesis steps than models trained using score matching.
|
|
|
53 |
|
54 |
### Installation
|
55 |
|
56 |
+
This model has been trained using the espeak-ng open-source text-to-speech software.
|
57 |
+
The espeak-ng version containing the Catalan phonemizer can be found [here](https://github.com/projecte-aina/espeak-ng).
|
58 |
+
|
59 |
+
Create a virtual environment:
|
60 |
+
|
61 |
+
```bash
|
62 |
+
python -m venv /path/to/venv
|
63 |
+
```
|
64 |
+
|
65 |
```bash
|
66 |
+
source /path/to/venv/bin/activate
|
67 |
```
|
68 |
+
|
69 |
+
For training and inference with Catalan Matcha-TTS, you need to compile the provided espeak-ng with the Catalan phonemizer:
|
70 |
|
71 |
```bash
|
72 |
git clone https://github.com/projecte-aina/espeak-ng.git
|
|
|
83 |
pip install unidic-lite
|
84 |
|
85 |
```
|
86 |
+
Install the repository:
|
87 |
+
|
88 |
+
```bash
|
89 |
+
pip install git+https://github.com/langtech-bsc/Matcha-TTS.git@dev-cat
|
90 |
+
|
91 |
+
```
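
Editor's note, not part of the commit: after the installation steps above, a quick sanity check may help confirm that the tools are visible from the activated environment. The snippet below is a minimal sketch. The helper name `check_tool` is hypothetical, and it assumes the compiled binary is named `espeak-ng` and is on your `PATH`.

```bash
# Hypothetical helper: report whether a command is reachable from this shell.
# It only prints a status line and always succeeds, so it is safe in scripts.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Assumed tool names: the venv's Python and the compiled espeak-ng binary.
for tool in python espeak-ng; do
  check_tool "$tool"
done
```

If `espeak-ng` is reported as missing, the install prefix used when compiling is probably not on your `PATH`.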
|
92 |
+
|
93 |
|
94 |
### Generate
|
95 |
|