Update README.md
README.md
CHANGED
@@ -30,7 +30,7 @@ datasets:
 
 </details>
 
-## Model
+## Model Description
 
 **Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
 The encoder part is based on a text encoder and a phoneme duration prediction that together predict averaged acoustic features.
@@ -40,7 +40,7 @@ In the latter, by replacing 2D CNNs by 1D CNNs, a large reduction in memory cons
 **Matcha-TTS** is a non-autorregressive model trained with optimal-transport conditional flow matching (OT-CFM).
 This yields an ODE-based decoder capable of generating high output quality in fewer synthesis steps than models trained using score matching.
 
-## Intended
+## Intended Uses and Limitations
 
 This model is intended to serve as an acoustic feature generator for multispeaker text-to-speech systems for the Catalan language.
 It has been finetuned using a Catalan phonemizer, therefore if the model is used for other languages it may will not produce intelligible samples after mapping
@@ -49,7 +49,7 @@ its output into a speech waveform.
 The quality of the samples can vary depending on the speaker.
 This may be due to the sensitivity of the model in learning specific frequencies and also due to the quality of samples for each speaker.
 
-## How to
+## How to Use
 
 ### Installation
 
@@ -85,13 +85,9 @@ pip install git+https://github.com/langtech-bsc/Matcha-TTS.git@dev-cat
 
 ```
 
-
 ### Generate
 
-## Training
-
-### Adaptation
-
+## Training Details
 
 ### Training data
 
@@ -102,13 +98,11 @@ The model was trained on 2 **Catalan** speech datasets
 | Festcat | ca | 22 |
 | OpenSLR69 | ca | 5 |
 
-
-### Framework
+### Training procedure
 
 
 ## Evaluation
 
-### Results
 
 ## Citation
 
@@ -125,7 +119,7 @@ If this code contributes to your research, please cite the work:
 }
 ```
 
-## Additional
+## Additional Information
 
 ### Author
 The Language Technologies Unit from Barcelona Supercomputing Center.