Update README.md
README.md
@@ -30,7 +30,7 @@ datasets:

 </details>

-## Model
+## Model Description

 **Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
 The encoder part is based on a text encoder and a phoneme duration predictor that together predict averaged acoustic features.
@@ -40,7 +40,7 @@ In the latter, by replacing 2D CNNs by 1D CNNs, a large reduction in memory cons
 **Matcha-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
 This yields an ODE-based decoder capable of generating high output quality in fewer synthesis steps than models trained using score matching.

-## Intended
+## Intended Uses and Limitations

 This model is intended to serve as an acoustic feature generator for multispeaker text-to-speech systems for the Catalan language.
 It has been fine-tuned using a Catalan phonemizer; therefore, if the model is used for other languages, it may not produce intelligible samples after mapping
@@ -49,7 +49,7 @@ its output into a speech waveform.
 The quality of the samples can vary depending on the speaker.
 This may be due to the sensitivity of the model in learning specific frequencies and also due to the quality of samples for each speaker.

-## How to
+## How to Use

 ### Installation

@@ -85,13 +85,9 @@ pip install git+https://github.com/langtech-bsc/Matcha-TTS.git@dev-cat

 ```

-
 ### Generate

-## Training
-
-### Adaptation
-
+## Training Details

 ### Training data

@@ -102,13 +98,11 @@ The model was trained on 2 **Catalan** speech datasets
 | Festcat | ca | 22 |
 | OpenSLR69 | ca | 5 |

-
-### Framework
+### Training procedure


 ## Evaluation

-### Results

 ## Citation

@@ -125,7 +119,7 @@ If this code contributes to your research, please cite the work:
 }
 ```

-## Additional
+## Additional Information

 ### Author
 The Language Technologies Unit from Barcelona Supercomputing Center.