Update README.md
README.md CHANGED

````diff
@@ -14,7 +14,7 @@ datasets:
 - projecte-aina/openslr-slr69-ca-trimmed-denoised
 ---
 
-# 
+# 🍵 Matxa-TTS Catalan Multispeaker
 
 ## Table of Contents
 <details>
@@ -32,12 +32,12 @@ datasets:
 
 ## Model Description
 
-**Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
+🍵 **Matxa-TTS** is based on **Matcha-TTS**, an encoder-decoder architecture designed for fast acoustic modelling in TTS.
 The encoder combines a text encoder and a phoneme duration predictor that together predict averaged acoustic features.
 The decoder has essentially a U-Net backbone inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf), which is based on the Transformer architecture.
 In the latter, by replacing 2D CNNs with 1D CNNs, a large reduction in memory consumption and fast synthesis is achieved.
 
-**
+**Matxa-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
 This yields an ODE-based decoder capable of generating high output quality in fewer synthesis steps than models trained using score matching.
 
 ## Intended Uses and Limitations
@@ -64,7 +64,7 @@ python -m venv /path/to/venv
 source /path/to/venv/bin/activate
 ```
 
-For training and inferencing with Catalan
+For training and inference with Catalan Matxa-TTS, you need to compile the provided espeak-ng with the Catalan phonemizer:
 ```bash
 git clone https://github.com/projecte-aina/espeak-ng.git
 
@@ -97,8 +97,8 @@ pip install -e .
 
 #### PyTorch
 
-Speech end-to-end inference can be done together with **Catalan
-Both models (Catalan
+End-to-end speech inference can be done together with **Catalan Matxa-TTS**.
+Both models (Catalan Matxa-TTS and alVoCat) are loaded remotely from the HF hub.
 
 First, export the following environment variables to include the installed espeak-ng version:
 
````
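The diff's final context line tells the reader to export environment variables so the locally compiled espeak-ng is found, but the actual commands fall outside this excerpt. A minimal sketch of what such exports typically look like; the install prefix and the choice of variables here are assumptions, not taken from the README:

```shell
# Assumed install prefix for the compiled projecte-aina espeak-ng fork --
# substitute wherever your `make install` actually placed it.
ESPEAK_PREFIX="$HOME/espeak-ng-install"

# Put the built espeak-ng binary and its shared library on the search paths.
export PATH="$ESPEAK_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$ESPEAK_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

With these set, tools launched from the same shell resolve the patched espeak-ng (with the Catalan phonemizer) ahead of any system-wide installation.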