AlexK-PL committed on
Commit 5c145f9
1 Parent(s): d79d134

Update README.md

  1. README.md +23 -5
README.md CHANGED
@@ -32,9 +32,10 @@ datasets:
 
  ## Model description
 
- **Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder predicts phoneme durations and their averaged acoustic features.
- The decoder backbone is essentially a U-Net inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf) based on Transformers architecture. By replacing 2D CNNs by 1D CNNs,
- a large reduction in memory consumption and fast synthesis is achieved.
 
  **Matcha-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
  This yields an ODE-based decoder capable of producing high-quality output in fewer synthesis steps than models trained using score matching.
@@ -52,10 +53,20 @@ This may be due to the sensitivity of the model in learning specific frequencies
 
  ### Installation
 
  ```bash
- pip install git+https://github.com/langtech-bsc/vocos.git@matcha
  ```
- You need to install the Catalan phonemizer version of espeak-ng:
 
  ```bash
  git clone https://github.com/projecte-aina/espeak-ng.git
@@ -72,6 +83,13 @@ pip install mecab-python3
  pip install unidic-lite
 
  ```
 
  ### Generate
 
 
@@ -32,9 +32,10 @@ datasets:
 
  ## Model description
 
+ **Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
+ The encoder consists of a text encoder and a phoneme duration predictor that together predict averaged acoustic features.
+ The decoder backbone is essentially a U-Net, inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf), that incorporates Transformer blocks.
+ Replacing the 2D CNNs with 1D CNNs in the decoder greatly reduces memory consumption and enables fast synthesis.
 
  **Matcha-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
  This yields an ODE-based decoder capable of producing high-quality output in fewer synthesis steps than models trained using score matching.
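
The claim about fewer synthesis steps can be illustrated with a toy example. The sketch below is not the Matcha-TTS decoder: it assumes a single known target point `x1` and uses the closed-form conditional OT-CFM vector field in place of a trained network. The names `cfm_vector_field`, `euler_sample`, and `SIGMA_MIN` are hypothetical and chosen only for this illustration.

```python
# Toy illustration of sampling with an OT-CFM vector field.
# Assumption: a single known target x1 replaces the trained decoder network.
SIGMA_MIN = 1e-4  # hypothetical minimum noise level


def cfm_vector_field(x, t, x1, sigma_min=SIGMA_MIN):
    """Conditional OT-CFM field u_t(x | x1) = (x1 - (1 - s) x) / (1 - (1 - s) t)."""
    scale = 1.0 - sigma_min
    return [(b - scale * a) / (1.0 - scale * t) for a, b in zip(x, x1)]


def euler_sample(x0, x1, n_steps):
    """Integrate dx/dt = u_t(x | x1) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    h = 1.0 / n_steps
    for k in range(n_steps):
        u = cfm_vector_field(x, k * h, x1)
        x = [a + h * v for a, v in zip(x, u)]
    return x


noise = [0.3, -1.2, 0.8]   # stands in for a Gaussian noise sample
target = [1.0, 2.0, -0.5]  # stands in for one mel-spectrogram frame
coarse = euler_sample(noise, target, n_steps=2)
fine = euler_sample(noise, target, n_steps=50)
```

In this toy setting the OT path from `noise` to `target` is a straight line, so `coarse` and `fine` end at the same point, `target` plus a residual of `SIGMA_MIN * noise`. A trained model only approximates such straight paths, so the number of steps in practice remains a trade-off between speed and quality.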
 
@@ -52,10 +53,20 @@ This may be due to the sensitivity of the model in learning specific frequencies
 
  ### Installation
 
+ This model has been trained using the espeak-ng open-source text-to-speech software.
+ The version of espeak-ng containing the Catalan phonemizer can be found [here](https://github.com/projecte-aina/espeak-ng).
+
+ Create a virtual environment:
+
+ ```bash
+ python -m venv /path/to/venv
+ ```
+
  ```bash
+ source /path/to/venv/bin/activate
  ```
+
+ For training and inference with Catalan Matcha-TTS, you need to compile the provided espeak-ng with the Catalan phonemizer:
 
  ```bash
  git clone https://github.com/projecte-aina/espeak-ng.git
 
@@ -72,6 +83,13 @@ pip install mecab-python3
  pip install unidic-lite
 
  ```
+ Install the repository:
+
+ ```bash
+ pip install git+https://github.com/langtech-bsc/Matcha-TTS.git@dev-cat
+
+ ```
+
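
After the installation steps, a quick pre-flight check can confirm that an espeak-ng binary with a Catalan voice is reachable from the active environment. The sketch below is an assumption-laden helper, not part of the repository: `catalan_voice_available` is a hypothetical name, and it assumes that `espeak-ng --voices=ca` prints a header row followed by one row per voice.

```python
import shutil
import subprocess


def catalan_voice_available(binary="espeak-ng"):
    """Return True if `binary` is on PATH and lists at least one Catalan voice."""
    path = shutil.which(binary)
    if path is None:
        return False
    # `--voices=ca` asks espeak-ng to list the voices for language code "ca".
    result = subprocess.run(
        [path, "--voices=ca"], capture_output=True, text=True, check=False
    )
    # Assumed output layout: one header row, then one row per voice.
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return result.returncode == 0 and len(lines) > 1


if __name__ == "__main__":
    print("Catalan espeak-ng voice found:", catalan_voice_available())
```

This only confirms that some espeak-ng with a Catalan voice is on `PATH`; it does not verify that the binary is the projecte-aina build.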
 
  ### Generate
 