Text-to-Speech
PyTorch
ONNX
Catalan
matcha-tts
acoustic modelling
speech
multispeaker
AlexK-PL committed on
Commit
f08fdc4
1 Parent(s): b780397

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -33,10 +33,11 @@ datasets:
 
  ## Model description
 
- Matcha-TTS is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder side is inspired by previous works (Grad-TTS and Glow-TTS)
- modelling alignment with Monotonic Alignment Search (MAS). The decoder is essentially a U-Net inspired by Grad-TTS based on Transformers architecture combined with 1D CNNs,
- making a high reduction on memory consumption while increasing synthesis speed. Matcha-TTS is probabilistic, non-autoregressive and is trained using optimal-transport
- conditional flow matching (OT-CFM). This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching.
+ Matcha-TTS is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder predicts each phoneme's duration and mean feature vector,
+ with alignment learned through Monotonic Alignment Search (MAS). The decoder is essentially a U-Net inspired by Grad-TTS, built from Transformer blocks combined
+ with 1D rather than 2D CNNs, which greatly reduces memory consumption and speeds up synthesis.
+ Matcha-TTS is non-autoregressive and is trained using optimal-transport conditional flow matching (OT-CFM).
+ This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching.
 
  ## Intended uses and limitations
 
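For readers of this change, the OT-CFM training target and the ODE-based synthesis mentioned in the model description can be illustrated with a minimal sketch. It is not code from this repository: the function names, the `SIGMA_MIN` value, and the use of plain Python lists in place of mel-spectrogram tensors are all assumptions made for illustration. It shows the straight-line path between a noise sample and a data sample, the constant target velocity along that path, and a fixed-step Euler solver of the kind that lets such a decoder synthesise in few steps.

```python
SIGMA_MIN = 1e-4  # assumed value; the README does not state it


def ot_cfm_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return (x_t, u_t) for noise x0, data x1 and time t in [0, 1].

    x_t lies on the straight path from x0 to x1; u_t is the constant
    target velocity along that path.
    """
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocity."""
    return sum((p - u) ** 2 for p, u in zip(v_pred, u_t)) / len(u_t)


def euler_sample(vector_field, x0, n_steps=10):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: an "oracle" vector field that returns the true target velocity.
noise = [0.5, -1.0]
data = [2.0, 3.0]
_, target = ot_cfm_pair(noise, data, t=0.3)
sample = euler_sample(lambda x, t: target, noise, n_steps=4)
```

Because the target velocity is constant along each path, the toy oracle field reaches the data point (up to the small `sigma_min` term) regardless of the number of Euler steps; a trained network only approximates this field, so real step counts are a quality/speed trade-off.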