Update README.md
README.md CHANGED

````diff
@@ -14,7 +14,7 @@ datasets:
 - projecte-aina/openslr-slr69-ca-trimmed-denoised
 ---
 
-# 
+# 🍵 Matxa-TTS Catalan Multispeaker
 
 ## Table of Contents
 <details>
@@ -32,12 +32,12 @@ datasets:
 
 ## Model Description
 
-**Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS.
+🍵 **Matxa-TTS** is based on **Matcha-TTS**, an encoder-decoder architecture designed for fast acoustic modelling in TTS.
 The encoder combines a text encoder and a phoneme duration predictor that together predict averaged acoustic features.
 The decoder has essentially a U-Net backbone inspired by [Grad-TTS](https://arxiv.org/pdf/2105.06337.pdf), which is based on the Transformer architecture.
 In the latter, by replacing 2D CNNs with 1D CNNs, a large reduction in memory consumption and fast synthesis is achieved.
 
-**
+**Matxa-TTS** is a non-autoregressive model trained with optimal-transport conditional flow matching (OT-CFM).
 This yields an ODE-based decoder capable of generating high output quality in fewer synthesis steps than models trained using score matching.
 
 ## Intended Uses and Limitations
@@ -64,7 +64,7 @@ python -m venv /path/to/venv
 source /path/to/venv/bin/activate
 ```
 
-For training and inferencing with Catalan
+For training and inference with Catalan Matxa-TTS, you need to compile the provided espeak-ng with the Catalan phonemizer:
 ```bash
 git clone https://github.com/projecte-aina/espeak-ng.git
 
@@ -97,8 +97,8 @@ pip install -e .
 
 #### PyTorch
 
-Speech end-to-end inference can be done together with **Catalan
-Both models (Catalan
+End-to-end speech inference can be done together with **Catalan Matxa-TTS**.
+Both models (Catalan Matxa-TTS and alVoCat) are loaded remotely from the HF hub.
 
 First, export the following environment variables to include the installed espeak-ng version:
 
````
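The diff's final context line tells the reader to export environment variables so the locally compiled espeak-ng is found, but the actual commands fall outside this excerpt. A minimal sketch of what such exports typically look like; the install prefix and the choice of variables here are assumptions, not taken from the README:

```shell
# Assumed install prefix for the compiled projecte-aina espeak-ng fork --
# substitute wherever your `make install` actually placed it.
ESPEAK_PREFIX="$HOME/espeak-ng-install"

# Put the built espeak-ng binary and its shared library on the search paths.
export PATH="$ESPEAK_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$ESPEAK_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

With these set, tools launched from the same shell resolve the patched espeak-ng (with the Catalan phonemizer) ahead of any system-wide installation.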