Update README.md
README.md CHANGED
@@ -4,11 +4,10 @@ language:
 licence:
 - apache-2.0
 tags:
-- matcha
+- matcha-tts
+- acoustic modelling
 - speech
-- text-to-speech
 - multispeaker
-- catalan
 pipeline_tag: text-to-speech
 datasets:
 - projecte-aina/festcat_trimmed_denoised
@@ -33,11 +32,11 @@ datasets:
 
 ## Model description
 
-Matcha-TTS is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder predicts phoneme durations and its mean feature vectors.
+**Matcha-TTS** is an encoder-decoder architecture designed for fast acoustic modelling in TTS. The encoder predicts phoneme durations and their mean feature vectors.
 The decoder is essentially a U-Net inspired by Grad-TTS, based on a Transformer architecture combined
 with 1D instead of 2D CNNs, which greatly reduces memory consumption and speeds up synthesis.
 
-Matcha-TTS is non-autorregressive and is trained using optimal-transport conditional flow matching (OT-CFM).
+**Matcha-TTS** is non-autoregressive and is trained using optimal-transport conditional flow matching (OT-CFM).
 This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching.
 
 ## Intended uses and limitations
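A note on the OT-CFM training mentioned in the model description hunk above: in the flow-matching literature, the objective is a regression of a learned vector field onto the optimal-transport (straight-line) path between noise and data. The sketch below follows that generic formulation; the conditioning on the encoder output \(\mu\) and the constant \(\sigma_{\min}\) are assumptions about this setup rather than details taken from this model card.

```latex
% Sketch of the OT-CFM objective (generic flow-matching formulation).
% x_1: target mel frame, x_0 ~ N(0, I), mu: encoder output (assumed conditioning),
% sigma_min: small constant, v_t: learned vector field with parameters theta.
\phi_t(x_0) = \bigl(1 - (1 - \sigma_{\min})\,t\bigr)\,x_0 + t\,x_1,
\qquad t \sim \mathcal{U}[0,1]

\mathcal{L}_{\mathrm{CFM}}(\theta) =
\mathbb{E}_{t,\,x_1,\,x_0}
\Bigl\lVert v_t\bigl(\phi_t(x_0)\mid\mu;\theta\bigr)
- \bigl(x_1 - (1-\sigma_{\min})\,x_0\bigr) \Bigr\rVert^2
```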
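The claim about high quality in fewer synthesis steps refers to integrating the learned vector field with a coarse ODE solver at inference time. The snippet below is a minimal, self-contained sketch of fixed-step Euler sampling; the toy vector field merely stands in for the trained, conditioned decoder and is not the actual Matcha-TTS sampler.

```python
import torch

def euler_sample(vector_field, x0: torch.Tensor, n_steps: int) -> torch.Tensor:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (mel frames) with fixed-step Euler."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt)
        x = x + dt * vector_field(x, t)  # one Euler step along the learned flow
    return x

def toy_field(x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Stand-in for the trained decoder network; a real model would condition on mu and the speaker.
    return -x

mel = euler_sample(toy_field, torch.randn(1, 80, 100), n_steps=10)  # a handful of steps
print(mel.shape)  # torch.Size([1, 80, 100])
```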
@@ -70,10 +69,13 @@ print(f"Result: {generation[0]['generated_text']}")
 ```
 
 ## Limitations and bias
-At the time of submission, no measures have been taken to estimate the bias and toxicity embedded in the model.
-However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques
-on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated.
+This model is intended to serve as an acoustic feature generator for multispeaker text-to-speech systems for the Catalan language.
+It has been fine-tuned with a Catalan phonemizer, so if the model is used for other languages it may not produce intelligible samples after its output is converted
+into a speech waveform.
+
+The quality of the samples may vary depending on the speaker. This is due to the model's sensitivity in learning speaker-specific frequencies and to the samples
+used for each speaker.
 
 
 ## Training
 
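The updated "Limitations and bias" text describes the model as an acoustic feature generator that relies on a Catalan phonemizer and whose output still has to be converted into a waveform. The sketch below illustrates that assumed three-stage flow; `phonemize_catalan`, `acoustic_model` and `vocoder` are hypothetical placeholders, not APIs shipped with this model.

```python
import numpy as np

# Hypothetical placeholders: swap in a real Catalan phonemizer, this acoustic model,
# and a neural vocoder; none of these stubs is provided by the model itself.
def phonemize_catalan(text: str) -> list[str]:
    return text.lower().split()                  # placeholder "phonemes"

def acoustic_model(phonemes: list[str], speaker_id: int) -> np.ndarray:
    return np.zeros((80, 10 * len(phonemes)))    # placeholder mel-spectrogram (80 bins)

def vocoder(mel: np.ndarray) -> np.ndarray:
    return np.zeros(256 * mel.shape[1])          # placeholder waveform samples

def synthesize(text: str, speaker_id: int) -> np.ndarray:
    """Catalan text -> phonemes -> acoustic features (this model) -> waveform (vocoder)."""
    phonemes = phonemize_catalan(text)           # Catalan-specific front end
    mel = acoustic_model(phonemes, speaker_id)   # this model stops at acoustic features
    return vocoder(mel)                          # waveform conversion happens downstream

wav = synthesize("Bon dia a tothom.", speaker_id=0)
print(wav.shape)
```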