Text-to-Speech (TTS) with VITS trained on Kiswahili and Luganda Common Voice

This repository provides the tools needed for Text-to-Speech (TTS) with Coqui TTS, using a VITS model fine-tuned on Kiswahili and Luganda Common Voice v13 recordings from six speakers with similar intonation.

The pre-trained model takes text as input and produces a waveform (audio) as output.

How to Synthesize Speech using our models

First, install Coqui TTS:

pip install TTS
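
To confirm the install worked before loading a model, a small check along these lines may help. This is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and it assumes the package is registered under the same distribution name used in the `pip install` command above.

```python
from importlib import metadata


def installed_version(dist_name="TTS"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("TTS version:", version if version else "not installed")
```

If this prints "not installed", re-run the install command in the same environment that runs your script.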

Perform Text-to-Speech (TTS)

from TTS.utils.synthesizer import Synthesizer

# Load the fine-tuned VITS model. Only the checkpoint and its configuration
# are set; no speaker, language, vocoder, or encoder files are passed.
synthesizer = Synthesizer(
    tts_checkpoint="<model checkpoint path>",
    tts_config_path="<model configuration file>",
)

sentence_to_synthesize = "Your Kiswahili or Luganda sentence here"
if sentence_to_synthesize:
    print(sentence_to_synthesize)
    # No speaker name, language name, or reference speaker audio is passed.
    wav = synthesizer.tts(sentence_to_synthesize, None, None, None)
    location = "output.wav"  # Choose a desired name for the output file
    synthesizer.save_wav(wav, location)
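
To synthesize several sentences in one run, the same two calls (`tts` and `save_wav`) can be wrapped in a loop. The following is a minimal sketch, not part of the released code: the helper name `synthesize_batch`, the `outputs` directory, and the `sentence_NNN.wav` naming scheme are all assumptions chosen for illustration. It expects an already loaded `synthesizer` object like the one created above.

```python
from pathlib import Path


def synthesize_batch(synthesizer, sentences, out_dir="outputs"):
    """Synthesize each non-empty sentence to its own WAV file; return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, sentence in enumerate(sentences):
        text = sentence.strip()
        if not text:
            continue  # Skip blank entries instead of synthesizing silence
        # Same call as in the single-sentence example: no speaker name,
        # language name, or reference speaker audio.
        wav = synthesizer.tts(text, None, None, None)
        # Hypothetical naming scheme: the index keeps files in input order.
        target = out_path / f"sentence_{index:03d}.wav"
        synthesizer.save_wav(wav, str(target))
        written.append(target)
    return written
```

For example, `synthesize_batch(synthesizer, ["First sentence", "Second sentence"])` would write `outputs/sentence_000.wav` and `outputs/sentence_001.wav`. The model is loaded once and reused for every sentence, which avoids reloading the checkpoint on each call.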

Limitations

We do not provide any warranty on the performance achieved by this model when used on other datasets.

Citing

Please cite our work if you use our models in your research or business.

@inproceedings{buildingTTS,
  title={Building a Luganda Text-to-Speech Model from Crowdsourced Data},
  author={Kagumire, Sulaiman and Katumba, Andrew and Nakatumba-Nabende, Joyce and Quinn, John},
  booktitle={5th Workshop on African Natural Language Processing},
  year={2024}
}