# Multi-Speaker VITS Model for Hausa
This is a multi-speaker extension of the MMS-TTS Hausa model from Meta.
## Model Details
- Base model: facebook/mms-tts-hau
- Number of speakers: 10
- Model class: MultiSpeakerVITS
- Language: Hausa (hau)
- Task: Text-to-Speech (TTS)
## Model Architecture
This model extends the original MMS-TTS Hausa model with multi-speaker capabilities by:
- Adding speaker embeddings for 10 different speakers
- Conditioning the text encoder output with speaker information (illustrated in the sketch below)
- Maintaining compatibility with the original VITS architecture
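
The exact wrapper implementation ships with the original training code rather than this card, but the conditioning idea can be sketched roughly as follows. The names and the hidden size are illustrative assumptions: a learned embedding per speaker is broadcast over the time axis and added to the text-encoder output.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the speaker-conditioning step, not the actual class
# internals. Assumption: the text encoder of facebook/mms-tts-hau uses a
# hidden size of 192 and conditioning is a simple additive embedding.
hidden_size = 192
n_speakers = 10

speaker_embedding = nn.Embedding(n_speakers, hidden_size)

encoder_output = torch.randn(1, 25, hidden_size)  # (batch, text_length, hidden)
speaker_ids = torch.tensor([3])                   # speaker index in [0, 9]

# Broadcast the per-speaker vector over the time axis and add it.
conditioned_output = encoder_output + speaker_embedding(speaker_ids).unsqueeze(1)
```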
## Usage
```python
import torch
from transformers import VitsModel, VitsTokenizer

# Load the base model and tokenizer
base_model = VitsModel.from_pretrained("facebook/mms-tts-hau")
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-hau")

# Load the multi-speaker checkpoint
checkpoint = torch.load("multispeaker_vits_template.pth", map_location="cpu")

# Define the MultiSpeakerVITS class (copy the class definition from the original training code)
class MultiSpeakerVITS(torch.nn.Module):
    # ... (copy the class definition from the original training code)
    pass

# Create and load the multi-speaker model
ms_model = MultiSpeakerVITS(base_model, n_speakers=10)
ms_model.load_state_dict(checkpoint["model_state"])
ms_model.eval()

# Example usage
text = "Sannu, ina kwana?"  # "Hello, how are you?" in Hausa
inputs = tokenizer(text, return_tensors="pt")
speaker_id = torch.tensor([0])  # Choose a speaker ID from 0 to 9

with torch.no_grad():
    output = ms_model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs.get("attention_mask"),
        speaker_ids=speaker_id,
    )
```
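
Assuming the wrapper returns a VITS-style output that exposes a `waveform` tensor of shape `(batch, num_samples)` (as `VitsModel` does), the generated audio could be written to disk as shown below; adjust the attribute name to whatever the wrapper actually returns.

```python
import scipy.io.wavfile

# Assumption: `output.waveform` holds the synthesized audio and the sampling
# rate comes from the base model's config (16 kHz for MMS-TTS models).
waveform = output.waveform[0].cpu().numpy()
scipy.io.wavfile.write("hausa_speaker_0.wav", base_model.config.sampling_rate, waveform)
```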
## Training
This is a template model with initialized weights. To use it effectively, you'll need to:
- Fine-tune on multi-speaker Hausa data: Train the speaker embeddings and optionally fine-tune the base model
- Prepare speaker-labeled dataset: Each audio sample should be labeled with a speaker ID (0 to 9)
- Training loop: Implement a training loop that uses both text and speaker_ids as inputs (see the sketch after this list)
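
A bare-bones sketch of such a loop is shown below. `train_dataloader` and `compute_loss` are placeholders for your own data pipeline and loss (full VITS training combines reconstruction, duration, KL, and adversarial terms); the sketch only illustrates how text and speaker IDs flow through the model.

```python
import torch

# Hypothetical training-loop sketch: dataset, collation, and loss details
# depend on your multi-speaker Hausa corpus and are not provided here.
optimizer = torch.optim.AdamW(ms_model.parameters(), lr=1e-4)
ms_model.train()

for batch in train_dataloader:  # placeholder: yields tokenized text, speaker IDs, audio targets
    optimizer.zero_grad()
    output = ms_model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        speaker_ids=batch["speaker_ids"],  # one ID in [0, 9] per sample
    )
    loss = compute_loss(output, batch)  # placeholder: your loss function
    loss.backward()
    optimizer.step()
```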
## Files
- `multispeaker_vits_template.pth`: PyTorch checkpoint containing the model weights
- `config.json`: Model configuration and metadata
- `README.md`: This documentation
## Citation
```bibtex
@article{pratap2023mms,
  title={Scaling Speech Technology to 1,000+ Languages},
  author={Pratap, Vineel and Tjandra, Andros and Conneau, Alexis and others},
  journal={arXiv preprint arXiv:2305.13516},
  year={2023}
}
```
## License
This model is based on the MMS-TTS model and follows the same licensing terms.