SpeechT5 HiFi-GAN Vocoder

This is the HiFi-GAN vocoder for use with the SpeechT5 text-to-speech and voice conversion models.

SpeechT5 was first released in this repository, together with the original weights. The license is MIT.

Disclaimer: The team releasing SpeechT5 did not write a model card for this model, so this model card has been written by the Hugging Face team.
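As a rough illustration of how the vocoder fits into a pipeline, the sketch below loads the checkpoint with the `SpeechT5HifiGan` class from the `transformers` library and converts a mel spectrogram into a waveform. The helper names `load_vocoder` and `spectrogram_to_waveform` are hypothetical and exist only for this example. It assumes `torch` and `transformers` are installed, and that the input is a log-mel spectrogram of shape `(frames, 80)`, which is what the default vocoder configuration is understood to expect.

```python
import torch
from transformers import SpeechT5HifiGan

# Model id as it appears on this model card.
MODEL_ID = "microsoft/speecht5_hifigan"


def load_vocoder(model_id: str = MODEL_ID) -> SpeechT5HifiGan:
    """Hypothetical helper: fetch the vocoder weights (or reuse the local cache)."""
    vocoder = SpeechT5HifiGan.from_pretrained(model_id)
    return vocoder.eval()  # inference only, so keep the model in eval mode


def spectrogram_to_waveform(vocoder: SpeechT5HifiGan, spectrogram: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: turn a mel spectrogram into a speech waveform.

    `spectrogram` is assumed to have shape (frames, mel_bins) or
    (batch, frames, mel_bins); the result is a float waveform tensor.
    """
    with torch.no_grad():  # no gradients are needed for synthesis
        return vocoder(spectrogram)
```

In practice the spectrogram would come from a SpeechT5 text-to-speech or voice conversion model rather than being built by hand. For a quick smoke test, `spectrogram_to_waveform(load_vocoder(), torch.randn(100, 80))` should return a one-dimensional tensor of audio samples, although random input will only produce noise.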

Citation

BibTeX:

@inproceedings{ao-etal-2022-speecht5,
    title = {{S}peech{T}5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing},
    author = {Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu},
    booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    month = {May},
    year = {2022},
    pages = {5723--5738},
}

