{% extends "base.html" %} {% block title %}About - TTS Arena{% endblock %} {% block current_page %}About{% endblock %} {% block extra_head %} {% endblock %} {% block content %}

Welcome to TTS Arena 2.0

TTS Arena evaluates leading speech synthesis models on an interactive, community-driven platform. Inspired by LMSYS's Chatbot Arena, we've created a space where anyone can compare and rank text-to-speech technologies through direct, side-by-side evaluation.

This second version adds support for conversational models that generate podcast-like content, expanding the arena's scope to reflect the diverse applications of modern speech synthesis.

Motivation

The field of speech synthesis has long lacked reliable methods to measure model quality. Objective metrics like WER (word error rate) capture intelligibility but often miss the nuances of natural speech, while subjective measures such as MOS (mean opinion score) typically rely on small-scale listening tests with few participants.

TTS Arena addresses these limitations by inviting the entire community to participate in the evaluation process, making both the opportunity to rank models and the resulting insights accessible to everyone.

How The Arena Works

The concept is straightforward: enter text that will be synthesized by two competing models. After listening to both samples, vote for the one that sounds more natural and engaging. To prevent bias, model names are revealed only after your vote is submitted.
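The blind, side-by-side protocol above can be sketched as follows. This is a minimal in-memory illustration, not the Arena's actual code: `BlindMatchup`, `new_matchup`, and the model names are hypothetical. Two distinct models are drawn in random order, presented only as "A" and "B", and their names are returned only once a vote is recorded.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlindMatchup:
    """One side-by-side comparison; the UI shows only the labels "A" and "B"."""
    text: str
    models: Tuple[str, str]       # identities the UI keeps hidden until a vote
    winner: Optional[str] = None  # set once the listener votes

    def vote(self, choice: str) -> Tuple[str, str]:
        """Record a vote for "A" or "B", then reveal both model names."""
        if self.winner is not None:
            raise ValueError("a vote was already recorded for this matchup")
        if choice not in ("A", "B"):
            raise ValueError("choice must be 'A' or 'B'")
        self.winner = self.models[0] if choice == "A" else self.models[1]
        return self.models  # names are revealed only after the vote is stored


def new_matchup(text: str, models: List[str],
                rng: Optional[random.Random] = None) -> BlindMatchup:
    """Pick two distinct models in random order so neither side is favored."""
    rng = rng or random.Random()
    first, second = rng.sample(models, 2)
    return BlindMatchup(text, (first, second))


# Hypothetical usage: both clips would be played labeled only "A" and "B".
matchup = new_matchup("The birch canoe slid on the smooth planks.",
                      ["model-x", "model-y", "model-z"])
print(matchup.vote("B"))  # names appear only after the vote
```

Randomizing the order as well as the pair keeps a listener from learning that a particular model tends to appear on one side.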

Frequently Asked Questions

What happened to the TTS Arena V1 leaderboard?
The TTS Arena V1 leaderboard is now deprecated. While you can no longer vote on it, the results and leaderboard are still available for reference at TTS Arena V1. The leaderboard is static and will not change.
How are models ranked in TTS Arena?
Models are ranked using an Elo rating system, similar to chess rankings. When you vote for a model, its rating increases while the other model's rating decreases. The amount of change depends on the current ratings of both models: beating a higher-rated model earns more points than beating a lower-rated one.
Is the TTS Arena V2 leaderboard affected by votes from V1?
No. The TTS Arena V2 leaderboard is a fresh start: votes from V1 do not affect it, and all models in V2 start with a clean slate.
Can I suggest a model to be added to the arena?
Yes! We welcome suggestions for new models. Please reach out to us through the Hugging Face community or create an issue in our GitHub repository. If you are developing a new model and wish for it to be added anonymously for pre-release evaluation, please reach out to us to discuss.
How can I contribute to the project?
You can contribute by voting on models, suggesting improvements, reporting bugs, or even contributing code. Check our GitHub repository for more information on how to get involved.
What's new in TTS Arena 2.0?
TTS Arena 2.0 introduces support for conversational models (for podcast-like content), improved UI/UX, and a more robust backend infrastructure for handling more models and votes.
Do I need to log in to use TTS Arena?
No. Logging in is optional and not required to vote. If you choose to log in with Hugging Face, the texts you enter will be associated with your account, and you'll have access to a personal leaderboard showing the models you favor the most.
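The Elo update mentioned in the FAQ answer on ranking can be sketched with the standard Elo formula. This is an illustration, not the Arena's actual code: the K-factor of 32 and the 400-point logistic scale are conventional chess defaults assumed here, and the function names are hypothetical.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner: float, loser: float, k: float = 32.0) -> Tuple[float, float]:
    """Return new (winner, loser) ratings after one vote.

    The winner gains k * (1 - expected) and the loser loses the same amount,
    so an upset against a stronger model moves ratings more than an expected win.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


print(elo_update(1500, 1500))  # → (1516.0, 1484.0)
print(elo_update(1400, 1600))  # underdog wins: roughly (1424.3, 1575.7)
```

With these assumed parameters, two models tied at 1500 move to 1516 and 1484 after one vote; an underdog rated 1400 that beats a 1600-rated model gains about 24 points, while the favorite winning the same matchup would gain only about 8.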

Citation

If you use TTS Arena in your research, please cite it as follows:

@misc{tts-arena-v2,
  title        = {TTS Arena 2.0: Benchmarking Text-to-Speech Models in the Wild},
  author       = {mrfakename and Srivastav, Vaibhav and Fourrier, Clémentine and Pouget, Lucain and Lacombe, Yoach and main and Gandhi, Sanchit and Passos, Apolinário and Cuenca, Pedro},
  year         = 2025,
  publisher    = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2}"
}

Credits

Thank you to the following individuals who helped make this project possible:

Vaibhav (VB) Srivastav
Clémentine Fourrier
Lucain Pouget
Yoach Lacombe
Main Horse
Sanchit Gandhi
Apolinário Passos
Pedro Cuenca

Privacy Statement

We may store text you enter and generated audio. If you are logged in, we may associate your votes with your Hugging Face username. You agree that we may collect, share, and/or publish any data you input for research and/or commercial purposes.

License

Generated audio clips cannot be redistributed and may be used for personal, non-commercial purposes only. The code for the Arena is licensed under the Zlib license. Random sentences are sourced from a filtered subset of the Harvard Sentences.

{% endblock %}