{% extends "base.html" %} {% block title %}About - TTS Arena{% endblock %} {% block current_page %}About{% endblock %} {% block extra_head %} {% endblock %} {% block content %}
TTS Arena evaluates leading speech synthesis models on an interactive, community-driven platform. Inspired by LMSYS's Chatbot Arena, we've created a space where anyone can compare and rank text-to-speech technologies through direct, side-by-side evaluation.
Our second version now supports conversational models for podcast-like content generation, expanding the arena's scope to reflect the diverse applications of modern speech synthesis.
The field of speech synthesis has long lacked reliable methods to measure model quality. Traditional metrics like WER (word error rate) often fail to capture the nuances of natural speech, while subjective measures such as MOS (mean opinion score) typically involve small-scale experiments with limited participants.
TTS Arena addresses these limitations by inviting the entire community to participate in the evaluation process, making both the opportunity to rank models and the resulting insights accessible to everyone.
The concept is straightforward: enter text that will be synthesized by two competing models. After listening to both samples, vote for the one that sounds more natural and engaging. To prevent bias, model names are revealed only after your vote is submitted.
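The voting flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Arena's actual implementation: the names `BlindMatch`, `start_match`, and `public_view` are hypothetical, and the models are represented by plain strings rather than real synthesis backends.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlindMatch:
    """One side-by-side comparison (hypothetical helper, not the Arena's code)."""
    text: str
    model_a: str
    model_b: str
    winner: Optional[str] = None  # stays None until a vote is submitted

    def public_view(self) -> dict:
        # Before the vote, expose only anonymous labels to prevent bias.
        if self.winner is None:
            return {"text": self.text, "A": "hidden", "B": "hidden"}
        # After the vote, reveal which model produced each sample.
        return {
            "text": self.text,
            "A": self.model_a,
            "B": self.model_b,
            "winner": self.winner,
        }

    def vote(self, choice: str) -> dict:
        if self.winner is not None:
            raise ValueError("vote already submitted")
        if choice not in ("A", "B"):
            raise ValueError("choice must be 'A' or 'B'")
        self.winner = self.model_a if choice == "A" else self.model_b
        return self.public_view()


def start_match(text: str, models: list, rng=random) -> BlindMatch:
    # Pick two distinct models at random to synthesize the same text.
    model_a, model_b = rng.sample(models, 2)
    return BlindMatch(text, model_a, model_b)
```

Calling `start_match("Hello there", ["model-x", "model-y", "model-z"])` returns a match whose `public_view()` hides both names; `vote("A")` records the winner and returns the view with the names revealed.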
If you use TTS Arena in your research, please cite it as follows:
Thank you to the following individuals who helped make this project possible:
We may store text you enter and generated audio. If you are logged in, we may associate your votes with your Hugging Face username. You agree that we may collect, share, and/or publish any data you input for research and/or commercial purposes.
Generated audio clips may not be redistributed and are for personal, non-commercial use only. The code for the Arena is licensed under the zlib license. Random sentences are sourced from a filtered subset of the Harvard Sentences.