---
title: Expressive TTS Arena
emoji: π€
colorFrom: indigo
colorTo: pink
sdk: docker
app_file: src/main.py
python_version: "3.11"
pinned: true
license: mit
---
<div align="center">
  <img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
  <h1>Expressive TTS Arena</h1>
  <p>
    <strong>
      A web application for comparing and evaluating the expressiveness of different text-to-speech models
    </strong>
  </p>
</div>

## Overview

Expressive TTS Arena is an open-source web application for evaluating the expressiveness of voice generation and speech synthesis from different text-to-speech providers.

For support or to join the conversation, visit our [Discord](https://discord.com/invite/humeai).

## Prerequisites

- [Python >=3.11.11](https://www.python.org/downloads/)
- [pip >=25.0](https://pypi.org/project/pip/)
- [uv >=0.5.29](https://github.com/astral-sh/uv)
- [Postgres](https://www.postgresql.org/download/)
- API keys for Hume AI, Anthropic, OpenAI, and ElevenLabs

## Project Structure

```
Expressive TTS Arena/
├── public/
├── src/
│   ├── common/
│   │   ├── __init__.py
│   │   ├── common_types.py         # Application-wide custom type aliases and definitions.
│   │   ├── config.py               # Manages application config (Singleton) loaded from env vars.
│   │   ├── constants.py            # Application-wide constant values.
│   │   └── utils.py                # General-purpose utility functions used across modules.
│   ├── core/
│   │   ├── __init__.py
│   │   ├── tts_service.py          # Service handling Text-to-Speech provider selection and API calls.
│   │   └── voting_service.py       # Service managing database operations for votes and leaderboards.
│   ├── database/                   # Database access layer using SQLAlchemy.
│   │   ├── __init__.py
│   │   ├── crud.py                 # Data Access Objects (DAO) / CRUD operations for database models.
│   │   ├── database.py             # Database connection setup (engine, session management).
│   │   └── models.py               # SQLAlchemy ORM models defining database tables.
│   ├── frontend/
│   │   ├── components/
│   │   │   ├── __init__.py
│   │   │   ├── arena.py            # UI definition and logic for the 'Arena' tab.
│   │   │   └── leaderboard.py      # UI definition and logic for the 'Leaderboard' tab.
│   │   ├── __init__.py
│   │   └── frontend.py             # Main Gradio application class; orchestrates UI components and layout.
│   ├── integrations/               # Modules for interacting with external third-party APIs.
│   │   ├── __init__.py
│   │   ├── anthropic_api.py        # Integration logic for the Anthropic API.
│   │   ├── elevenlabs_api.py       # Integration logic for the ElevenLabs API.
│   │   └── hume_api.py             # Integration logic for the Hume API.
│   ├── middleware/
│   │   ├── __init__.py
│   │   └── meta_tag_injection.py   # Middleware for injecting custom HTML meta tags into the Gradio page.
│   ├── scripts/
│   │   ├── __init__.py
│   │   ├── init_db.py              # Script to create database tables based on models.
│   │   └── test_db.py              # Script for testing the database connection configuration.
│   ├── __init__.py
│   └── main.py                     # Main script to configure and run the Gradio application.
├── static/
│   ├── audio/                      # Temporary storage for generated audio files served to the UI.
│   └── css/
│       └── styles.css              # Custom CSS overrides and styling for the Gradio UI.
├── .dockerignore
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock
```

## Installation

1. This project uses the [uv](https://docs.astral.sh/uv/) package manager. Follow the [installation instructions](https://docs.astral.sh/uv/getting-started/installation/) for your platform.

2. Configure environment variables:
   - Create a `.env` file based on `.env.example`.
   - Add your API keys:

     ```txt
     HUME_API_KEY=YOUR_HUME_API_KEY
     ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
     ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
     OPENAI_API_KEY=YOUR_OPENAI_API_KEY
     ```

3. Run the application:

   Standard

   ```sh
   uv run python -m src.main
   ```

   With hot-reloading

   ```sh
   uv run watchfiles "python -m src.main" src
   ```

4. Test the application by navigating to the localhost URL in your browser (e.g. `localhost:7860` or `http://127.0.0.1:7860`).

5. (Optional) If contributing, install the pre-commit hooks for automatic linting, formatting, and type-checking:

   ```sh
   uv run pre-commit install
   ```

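If the application fails at startup, a quick check that the four keys from step 2 are visible to the process can narrow things down. The following is a minimal sketch and is not part of the repository: the helper name `missing_api_keys` is hypothetical, it reads the process environment rather than parsing `.env` itself, and it only confirms that each variable is set and non-blank, not that the key is valid.

```python
import os
from collections.abc import Mapping

# Variable names taken from the `.env` example in step 2.
REQUIRED_KEYS = (
    "HUME_API_KEY",
    "ANTHROPIC_API_KEY",
    "ELEVENLABS_API_KEY",
    "OPENAI_API_KEY",
)


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in `env`.

    Hypothetical helper for illustration; placeholder values such as
    YOUR_HUME_API_KEY still count as "set".
    """
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

Calling `missing_api_keys()` with no arguments inspects the current environment; passing a dictionary, as in `missing_api_keys({"HUME_API_KEY": "abc"})`, returns the names of the other three variables.
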
## User Flow

1. Select a sample character, or enter a custom character description and click **"Generate Text"**, to generate your text input.
2. Click the **"Synthesize Speech"** button to synthesize two TTS outputs based on your text and character description.
3. Listen to both audio samples to compare their expressiveness.
4. Vote for the more expressive result by clicking either **"Select Option A"** or **"Select Option B"**.

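The project structure lists `voting_service.py` as managing votes and leaderboards, but this README does not show its logic. As an illustration only, the sketch below shows one simple way pairwise votes like those in step 4 could be reduced to per-provider win rates. The `Vote` type, its fields, and `win_rates` are hypothetical names, and the application's actual leaderboard may be computed differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One A/B comparison (hypothetical record, not the project's schema)."""

    option_a: str  # provider behind "Option A"
    option_b: str  # provider behind "Option B"
    winner: str    # provider the user selected; must be one of the two options


def win_rates(votes: list[Vote]) -> dict[str, float]:
    """Return wins divided by appearances for every provider seen in `votes`."""
    appearances: Counter[str] = Counter()
    wins: Counter[str] = Counter()
    for vote in votes:
        if vote.winner not in (vote.option_a, vote.option_b):
            raise ValueError(f"winner {vote.winner!r} was not one of the options")
        appearances[vote.option_a] += 1
        appearances[vote.option_b] += 1
        wins[vote.winner] += 1
    # Every key in `appearances` has a count of at least 1, so no division by zero.
    return {name: wins[name] / appearances[name] for name in appearances}
```

A raw win rate is easy to read but ignores which opponents each provider faced; a rating scheme that accounts for matchups would be a natural refinement if comparisons are unevenly distributed.
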
## License

This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.