---
title: Expressive TTS Arena
emoji: 🎀
colorFrom: indigo
colorTo: pink
sdk: docker
app_file: src/main.py
python_version: "3.11"
pinned: true
license: mit
---

<div align="center">
    <img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
    <h1>Expressive TTS Arena</h1>
    <p>
        <strong> A web application for comparing and evaluating the expressiveness of different text-to-speech models </strong>
    </p>
</div>

## Overview

Expressive TTS Arena is an open-source web application for evaluating the expressiveness of voice generation and speech synthesis from different text-to-speech providers, including Hume AI and ElevenLabs.

For support or to join the conversation, visit our [Discord](https://discord.com/invite/humeai).

## Prerequisites

- [Python >=3.11.11](https://www.python.org/downloads/)
- [pip >=25.0](https://pypi.org/project/pip/)
- [uv >=0.5.29](https://github.com/astral-sh/uv)
- [Postgres](https://www.postgresql.org/download/)
- API keys for Hume AI, Anthropic, and ElevenLabs

## Project Structure

```
Expressive TTS Arena/
├── public/                     # Directory for public assets
├── src/
│   ├── database/
│   │   ├── __init__.py         # Makes database a package; expose ORM methods
│   │   ├── crud.py             # Defines operations for interacting with database
│   │   ├── database.py         # Sets up SQLAlchemy database connection
│   │   └── models.py           # SQLAlchemy database models
│   ├── integrations/
│   │   ├── __init__.py         # Makes integrations a package; exposes API clients
│   │   ├── anthropic_api.py    # Anthropic API integration
│   │   ├── elevenlabs_api.py   # ElevenLabs API integration
│   │   └── hume_api.py         # Hume API integration
│   ├── scripts/
│   │   ├── __init__.py         # Makes scripts a package
│   │   ├── init_db.py          # Script for initializing database
│   │   ├── test_db.py          # Script for testing database connection
│   ├── __init__.py             # Makes src a package
│   ├── config.py               # Global config and logger setup
│   ├── constants.py            # Global constants
│   ├── custom_types.py         # Global custom types
│   ├── frontend.py             # Gradio UI components
│   ├── main.py                 # Entry file
│   └── utils.py                # Utility functions
├── static/
│   ├── audio/                  # Directory for storing generated audio files
│   ├── css/
│   │   ├── styles.css          # Defines custom css
├── .dockerignore
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
├── uv.lock
```

## Installation

1. This project uses the [uv](https://docs.astral.sh/uv/) package manager. Follow the installation instructions for your platform [here](https://docs.astral.sh/uv/getting-started/installation/).

2. Configure environment variables:
    - Create a `.env` file based on `.env.example`
    - Add your API keys:

    ```txt
    HUME_API_KEY=YOUR_HUME_API_KEY
    ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
    ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
    ```

3. Run the application:

    Standard
    ```sh
    uv run python -m src.main
    ```

    With hot-reloading
    ```sh
    uv run watchfiles "python -m src.main" src
    ```

4. Test the application by navigating to the localhost URL in your browser (e.g. `localhost:7860` or `http://127.0.0.1:7860`).

5. (Optional) If you plan to contribute, install the pre-commit hooks for automatic linting, formatting, and type-checking:
    ```sh
    uv run pre-commit install
    ```
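
If the application fails to start, a missing API key from step 2 is a likely cause. The following is a minimal sketch of a pre-launch check, assuming the three variables have already been loaded into the process environment (for example, exported in your shell). `missing_api_keys` is a hypothetical helper for illustration and is not part of this project.

```python
import os
from collections.abc import Mapping

# Variable names taken from the .env example in step 2.
REQUIRED_KEYS = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required key names that are unset or blank in `env`.

    Hypothetical helper: the project's own config loading may differ.
    """
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
print(missing_api_keys({"HUME_API_KEY": "abc", "ANTHROPIC_API_KEY": " "}))
```

Calling `missing_api_keys()` with no arguments checks the current process environment; an empty list means all three keys are present.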

## User Flow

1. Select a sample character or enter a custom character description, then click **"Generate Text"** to generate your text input.
2. Click the **"Synthesize Speech"** button to synthesize two TTS outputs based on your text and character description.
3. Listen to both audio samples to compare their expressiveness.
4. Vote for the more expressive result by clicking either **"Select Option A"** or **"Select Option B"**.
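
The comparison in steps 2 to 4 can be pictured as a simple A/B vote tally. This is only an illustrative sketch, not the application's actual logic: it assumes the two providers named in the Overview are randomly assigned to options A and B, and the names `assign_options` and `record_vote` are hypothetical.

```python
import random
from collections import Counter

# Providers named in the Overview; the random A/B assignment is an assumption.
PROVIDERS = ("Hume AI", "ElevenLabs")


def assign_options(rng: random.Random) -> dict[str, str]:
    """Map the two providers to options "A" and "B" in random order."""
    shuffled = list(PROVIDERS)
    rng.shuffle(shuffled)
    return {"A": shuffled[0], "B": shuffled[1]}


def record_vote(tally: Counter, options: dict[str, str], choice: str) -> None:
    """Credit the provider behind the chosen option ("A" or "B")."""
    if choice not in options:
        raise ValueError(f"choice must be one of {sorted(options)}, got {choice!r}")
    tally[options[choice]] += 1


tally: Counter = Counter()
options = assign_options(random.Random())
record_vote(tally, options, "A")  # the user clicked "Select Option A"
print(dict(tally))
```

Keeping the option-to-provider mapping separate from the tally means a vote is always credited to the provider that produced the audio, regardless of which side it appeared on.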

## License

This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.