I created an API wrapper with web UI


Seriously impressive work. The model quality is outstanding, especially for a 3-month effort from scratch.

Seeing some discussions about running Dia locally, I wanted to share a project I put together quickly that might make it easier to get started:
https://github.com/devnen/Dia-TTS-Server

It's an API server that wraps the Dia model:

- Setup is a standard pip install -r requirements.txt on Windows or Linux, and the model is downloaded from Hugging Face automatically.
- A simple web UI for generating speech, adjusting parameters, and testing voice cloning.
- An OpenAI-compatible API endpoint for drop-in integration, plus a custom endpoint if you need full control (see the example below).
- Runs on either CUDA GPUs or just the CPU.
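The quickest way to test the integration is the OpenAI-compatible route. Here's a minimal sketch using Python's requests; the port (8003), model name, and voice value are assumptions on my part, so check the repo's README for the actual defaults:

```python
import requests

# Minimal sketch: POST to the OpenAI-style /v1/audio/speech endpoint.
# The port, model name, and voice below are assumptions -- see the README.
resp = requests.post(
    "http://localhost:8003/v1/audio/speech",
    json={
        "model": "dia",                                  # assumed identifier
        "input": "[S1] Hello from the Dia TTS Server!",  # Dia speaker tags
        "voice": "S1",                                   # assumed voice value
        "response_format": "wav",
    },
    timeout=300,  # generation can take a while, especially on CPU
)
resp.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(resp.content)
```

Because the endpoint mirrors the shape of OpenAI's audio API, existing OpenAI client code should only need its base URL pointed at the local server.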

[Screenshot: the Dia TTS Server web UI]

The goal was to create a simple way to run and experiment with the model without needing to piece together the example scripts yourself.
Hope you find it useful!


Thanks for this. Is there a limit to the number of characters I can input? I have multiple 50k+ character docs I would like to transform to speech.

The model can't reliably generate more than about 25 seconds of audio in one pass, so a 50k-character document won't work directly. You'll need to extract the text from the input file, split it into manageable chunks, convert each chunk to speech independently, and then concatenate the pieces into a single audio file (a sketch of this follows).
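Here's a minimal sketch of that chunk-and-concatenate workflow in Python, using only the standard library. The splitting threshold (300 characters) is a guess at what stays under the ~25-second limit, and generate_speech is a hypothetical wrapper around the server call, not something the server ships with:

```python
import re
import wave

def split_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of at most max_chars, on sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def concat_wavs(paths: list[str], out_path: str) -> None:
    """Concatenate WAV files that share the same sample rate and format."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # header frame count is fixed up on close
        for path in paths:
            with wave.open(path, "rb") as w:
                out.writeframes(w.readframes(w.getnframes()))

# Usage, where generate_speech(text, path) is a hypothetical helper
# that calls the TTS server and writes a WAV file:
#   paths = []
#   for i, chunk in enumerate(split_text(open("doc.txt").read())):
#       path = f"chunk_{i:04d}.wav"
#       generate_speech(chunk, path)
#       paths.append(path)
#   concat_wavs(paths, "full_doc.wav")
```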

This chunking feature may be added to the Dia TTS Server UI in the next few days. However, I expect the generated audio chunks to have inconsistent voices, since for now I see no way to control the voice output between generations.

Here is a good video demonstrating the issue and a potential solution:
https://www.youtube.com/watch?v=tje3uAZqgV0
