VietVoices RunPod Serverless Deployment

This folder contains all the necessary files to deploy VietVoices TTS on RunPod Serverless.

Setup Instructions

1. Prerequisites

  • RunPod account with API key
  • Docker Hub account
  • Hugging Face token (for model access)

2. Environment Variables

Set these environment variables:

export RUNPOD_API_KEY="your_runpod_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"

3. Build and Push Docker Image

# Make the script executable
chmod +x build_and_push.sh

# Update DOCKER_USERNAME in the script
# Then run:
./build_and_push.sh

4. Deploy to RunPod

python deploy.py

API Usage

Non-Streaming Mode (Default)

Request Format:

{
  "input": {
    "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
    "gen_text": "Text to convert to speech",
    "speed": 1.0
  }
}

Response Format:

{
  "audio_base64": "base64_encoded_output_audio",
  "sample_rate": 24000,
  "spectrogram_base64": "base64_encoded_spectrogram",
  "status": "success"
}
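For reference, here is a minimal Python client for the non-streaming path. It is a sketch, not the project's official client: the ENDPOINT_ID placeholder and the output filename are assumptions, while the URL pattern, the Bearer auth header, and the top-level "output" wrapper follow RunPod's standard serverless API.

import base64
import os

import requests

ENDPOINT_ID = "your_endpoint_id"  # assumption: the endpoint created by deploy.py
API_KEY = os.environ["RUNPOD_API_KEY"]

# Submit the request synchronously; /runsync blocks until the job finishes.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
            "gen_text": "Text to convert to speech",
            "speed": 1.0,
        }
    },
    timeout=300,
)
resp.raise_for_status()

# RunPod wraps the handler's result under a top-level "output" key.
output = resp.json().get("output", {})

if output.get("status") == "success":
    # Decode the base64 WAV payload and write it to disk.
    with open("output.wav", "wb") as f:
        f.write(base64.b64decode(output["audio_base64"]))
    print("Saved output.wav at", output["sample_rate"], "Hz")
else:
    print("Error:", output.get("error"))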

Streaming Mode

Enable streaming by setting "stream": true in the request to receive audio chunks as they are generated.

Request Format:

{
  "input": {
    "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
    "gen_text": "Text to convert to speech",
    "speed": 1.0,
    "stream": true
  }
}

Streaming Response (each chunk):

{
  "chunk_index": 0,
  "total_chunks": 5,
  "progress": 20.0,
  "audio_chunk_base64": "base64_wav_chunk",
  "sample_rate": 24000,
  "status": "processing",
  "text_batch": "Text portion for this chunk"
}

How to use streaming:

  1. Submit the job to the /run endpoint (NOT /runsync) with "stream": true
  2. Get the job_id from the response
  3. Poll the /stream/{job_id} endpoint every 1-2 seconds
  4. Process the chunks from the stream_data["stream"] array of each poll response
  5. Stop when status == "COMPLETED"

Important: Streaming requires the async /run endpoint. The /runsync endpoint does not support streaming.
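Putting the steps above together, a polling client might look like the following sketch. ENDPOINT_ID is again a placeholder; the /run and /stream/{job_id} URL patterns are RunPod's standard serverless API, and since RunPod usually wraps each streamed item under an "output" key, the code tolerates both wrapped and bare chunks.

import base64
import os
import time

import requests

ENDPOINT_ID = "your_endpoint_id"  # assumption: the endpoint created by deploy.py
API_KEY = os.environ["RUNPOD_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

# Step 1: submit the job to the async /run endpoint with "stream": true.
job = requests.post(
    f"{BASE_URL}/run",
    headers=HEADERS,
    json={
        "input": {
            "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
            "gen_text": "Text to convert to speech",
            "speed": 1.0,
            "stream": True,
        }
    },
    timeout=30,
).json()

# Step 2: the job ID comes back in the "id" field.
job_id = job["id"]

# Steps 3-5: poll /stream/{job_id} and collect chunks until COMPLETED.
chunks = []
while True:
    stream_data = requests.get(
        f"{BASE_URL}/stream/{job_id}", headers=HEADERS, timeout=30
    ).json()
    for item in stream_data.get("stream", []):
        chunk = item.get("output", item)  # tolerate an "output" wrapper
        # Each decoded chunk is a standalone WAV blob, not raw PCM, so
        # play or re-stitch them accordingly rather than byte-concatenating.
        chunks.append(base64.b64decode(chunk["audio_chunk_base64"]))
        print(f"chunk {chunk['chunk_index']}: {chunk['progress']}% done")
    if stream_data.get("status") == "COMPLETED":
        break
    time.sleep(1.5)  # poll every 1-2 seconds, per the steps above

print(f"Received {len(chunks)} audio chunks")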

Error Response

{
  "error": "Error message"
}

Cost Optimization

  • The endpoint scales to zero (workers_min: 0) when idle, so you pay only while requests are being served; the trade-off is a cold start on the first request
  • It scales up to 1 worker when requests come in
  • It runs on RTX 3090 GPUs for a strong performance/cost ratio

Monitoring

Check your RunPod dashboard for:

  • Request logs
  • Performance metrics
  • Cost tracking
  • Error monitoring