VietVoices RunPod Serverless Deployment

This folder contains all the necessary files to deploy VietVoices TTS on RunPod Serverless.

Setup Instructions

1. Prerequisites

  • RunPod account with API key
  • Docker Hub account
  • Hugging Face token (for model access)

2. Environment Variables

Set these environment variables:

export RUNPOD_API_KEY="your_runpod_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"

3. Build and Push Docker Image

# Make the script executable
chmod +x build_and_push.sh

# Update DOCKER_USERNAME in the script
# Then run:
./build_and_push.sh

4. Deploy to RunPod

python deploy.py

API Usage

Non-Streaming Mode (Default)

Request Format:

{
  "input": {
    "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
    "gen_text": "Text to convert to speech",
    "speed": 1.0
  }
}

Response Format:

{
  "audio_base64": "base64_encoded_output_audio",
  "sample_rate": 24000,
  "spectrogram_base64": "base64_encoded_spectrogram",
  "status": "success"
}
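For reference, here is a minimal Python client for the non-streaming path. It is a sketch, not the project's official client: the ENDPOINT_ID placeholder and the output filename are assumptions, while the URL pattern, the Bearer auth header, and the top-level "output" wrapper follow RunPod's standard serverless API.

import base64
import os

import requests

ENDPOINT_ID = "your_endpoint_id"  # assumption: the endpoint created by deploy.py
API_KEY = os.environ["RUNPOD_API_KEY"]

# Submit the request synchronously; /runsync blocks until the job finishes.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
            "gen_text": "Text to convert to speech",
            "speed": 1.0,
        }
    },
    timeout=300,
)
resp.raise_for_status()

# RunPod wraps the handler's result under a top-level "output" key.
output = resp.json().get("output", {})

if output.get("status") == "success":
    # Decode the base64 WAV payload and write it to disk.
    with open("output.wav", "wb") as f:
        f.write(base64.b64decode(output["audio_base64"]))
    print("Saved output.wav at", output["sample_rate"], "Hz")
else:
    print("Error:", output.get("error"))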

Streaming Mode

Enable streaming by setting "stream": true in the request to receive audio chunks as they are generated.

Request Format:

{
  "input": {
    "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
    "gen_text": "Text to convert to speech",
    "speed": 1.0,
    "stream": true
  }
}

Streaming Response (each chunk):

{
  "chunk_index": 0,
  "total_chunks": 5,
  "progress": 20.0,
  "audio_chunk_base64": "base64_wav_chunk",
  "sample_rate": 24000,
  "status": "processing",
  "text_batch": "Text portion for this chunk"
}

How to use streaming:

  1. Submit the job to the /run endpoint (NOT /runsync) with "stream": true
  2. Get the job_id from the response
  3. Poll the /stream/{job_id} endpoint every 1-2 seconds
  4. Process the chunks from the stream_data["stream"] array of each poll response
  5. Stop when status == "COMPLETED"

Important: Streaming requires the async /run endpoint. The /runsync endpoint does not support streaming.
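Putting the steps above together, a polling client might look like the following sketch. ENDPOINT_ID is again a placeholder; the /run and /stream/{job_id} URL patterns are RunPod's standard serverless API, and since RunPod usually wraps each streamed item under an "output" key, the code tolerates both wrapped and bare chunks.

import base64
import os
import time

import requests

ENDPOINT_ID = "your_endpoint_id"  # assumption: the endpoint created by deploy.py
API_KEY = os.environ["RUNPOD_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

# Step 1: submit the job to the async /run endpoint with "stream": true.
job = requests.post(
    f"{BASE_URL}/run",
    headers=HEADERS,
    json={
        "input": {
            "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
            "gen_text": "Text to convert to speech",
            "speed": 1.0,
            "stream": True,
        }
    },
    timeout=30,
).json()

# Step 2: the job ID comes back in the "id" field.
job_id = job["id"]

# Steps 3-5: poll /stream/{job_id} and collect chunks until COMPLETED.
chunks = []
while True:
    stream_data = requests.get(
        f"{BASE_URL}/stream/{job_id}", headers=HEADERS, timeout=30
    ).json()
    for item in stream_data.get("stream", []):
        chunk = item.get("output", item)  # tolerate an "output" wrapper
        # Each decoded chunk is a standalone WAV blob, not raw PCM, so
        # play or re-stitch them accordingly rather than byte-concatenating.
        chunks.append(base64.b64decode(chunk["audio_chunk_base64"]))
        print(f"chunk {chunk['chunk_index']}: {chunk['progress']}% done")
    if stream_data.get("status") == "COMPLETED":
        break
    time.sleep(1.5)  # poll every 1-2 seconds, per the steps above

print(f"Received {len(chunks)} audio chunks")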

Error Response

{
  "error": "Error message"
}

Cost Optimization

  • The endpoint scales to zero (workers_min: 0) when idle, so you pay only while requests are being served; the trade-off is a cold start on the first request
  • It scales up to 1 worker when requests come in
  • It runs on RTX 3090 GPUs for a strong performance/cost ratio

Monitoring

Check your RunPod dashboard for:

  • Request logs
  • Performance metrics
  • Cost tracking
  • Error monitoring