# VietVoices RunPod Serverless Deployment

This folder contains all the files necessary to deploy VietVoices TTS on RunPod Serverless.

## Setup Instructions

### 1. Prerequisites

- RunPod account with API key
- Docker Hub account
- Hugging Face token (for model access)

### 2. Environment Variables

Set these environment variables:

```bash
export RUNPOD_API_KEY="your_runpod_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
```

### 3. Build and Push Docker Image

```bash
# Make the script executable
chmod +x build_and_push.sh

# Update DOCKER_USERNAME in the script
# Then run:
./build_and_push.sh
```

### 4. Deploy to RunPod

```bash
python deploy.py
```

## API Usage

### Non-Streaming Mode (Default)

**Request Format:**

```json
{
  "input": {
    "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
    "gen_text": "Text to convert to speech",
    "speed": 1.0
  }
}
```

**Response Format:**

```json
{
  "audio_base64": "base64_encoded_output_audio",
  "sample_rate": 24000,
  "spectrogram_base64": "base64_encoded_spectrogram",
  "status": "success"
}
```

### Streaming Mode

Enable streaming by setting `"stream": true` to receive audio chunks as they are generated.

**Request Format:**

```json
{
  "input": {
    "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
    "gen_text": "Text to convert to speech",
    "speed": 1.0,
    "stream": true
  }
}
```

**Streaming Response (each chunk):**

```json
{
  "chunk_index": 0,
  "total_chunks": 5,
  "progress": 20.0,
  "audio_chunk_base64": "base64_wav_chunk",
  "sample_rate": 24000,
  "status": "processing",
  "text_batch": "Text portion for this chunk"
}
```

**How to use streaming:**

1. Submit the job to the `/run` endpoint (NOT `/runsync`) with `"stream": true`
2. Get the `job_id` from the response
3. Poll the `/stream/{job_id}` endpoint every 1-2 seconds
4. Process chunks from the `stream_data["stream"]` array
5. Stop when `status == "COMPLETED"`

Runnable Python sketches for both modes appear at the end of this README.

**Important:** Streaming requires the async `/run` endpoint. The `/runsync` endpoint does not support streaming.

### Error Response

```json
{
  "error": "Error message"
}
```

## Cost Optimization

- The endpoint uses cold starts (`workers_min: 0`) to minimize costs
- Scales up to 1 worker when requests come in
- Uses RTX 3090 GPUs for a good performance/cost ratio

## Monitoring

Check your RunPod dashboard for:

- Request logs
- Performance metrics
- Cost tracking
- Error monitoring
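## Example Client (Python)

The sketches below call the endpoint using `requests` and the standard library only. They assume the standard RunPod serverless REST layout (`https://api.runpod.ai/v2/{endpoint_id}/...` with a Bearer token); `ENDPOINT_ID`, the reference audio URL, and the output filenames are placeholders you must replace. They also assume `audio_base64` and `audio_chunk_base64` decode to complete WAV files, which matches the `base64_wav_chunk` naming above but should be verified against the handler.

### Non-Streaming (`/runsync`)

```python
import base64
import os

import requests

ENDPOINT_ID = "your_endpoint_id"  # placeholder: your RunPod endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

payload = {
    "input": {
        "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",  # placeholder
        "gen_text": "Text to convert to speech",
        "speed": 1.0,
    }
}

# /runsync blocks until the job finishes and returns the result inline.
resp = requests.post(f"{BASE_URL}/runsync", json=payload, headers=HEADERS, timeout=600)
resp.raise_for_status()
body = resp.json()

# RunPod normally nests the handler's result under "output"; fall back to the
# top level in case the response shape differs (assumption).
output = body.get("output", body)
if output.get("status") != "success":
    raise RuntimeError(output.get("error", f"unexpected response: {body}"))

# Assumption: audio_base64 decodes to a complete WAV file.
with open("output.wav", "wb") as f:
    f.write(base64.b64decode(output["audio_base64"]))
print(f"Wrote output.wav (sample rate: {output['sample_rate']} Hz)")
```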
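### Streaming (`/run` + `/stream/{job_id}`)

This follows the five-step recipe above. Whether the `stream` array is cumulative across polls or contains only new items is not specified, so the sketch deduplicates by `chunk_index`; the `output` wrapper around each array element is also an assumption, and the code accepts either shape.

```python
import base64
import os
import time

import requests

ENDPOINT_ID = "your_endpoint_id"  # placeholder: your RunPod endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

payload = {
    "input": {
        "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",  # placeholder
        "gen_text": "Text to convert to speech",
        "speed": 1.0,
        "stream": True,
    }
}

# Steps 1-2: submit via the async /run endpoint and grab the job id.
job = requests.post(f"{BASE_URL}/run", json=payload, headers=HEADERS, timeout=60).json()
job_id = job["id"]

written = set()
while True:
    # Step 3: poll /stream/{job_id} every 1-2 seconds.
    stream_data = requests.get(
        f"{BASE_URL}/stream/{job_id}", headers=HEADERS, timeout=60
    ).json()

    # Step 4: process chunks from the "stream" array. RunPod usually wraps
    # each yielded chunk under "output"; accept either shape (assumption).
    for item in stream_data.get("stream", []):
        chunk = item.get("output", item)
        idx = chunk["chunk_index"]
        if idx in written:  # skip chunks already seen on a previous poll
            continue
        written.add(idx)
        with open(f"chunk_{idx:03d}.wav", "wb") as f:
            f.write(base64.b64decode(chunk["audio_chunk_base64"]))
        print(f"chunk {idx + 1}/{chunk['total_chunks']} ({chunk['progress']:.0f}%)")

    # Step 5: stop once the job reports a terminal status.
    if stream_data.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(1.5)
```

Each chunk is assumed to be a standalone WAV file; to produce a single output file, decode every chunk and concatenate the PCM samples rather than the raw bytes.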