# VietVoices RunPod Serverless Deployment

This folder contains all the files necessary to deploy VietVoices TTS on RunPod Serverless.

## Setup Instructions

### 1. Prerequisites

- RunPod account with API key
- Docker Hub account
- Hugging Face token (for model access)

### 2. Environment Variables

Set these environment variables:

```bash
export RUNPOD_API_KEY="your_runpod_api_key"
export HUGGINGFACEHUB_API_TOKEN="your_hf_token"
```

### 3. Build and Push Docker Image

```bash
# Make the script executable
chmod +x build_and_push.sh

# Update DOCKER_USERNAME in the script
# Then run:
./build_and_push.sh
```

### 4. Deploy to RunPod

```bash
python deploy.py
```

## API Usage

### Non-Streaming Mode (Default)

**Request Format:**

```json
{
  "input": {
    "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
    "gen_text": "Text to convert to speech",
    "speed": 1.0
  }
}
```

**Response Format:**

```json
{
  "audio_base64": "base64_encoded_output_audio",
  "sample_rate": 24000,
  "spectrogram_base64": "base64_encoded_spectrogram",
  "status": "success"
}
```

### Streaming Mode

Enable streaming by setting `"stream": true` to receive audio chunks as they are generated.

**Request Format:**

```json
{
  "input": {
    "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",
    "gen_text": "Text to convert to speech",
    "speed": 1.0,
    "stream": true
  }
}
```

**Streaming Response (each chunk):**

```json
{
  "chunk_index": 0,
  "total_chunks": 5,
  "progress": 20.0,
  "audio_chunk_base64": "base64_wav_chunk",
  "sample_rate": 24000,
  "status": "processing",
  "text_batch": "Text portion for this chunk"
}
```

**How to use streaming:**

1. Submit the job to the `/run` endpoint (NOT `/runsync`) with `"stream": true`
2. Get the `job_id` from the response
3. Poll the `/stream/{job_id}` endpoint every 1-2 seconds
4. Process chunks from the `stream_data["stream"]` array
5. Stop when `status == "COMPLETED"`

Runnable Python sketches for both modes appear at the end of this README.

**Important:** Streaming requires the async `/run` endpoint. The `/runsync` endpoint does not support streaming.

### Error Response

```json
{
  "error": "Error message"
}
```

## Cost Optimization

- The endpoint uses cold starts (`workers_min: 0`) to minimize costs
- Scales up to 1 worker when requests come in
- Uses RTX 3090 GPUs for a good performance/cost ratio

## Monitoring

Check your RunPod dashboard for:

- Request logs
- Performance metrics
- Cost tracking
- Error monitoring
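## Example Client (Python)

The sketches below call the endpoint using `requests` and the standard library only. They assume the standard RunPod serverless REST layout (`https://api.runpod.ai/v2/{endpoint_id}/...` with a Bearer token); `ENDPOINT_ID`, the reference audio URL, and the output filenames are placeholders you must replace. They also assume `audio_base64` and `audio_chunk_base64` decode to complete WAV files, which matches the `base64_wav_chunk` naming above but should be verified against the handler.

### Non-Streaming (`/runsync`)

```python
import base64
import os

import requests

ENDPOINT_ID = "your_endpoint_id"  # placeholder: your RunPod endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

payload = {
    "input": {
        "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",  # placeholder
        "gen_text": "Text to convert to speech",
        "speed": 1.0,
    }
}

# /runsync blocks until the job finishes and returns the result inline.
resp = requests.post(f"{BASE_URL}/runsync", json=payload, headers=HEADERS, timeout=600)
resp.raise_for_status()
body = resp.json()

# RunPod normally nests the handler's result under "output"; fall back to the
# top level in case the response shape differs (assumption).
output = body.get("output", body)
if output.get("status") != "success":
    raise RuntimeError(output.get("error", f"unexpected response: {body}"))

# Assumption: audio_base64 decodes to a complete WAV file.
with open("output.wav", "wb") as f:
    f.write(base64.b64decode(output["audio_base64"]))
print(f"Wrote output.wav (sample rate: {output['sample_rate']} Hz)")
```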
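### Streaming (`/run` + `/stream/{job_id}`)

This follows the five-step recipe above. Whether the `stream` array is cumulative across polls or contains only new items is not specified, so the sketch deduplicates by `chunk_index`; the `output` wrapper around each array element is also an assumption, and the code accepts either shape.

```python
import base64
import os
import time

import requests

ENDPOINT_ID = "your_endpoint_id"  # placeholder: your RunPod endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

payload = {
    "input": {
        "ref_audio": "https://s3.amazonaws.com/path/to/audio.wav",  # placeholder
        "gen_text": "Text to convert to speech",
        "speed": 1.0,
        "stream": True,
    }
}

# Steps 1-2: submit via the async /run endpoint and grab the job id.
job = requests.post(f"{BASE_URL}/run", json=payload, headers=HEADERS, timeout=60).json()
job_id = job["id"]

written = set()
while True:
    # Step 3: poll /stream/{job_id} every 1-2 seconds.
    stream_data = requests.get(
        f"{BASE_URL}/stream/{job_id}", headers=HEADERS, timeout=60
    ).json()

    # Step 4: process chunks from the "stream" array. RunPod usually wraps
    # each yielded chunk under "output"; accept either shape (assumption).
    for item in stream_data.get("stream", []):
        chunk = item.get("output", item)
        idx = chunk["chunk_index"]
        if idx in written:  # skip chunks already seen on a previous poll
            continue
        written.add(idx)
        with open(f"chunk_{idx:03d}.wav", "wb") as f:
            f.write(base64.b64decode(chunk["audio_chunk_base64"]))
        print(f"chunk {idx + 1}/{chunk['total_chunks']} ({chunk['progress']:.0f}%)")

    # Step 5: stop once the job reports a terminal status.
    if stream_data.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(1.5)
```

Each chunk is assumed to be a standalone WAV file; to produce a single output file, decode every chunk and concatenate the PCM samples rather than the raw bytes.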