---
title: CSM-1B Gradio Demo
emoji: 🎙️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

# CSM-1B Text-to-Speech Demo

This application uses CSM-1B (Conversational Speech Model) to convert text to high-quality speech.

## Features

- **Simple Audio Generation**: Convert text to speech with options for speaker ID, duration, temperature, and top-k.
- **Audio Generation with Context**: Provide audio clips and text as context to help the model generate more appropriate speech.
- **GPU Optimization**: Uses Hugging Face Spaces' ZeroGPU to optimize GPU usage.

## Installation and Configuration

### Access Requirements

To use the CSM-1B model, you need access to the following models on Hugging Face:

- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- [sesame/csm-1b](https://huggingface.co/sesame/csm-1b)

### Hugging Face Token Configuration

1. Create a Hugging Face account if you don't have one.
2. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens) to create a token.
3. Request access to the models if needed.
4. Set the `HF_TOKEN` environment variable to your token:

   ```bash
   export HF_TOKEN=your_token_here
   ```

5. Alternatively, enter your token directly in the "Configuration" tab of the application.

### Installation

```bash
git clone https://github.com/yourusername/csm-1b-gradio.git
cd csm-1b-gradio
pip install -r requirements.txt
```

## How to Use

1. Start the application:

   ```bash
   python app.py
   ```

2. Open a web browser and go to the displayed address (usually http://127.0.0.1:7860).
3. Enter the text you want to convert to speech.
4. Choose a speaker ID (0-10).
5. Adjust parameters such as maximum duration, temperature, and top-k.
6. Click the "Generate Audio" button to create speech.

## About the Model

CSM-1B is a text-to-speech model developed by Sesame AI Labs. It can generate natural-sounding speech from text in a variety of voices.
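To give an intuition for the temperature and top-k parameters exposed in the UI, here is a minimal, self-contained sketch of temperature/top-k sampling. This is purely illustrative — it is not CSM-1B's actual sampling code, and the function name is hypothetical.

```python
import math
import random


def sample_with_temperature_topk(logits, temperature=0.9, top_k=50, rng=None):
    """Illustrative temperature + top-k sampling (not the CSM-1B implementation).

    Returns the index of the sampled candidate.
    """
    rng = rng or random.Random(0)
    # Temperature: divide logits by T; lower T sharpens the distribution,
    # higher T flattens it (more variety, less determinism).
    scaled = [l / temperature for l in logits]
    # Top-k: keep only the k highest-scoring candidates.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the kept candidates (subtract the max for stability).
    m = max(scaled[i] for i in top)
    exps = [math.exp(scaled[i] - m) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one candidate according to the resulting probabilities.
    r, acc = rng.random(), 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r < acc:
            return idx
    return top[-1]
```

With a very low temperature the distribution collapses onto the highest logit, and with `top_k=1` sampling is fully greedy — which is why lowering either value makes generated audio more repeatable.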
## ZeroGPU

This application uses Hugging Face Spaces' ZeroGPU to optimize GPU usage. ZeroGPU frees GPU memory when it is not in use, saving resources and improving performance.

```python
import spaces


@spaces.GPU
def my_gpu_function():
    # This function only uses the GPU while it runs
    # and releases it after completion.
    pass
```

When deployed on Hugging Face Spaces, ZeroGPU automatically manages GPU usage, making the application more efficient.

## Notes

- This model uses watermarking to mark audio generated by AI.
- Audio generation time depends on text length and hardware configuration.
- You need access to the CSM-1B model on Hugging Face to use this application.

## Deployment on Hugging Face Spaces

To deploy this application on Hugging Face Spaces:

1. Create a new Space on Hugging Face with the Gradio SDK.
2. Upload all project files.
3. In the Space settings, add the `HF_TOKEN` environment variable with your token.
4. Choose an appropriate hardware configuration (GPU recommended).

## Resources

- [GitHub Repository](https://github.com/SesameAILabs/csm-1b)
- [Hugging Face Model](https://huggingface.co/sesame/csm-1b)
- [Hugging Face Space Demo](https://huggingface.co/spaces/sesame/csm-1b)
- [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-sdks-docker-zero-gpu)
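The token-resolution order described above (a value typed into the "Configuration" tab wins over the `HF_TOKEN` environment variable) could be sketched as follows. The helper name `resolve_hf_token` is hypothetical — it illustrates the precedence, not the app's actual code.

```python
import os


def resolve_hf_token(ui_token=None):
    """Hypothetical helper: prefer a token entered in the UI,
    fall back to the HF_TOKEN environment variable, else None."""
    token = (ui_token or "").strip() or os.environ.get("HF_TOKEN", "")
    return token or None
```

Returning `None` when neither source is set lets the app show a clear "token required" message instead of failing later with an authentication error from the Hub.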