---
title: CSM-1B Gradio Demo
emoji: 🎙️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

# CSM-1B Text-to-Speech Demo

This application uses CSM-1B (Conversational Speech Model) to convert text to high-quality speech.

## Features

- **Simple Audio Generation**: Convert text to speech with options for speaker ID, maximum duration, temperature, and top-k.
- **Audio Generation with Context**: Provide audio clips and text as context to help the model generate more appropriate speech.
- **GPU Optimization**: Uses Hugging Face Spaces' ZeroGPU to optimize GPU usage.

## Installation and Configuration

### Access Requirements

To use this demo, you need access to the gated CSM-1B model repository on Hugging Face.

### Hugging Face Token Configuration

1. Create a Hugging Face account if you don't have one.
2. Go to your Hugging Face settings and create an access token.
3. Request access to the model if needed.
4. Set the `HF_TOKEN` environment variable with your token:

   ```bash
   export HF_TOKEN=your_token_here
   ```

5. Alternatively, enter your token directly in the "Configuration" tab of the application.
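The fallback described above (environment variable or manual entry in the "Configuration" tab) can be sketched as a small helper. The name `get_hf_token` and the precedence order (UI entry wins over the environment) are illustrative assumptions, not necessarily what `app.py` does:

```python
import os

def get_hf_token(manual_token=None):
    """Return the token entered in the UI if provided, otherwise fall
    back to the HF_TOKEN environment variable, otherwise None."""
    if manual_token:
        return manual_token
    return os.environ.get("HF_TOKEN")
```

Returning `None` (rather than raising) lets the app surface a friendly message in the UI asking for a token.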

### Installation

```bash
git clone https://github.com/yourusername/csm-1b-gradio.git
cd csm-1b-gradio
pip install -r requirements.txt
```

## How to Use

1. Start the application:

   ```bash
   python app.py
   ```

2. Open a web browser and go to the displayed address (usually `http://127.0.0.1:7860`).
3. Enter the text you want to convert to speech.
4. Choose a speaker ID (0 to 10).
5. Adjust parameters such as maximum duration, temperature, and top-k.
6. Click the "Generate Audio" button to create speech.
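The parameter ranges in the steps above can be guarded with a small validation helper. Only the speaker-ID range (0-10) comes from this README; the other bounds and the function name are illustrative assumptions:

```python
def validate_params(speaker_id, temperature, top_k, max_duration_ms):
    """Reject values outside the ranges exposed by the demo UI.

    The speaker ID range mirrors the README (0-10); the remaining
    bounds are illustrative sanity checks, not the app's exact limits.
    """
    if not 0 <= speaker_id <= 10:
        raise ValueError("speaker_id must be between 0 and 10")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if max_duration_ms <= 0:
        raise ValueError("max_duration_ms must be positive")
```

Validating before generation gives users an immediate error instead of a failed (and slow) synthesis run.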

## About the Model

CSM-1B is an advanced text-to-speech model developed by Sesame AI Labs. This model can generate natural speech from text with various voices.

## ZeroGPU

This application uses Hugging Face Spaces' ZeroGPU to optimize GPU usage. ZeroGPU helps free up GPU memory when not in use, saving resources and improving performance.

```python
import spaces

@spaces.GPU
def my_gpu_function():
    # This function acquires a GPU only while it runs
    # and releases it after completion.
    pass
```

When deployed on Hugging Face Spaces, ZeroGPU will automatically manage GPU usage, making the application more efficient.
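Since the `spaces` package is only meaningful on Hugging Face Spaces, a common pattern is to fall back to a no-op decorator when it is unavailable, so the same code also runs locally on CPU. This is a sketch of that pattern (the function name and body are illustrative, not the app's actual code):

```python
try:
    import spaces  # available on Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    # Outside Spaces, use a pass-through decorator so the code
    # still runs on a plain CPU machine.
    def gpu(func):
        return func

@gpu
def generate_audio(text):
    # Placeholder for the real GPU-bound synthesis step.
    return f"generated audio for: {text}"
```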

## Notes

- This model embeds a watermark in generated audio to identify it as AI-generated.
- Audio generation time depends on text length and hardware configuration.
- You need access to the CSM-1B model on Hugging Face to use this application.

## Deployment on Hugging Face Spaces

To deploy this application on Hugging Face Spaces:

  1. Create a new Space on Hugging Face with Gradio SDK.
  2. Upload all project files.
3. In the Space settings, add the `HF_TOKEN` environment variable with your token.
  4. Choose appropriate hardware configuration (GPU recommended).

## Resources