---
title: CSM-1B Gradio Demo
emoji: 🎙️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
---

# CSM-1B Text-to-Speech Demo

This application uses CSM-1B (Conversational Speech Model) to convert text to high-quality speech.

## Features

- **Simple Audio Generation**: Convert text to speech with options for speaker ID, duration, temperature, and top-k.
- **Audio Generation with Context**: Provide audio clips and text as context to help the model generate more appropriate speech.
- **GPU Optimization**: Uses Hugging Face Spaces' ZeroGPU to optimize GPU usage.

## Installation and Configuration

### Access Requirements

To use the CSM-1B model, you need access to the following models on Hugging Face:

- [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)
- [sesame/csm-1b](https://huggingface.co/sesame/csm-1b)

### Hugging Face Token Configuration

1. Create a Hugging Face account if you don't have one.
2. Go to [Hugging Face Settings](https://huggingface.co/settings/tokens) to create a token.
3. Request access to the models if needed.
4. Set the `HF_TOKEN` environment variable to your token:

   ```bash
   export HF_TOKEN=your_token_here
   ```

5. Alternatively, enter your token directly in the "Configuration" tab of the application.

### Installation

```bash
git clone https://github.com/yourusername/csm-1b-gradio.git
cd csm-1b-gradio
pip install -r requirements.txt
```

## How to Use

1. Start the application:

   ```bash
   python app.py
   ```

2. Open a web browser and go to the displayed address (usually http://127.0.0.1:7860).
3. Enter the text you want to convert to speech.
4. Choose a speaker ID (0-10).
5. Adjust parameters such as maximum duration, temperature, and top-k.
6. Click the "Generate Audio" button to create speech.

## About the Model

CSM-1B is a text-to-speech model developed by Sesame AI Labs. It can generate natural-sounding speech from text in a variety of voices.
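To give an intuition for the temperature and top-k parameters exposed in the UI, here is a minimal, self-contained sketch of temperature/top-k sampling. This is purely illustrative — it is not CSM-1B's actual sampling code, and the function name is hypothetical.

```python
import math
import random


def sample_with_temperature_topk(logits, temperature=0.9, top_k=50, rng=None):
    """Illustrative temperature + top-k sampling (not the CSM-1B implementation).

    Returns the index of the sampled candidate.
    """
    rng = rng or random.Random(0)
    # Temperature: divide logits by T; lower T sharpens the distribution,
    # higher T flattens it (more variety, less determinism).
    scaled = [l / temperature for l in logits]
    # Top-k: keep only the k highest-scoring candidates.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the kept candidates (subtract the max for stability).
    m = max(scaled[i] for i in top)
    exps = [math.exp(scaled[i] - m) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one candidate according to the resulting probabilities.
    r, acc = rng.random(), 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r < acc:
            return idx
    return top[-1]
```

With a very low temperature the distribution collapses onto the highest logit, and with `top_k=1` sampling is fully greedy — which is why lowering either value makes generated audio more repeatable.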
## ZeroGPU

This application uses Hugging Face Spaces' ZeroGPU to optimize GPU usage. ZeroGPU frees GPU memory when it is not in use, saving resources and improving performance.

```python
import spaces


@spaces.GPU
def my_gpu_function():
    # This function only uses the GPU while it runs
    # and releases it after completion.
    pass
```

When deployed on Hugging Face Spaces, ZeroGPU automatically manages GPU usage, making the application more efficient.

## Notes

- This model uses watermarking to mark audio generated by AI.
- Audio generation time depends on text length and hardware configuration.
- You need access to the CSM-1B model on Hugging Face to use this application.

## Deployment on Hugging Face Spaces

To deploy this application on Hugging Face Spaces:

1. Create a new Space on Hugging Face with the Gradio SDK.
2. Upload all project files.
3. In the Space settings, add the `HF_TOKEN` environment variable with your token.
4. Choose an appropriate hardware configuration (GPU recommended).

## Resources

- [GitHub Repository](https://github.com/SesameAILabs/csm-1b)
- [Hugging Face Model](https://huggingface.co/sesame/csm-1b)
- [Hugging Face Space Demo](https://huggingface.co/spaces/sesame/csm-1b)
- [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-sdks-docker-zero-gpu)
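The token-resolution order described above (a value typed into the "Configuration" tab wins over the `HF_TOKEN` environment variable) could be sketched as follows. The helper name `resolve_hf_token` is hypothetical — it illustrates the precedence, not the app's actual code.

```python
import os


def resolve_hf_token(ui_token=None):
    """Hypothetical helper: prefer a token entered in the UI,
    fall back to the HF_TOKEN environment variable, else None."""
    token = (ui_token or "").strip() or os.environ.get("HF_TOKEN", "")
    return token or None
```

Returning `None` when neither source is set lets the app show a clear "token required" message instead of failing later with an authentication error from the Hub.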