metadata

title: Kokoro TTS Zero
emoji: 📊
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: A100 GPU Accelerated Inference applied to Kokoro-82M TTS

Kokoro TTS Demo Space

A Hugging Face Space for the Kokoro TTS model, optimized for Zero GPU hardware.

Overview

This Space provides a Gradio interface for the Kokoro TTS model, allowing users to:

Convert text to speech using multiple voices
Adjust speech speed
Play the generated audio directly in the browser
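
The three features above map onto three Gradio components: a text box, a voice dropdown, and a speed slider feeding an audio output. The following is a minimal sketch of that layout, not the Space's actual app.py. The `synthesize` function is a hypothetical placeholder that writes silence instead of calling the model, and the sample rate, slider range, and voice subset are assumptions chosen for illustration.

```python
import tempfile
import wave

SAMPLE_RATE = 24000  # assumed output rate, for illustration only
VOICES = ["af", "af_bella", "am_adam", "bm_george"]  # subset of the Available Voices list


def synthesize(text: str, voice: str, speed: float) -> str:
    """Hypothetical stand-in for the TTS call: writes silence to a WAV file.

    The duration shrinks as speed grows, mimicking a speed control.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    seconds = max(len(text), 1) * 0.05 / speed
    n_frames = int(SAMPLE_RATE * seconds)
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()
    with wave.open(tmp.name, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * n_frames)
    return tmp.name


def build_demo():
    """Build the UI; gradio is imported lazily so synthesize() works without it."""
    import gradio as gr

    return gr.Interface(
        fn=synthesize,
        inputs=[
            gr.Textbox(label="Text"),
            gr.Dropdown(choices=VOICES, value="af", label="Voice"),
            gr.Slider(minimum=0.5, maximum=2.0, value=1.0, step=0.1, label="Speed"),
        ],
        outputs=gr.Audio(label="Generated speech", type="filepath"),
    )

# To start the web UI locally: build_demo().launch()
```

Returning a file path keeps the placeholder free of third-party dependencies; a real implementation might instead return a sample rate and waveform array, which `gr.Audio` also accepts.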

Technical Details

Uses Zero GPU for efficient GPU resource management
Dynamically loads required modules from the hexgrad/Kokoro-82M repository
Automatically downloads the model and voice files from the Hugging Face Hub
Implements proper GPU memory handling
Includes caching in /data/.huggingface for faster restarts
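
On Zero GPU hardware, a GPU is attached only while a function decorated with `spaces.GPU` is running. The sketch below shows that pattern under stated assumptions: `generate_speech` is a hypothetical entry point with a placeholder body, the 60-second duration is an arbitrary budget, and the no-op fallback exists only so the same file runs where the `spaces` package is not installed. It is not the Space's actual code.

```python
try:
    import spaces  # provided on Hugging Face Zero GPU Spaces

    gpu = spaces.GPU
except ImportError:
    def gpu(*args, **kwargs):
        """No-op stand-in so the code also runs without the spaces package."""
        # Support both the bare form @gpu and the called form @gpu(duration=...)
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda fn: fn


@gpu(duration=60)  # assumed budget: request a GPU for up to 60 s per call
def generate_speech(text: str, voice: str = "af", speed: float = 1.0) -> dict:
    """Hypothetical entry point; real code would run the model on the GPU here."""
    # Placeholder result so the sketch is runnable without a model.
    return {"text": text, "voice": voice, "speed": speed}
```

Keeping all GPU work inside the decorated function is what lets the platform release the device between requests.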

Dependencies

The Space uses modules from two repositories:

remsky/Kokoro-FastAPI: This repository (UI and Zero GPU implementation)
hexgrad/Kokoro-82M: Original model repository (core TTS functionality)

All dependencies are automatically handled:

Core modules (kokoro.py, models.py, etc.) are downloaded from hexgrad/Kokoro-82M
Model weights and voice files are cached in /data/.huggingface
System dependencies (espeak-ng) are installed via packages.txt
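
The download-then-load flow described above can be sketched as follows. This is an illustration under assumptions, not the Space's implementation: `fetch_core_modules` and `load_module_from_path` are hypothetical helper names, only the two module files named above are listed, and the cache location is set through the `HF_HOME` environment variable, which `huggingface_hub` reads to locate its cache.

```python
import importlib.util
import os
import sys

# Point the Hugging Face cache at persistent storage before any hub library is imported.
# setdefault keeps a value that is already configured in the environment.
os.environ.setdefault("HF_HOME", "/data/.huggingface")

REPO_ID = "hexgrad/Kokoro-82M"
CORE_MODULES = ["kokoro.py", "models.py"]  # the two files named above; the full list is longer


def fetch_core_modules(repo_id: str = REPO_ID, filenames=CORE_MODULES) -> dict:
    """Download each file (or reuse its cached copy) and return {filename: local path}."""
    from huggingface_hub import hf_hub_download  # lazy import; needs network on first run

    return {name: hf_hub_download(repo_id=repo_id, filename=name) for name in filenames}


def load_module_from_path(name: str, path: str):
    """Import a Python source file from an arbitrary path under the given module name."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register first so intra-module imports can resolve
    spec.loader.exec_module(module)
    return module
```

A typical use would be `paths = fetch_core_modules()` followed by `load_module_from_path("kokoro", paths["kokoro.py"])`. Files that import one another may also need their directory added to `sys.path`.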

Environment

Python 3.10.13
PyTorch 2.2.2
Gradio 5.9.1
Zero GPU compatible

Available Voices

Adult Female voices:

af: Confident, Friendly
af_sky: You know and Love her
af_bella: Warm and Self-Assured
af_nicole: Whispered, ASMR
af_sarah: Bright and Professional
bf_emma: Pensive and Confident, British
bf_isabella: Young Professional, British

Adult Male voices:

am_adam: Deep Narrative Voice
am_michael: Trustworthy and Thoughtful
bm_george: Distinguished older voice, British
bm_lewis: Assured and Raspy, British
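
For use in code, the voice list above can be held in a simple lookup table. The sketch below copies the IDs and descriptions from the lists above. The `group_voices` and `is_british` helpers are hypothetical, and they rely on a pattern visible in this list (second prefix letter `f` or `m`, with every `b`-prefixed voice labeled British) rather than on any documented naming rule.

```python
VOICES = {
    "af": "Confident, Friendly",
    "af_sky": "You know and Love her",
    "af_bella": "Warm and Self-Assured",
    "af_nicole": "Whispered, ASMR",
    "af_sarah": "Bright and Professional",
    "bf_emma": "Pensive and Confident, British",
    "bf_isabella": "Young Professional, British",
    "am_adam": "Deep Narrative Voice",
    "am_michael": "Trustworthy and Thoughtful",
    "bm_george": "Distinguished older voice, British",
    "bm_lewis": "Assured and Raspy, British",
}


def group_voices(voices) -> dict:
    """Group voice IDs by the second letter of their prefix ('f' or 'm')."""
    groups = {"female": [], "male": []}
    for voice_id in voices:
        prefix = voice_id.split("_")[0]  # e.g. "af" or "bm"
        groups["female" if prefix[1] == "f" else "male"].append(voice_id)
    return groups


def is_british(voice_id: str) -> bool:
    """True for IDs starting with 'b', which are all labeled British in this list."""
    return voice_id.startswith("b")
```

Such a table can feed a dropdown directly, for example by passing `list(VOICES)` as the choices.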

Notes

First generation may take longer due to model initialization
GPU is allocated only during speech generation
Model and voices are cached in /data/.huggingface for faster subsequent runs
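
The slower first generation follows from a load-once pattern: expensive setup runs on the first request and is reused afterwards. A minimal sketch using `functools.lru_cache`, where the hypothetical `load_model` sleeps briefly in place of real initialization and `timed_generate` only measures how long obtaining the model takes:

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def load_model() -> dict:
    """Hypothetical loader; the sleep stands in for real model initialization."""
    time.sleep(0.2)  # simulated one-time setup cost
    return {"name": "Kokoro-82M", "ready": True}


def timed_generate(text: str) -> float:
    """Return the seconds spent obtaining the model for this request."""
    start = time.perf_counter()
    load_model()  # slow on the first call, served from the cache afterwards
    return time.perf_counter() - start
```

Calling `timed_generate` twice shows the effect: only the first call pays the setup cost.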