---
title: AI Building Blocks
emoji: πŸ‘€
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: wtfpl
short_description: A gallery of building blocks for building AI applications
---
# AI Building Blocks
A gallery of AI building blocks for assembling AI applications, presented as a Gradio web interface with a separate tab for each task.
## Features
This application provides the following AI building blocks:
- **Text-to-image Generation**: Generate images from text prompts using the Hugging Face Inference API (see the sketch after this list)
- **Image-to-text (Image Captioning)**: Generate text descriptions of images using BLIP models
- **Image Classification**: Classify recyclable items using the Trash-Net model
- **Text-to-speech (TTS)**: Convert text to speech audio
- **Automatic Speech Recognition (ASR)**: Transcribe audio to text using Whisper models
- **Chatbot**: Have conversations with AI chatbots supporting both modern chat models and seq2seq models
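As a rough illustration of the Inference API flow behind the text-to-image block, a minimal sketch is shown below; the real logic lives in `text_to_image.py` and handles errors, timeouts, and UI wiring, so treat this as illustrative only:
```python
import os
from huggingface_hub import InferenceClient

# Minimal sketch; the actual module in this repo may differ.
client = InferenceClient(token=os.environ["HF_TOKEN"])
image = client.text_to_image(
    "A watercolor painting of a lighthouse at dusk",
    model=os.environ["TEXT_TO_IMAGE_MODEL"],
)
image.save("lighthouse.png")  # text_to_image returns a PIL.Image
```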
## Prerequisites
- Python 3.8 or higher
- PyTorch with hardware acceleration (strongly recommended - see [PyTorch Installation](#pytorch-installation))
- CUDA-capable GPU (optional, but recommended for better performance)
## Installation
1. Clone this repository:
```bash
git clone <repository-url>
cd ai-building-blocks
```
2. Create a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
3. Install system dependencies (required for text-to-speech):
```bash
# On Ubuntu/Debian:
sudo apt-get update && sudo apt-get install -y espeak-ng
# On macOS:
brew install espeak-ng
# On Fedora/RHEL:
sudo dnf install espeak-ng
```
4. Install PyTorch with CUDA support (see [PyTorch Installation](#pytorch-installation) below).
5. Install the remaining dependencies:
```bash
pip install -r requirements.txt
```
## PyTorch Installation
PyTorch is not included in `requirements.txt` because installation varies based on your hardware and operating system. **It is strongly recommended to install PyTorch with hardware acceleration support** for optimal performance.
For official installation instructions with CUDA support, please visit:
- **Official PyTorch Installation Guide**: https://pytorch.org/get-started/locally/
Select your platform, package manager, Python version, and CUDA version to get the appropriate installation command. For example:
- **CUDA 12.1** (recommended for modern NVIDIA GPUs):
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
- **CUDA 11.8**:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
- **CPU only** (not recommended for production):
```bash
pip install torch torchvision torchaudio
```
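Whichever variant you install, a quick check confirms that PyTorch imports cleanly and sees your accelerator (prints the installed version and `True`/`False` for CUDA availability):
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```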
## Configuration
Create a `.env` file in the project root directory with the following environment variables:
### Required Environment Variables
```env
# Hugging Face API Token (required for Inference API access)
# Get your token from: https://huggingface.co/settings/tokens
HF_TOKEN=your_huggingface_token_here
# Model IDs for each building block
TEXT_TO_IMAGE_MODEL=model_id_for_text_to_image
IMAGE_TO_TEXT_MODEL=model_id_for_image_captioning
IMAGE_CLASSIFICATION_MODEL=model_id_for_image_classification
TEXT_TO_SPEECH_MODEL=model_id_for_text_to_speech
AUDIO_TRANSCRIPTION_MODEL=model_id_for_speech_recognition
CHAT_MODEL=model_id_for_chatbot
```
### Optional Environment Variables
```env
# Request timeout in seconds (default: 45)
REQUEST_TIMEOUT=45
```
### Example `.env` File
```env
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Example model IDs (adjust based on your needs)
TEXT_TO_IMAGE_MODEL=black-forest-labs/FLUX.1-dev
IMAGE_CLASSIFICATION_MODEL=prithivMLmods/Trash-Net
IMAGE_TO_TEXT_MODEL=Salesforce/blip-image-captioning-large
TEXT_TO_SPEECH_MODEL=kakao-enterprise/vits-ljs
AUDIO_TRANSCRIPTION_MODEL=openai/whisper-large-v3
CHAT_MODEL=Qwen/Qwen2.5-1.5B-Instruct
REQUEST_TIMEOUT=45
```
**Note**: `.env` should already be listed in `.gitignore`. Never force-add it with `git add --force`, as that risks committing your Hugging Face token.
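For reference, a minimal sketch of how these variables might be read at startup, assuming `python-dotenv` is installed via `requirements.txt` (the exact code in `app.py` may differ):
```python
import os
from dotenv import load_dotenv

load_dotenv()  # loads .env from the project root into the process environment

HF_TOKEN = os.environ["HF_TOKEN"]                            # required; KeyError if absent
CHAT_MODEL = os.environ["CHAT_MODEL"]                        # one model ID per building block
REQUEST_TIMEOUT = float(os.getenv("REQUEST_TIMEOUT", "45"))  # optional, defaults to 45 seconds
```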
## Running the Application
1. Activate your virtual environment (if not already activated):
```bash
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
2. Run the application:
```bash
python app.py
```
3. Open your web browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`).
4. The Gradio interface will display multiple tabs, each corresponding to a different AI building block (structurally similar to the sketch below).
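The tabbed layout is conceptually similar to the following sketch; it is illustrative only, since the real tabs are built from the per-task modules listed in the project structure below:
```python
import gradio as gr

def placeholder(text: str) -> str:  # stand-in for a real task function
    return text

# Each building block contributes one tab; these two are placeholders.
tts = gr.Interface(fn=placeholder, inputs="text", outputs="text")
chat = gr.Interface(fn=placeholder, inputs="text", outputs="text")

demo = gr.TabbedInterface([tts, chat], tab_names=["Text-to-speech", "Chatbot"])

if __name__ == "__main__":
    demo.launch()  # serves on http://127.0.0.1:7860 by default
```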
## Project Structure
```
ai-building-blocks/
β”œβ”€β”€ app.py # Main application entry point
β”œβ”€β”€ text_to_image.py # Text-to-image generation module
β”œβ”€β”€ image_to_text.py # Image captioning module
β”œβ”€β”€ image_classification.py # Image classification module
β”œβ”€β”€ text_to_speech.py # Text-to-speech module
β”œβ”€β”€ automatic_speech_recognition.py # Speech recognition module
β”œβ”€β”€ chatbot.py # Chatbot module
β”œβ”€β”€ utils.py # Utility functions
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ packages.txt # System dependencies (for Hugging Face Spaces)
β”œβ”€β”€ .env # Environment variables (create this)
└── README.md # This file
```
## Hardware Acceleration
This application is designed to leverage hardware acceleration when available:
- **NVIDIA CUDA**: Automatically detected and used if available
- **AMD ROCm**: Supported through PyTorch's ROCm builds, which expose the same CUDA device API
- **Intel XPU**: Automatically detected if available
- **Apple Silicon (MPS)**: Automatically detected and used on Apple devices
- **CPU**: Falls back to CPU if no GPU acceleration is available
The application automatically selects the best available device. For optimal performance, especially with local models (image-to-text, text-to-speech, chatbot), a CUDA-capable GPU is strongly recommended. This is _untested_ on other hardware. πŸ˜‰
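Device selection conceptually follows a priority order like the sketch below; the helper name is hypothetical, and the project's actual logic (likely in `utils.py`) may differ in detail:
```python
import torch

def pick_device() -> torch.device:
    """Return the best available accelerator, falling back to CPU."""
    if torch.cuda.is_available():                            # NVIDIA CUDA (and ROCm builds)
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():   # Intel XPU
        return torch.device("xpu")
    if torch.backends.mps.is_available():                    # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")
```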
## Troubleshooting
### PyTorch Not Detecting GPU
If PyTorch is not detecting your GPU:
1. Verify CUDA is installed: `nvidia-smi`
2. Ensure PyTorch was installed with CUDA support (see [PyTorch Installation](#pytorch-installation))
3. Check PyTorch CUDA availability: `python -c "import torch; print(torch.cuda.is_available())"`
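If the check in step 3 prints `False`, this longer one-liner also shows which CUDA version PyTorch was built against; `None` means a CPU-only wheel was installed:
```bash
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```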
### Missing Environment Variables
Ensure all required environment variables are set in your `.env` file. Missing variables will cause the application to fail when trying to use the corresponding feature.
### espeak Not Installed (Text-to-Speech)
If you encounter a `RuntimeError: espeak not installed on your system` error:
1. Install `espeak-ng` using your system package manager (see [Installation](#installation) step 3).
2. On Hugging Face Spaces, ensure `packages.txt` exists with `espeak-ng` listed (this file is automatically used by Spaces).
3. Verify installation: `espeak --version` or `espeak-ng --version`
### Model Loading Errors
If you encounter errors loading models:
1. Verify that your `HF_TOKEN` is valid and has access to the models; some models are gated (a quick check is sketched after this list).
2. Check that model IDs in your `.env` file are correct.
3. Ensure you have sufficient disk space for model downloads.
4. For local models, ensure you have sufficient RAM or VRAM.
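To check items 1 and 2 quickly, the `huggingface_hub` client (pulled in with the other dependencies) can validate both the token and access to a model. A minimal sketch:
```python
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
print(api.whoami()["name"])  # fails fast if the token is invalid

# Raises a descriptive error if the model ID is wrong, or if the model is
# gated and your account has not been granted access.
api.model_info(os.environ["CHAT_MODEL"])
```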