---
title: AI Building Blocks
emoji: 👀
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: wtfpl
short_description: A gallery of building blocks for building AI applications
---

# AI Building Blocks

A gallery of AI building blocks for assembling AI applications, featuring a Gradio web interface with a separate tab for each task.

## Features

This application provides the following AI building blocks:

- **Text-to-image generation**: generate images from text prompts via the Hugging Face Inference API
- **Image-to-text (image captioning)**: generate text descriptions of images using BLIP models
- **Image classification**: classify recyclable items using a Trash-Net model
- **Text-to-speech (TTS)**: convert text to speech audio
- **Automatic speech recognition (ASR)**: transcribe audio to text using Whisper models
- **Chatbot**: converse with AI chatbots backed by either modern chat models or seq2seq models

## Prerequisites

- Python 3.8 or higher
- PyTorch with hardware acceleration (strongly recommended; see *PyTorch Installation* below)
- CUDA-capable GPU (optional, but recommended for better performance)

## Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd ai-building-blocks
   ```

2. Create a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install system dependencies (required for text-to-speech):

   ```bash
   # On Ubuntu/Debian:
   sudo apt-get update && sudo apt-get install -y espeak-ng

   # On macOS:
   brew install espeak-ng

   # On Fedora/RHEL:
   sudo dnf install espeak-ng
   ```

4. Install PyTorch with CUDA support (see *PyTorch Installation* below).

5. Install the remaining dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## PyTorch Installation

PyTorch is not included in requirements.txt because installation varies based on your hardware and operating system. It is strongly recommended to install PyTorch with hardware acceleration support for optimal performance.

For official installation instructions with CUDA support, visit <https://pytorch.org/get-started/locally/>.

Select your platform, package manager, Python version, and CUDA version to get the appropriate installation command. For example:

- **CUDA 12.1** (recommended for modern NVIDIA GPUs):

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  ```

- **CUDA 11.8**:

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```

- **CPU only** (not recommended for production):

  ```bash
  pip install torch torchvision torchaudio
  ```

## Configuration

Create a `.env` file in the project root directory with the following environment variables:

### Required Environment Variables

```
# Hugging Face API token (required for Inference API access)
# Get your token from: https://huggingface.co/settings/tokens
HF_TOKEN=your_huggingface_token_here

# Model IDs for each building block
TEXT_TO_IMAGE_MODEL=model_id_for_text_to_image
IMAGE_TO_TEXT_MODEL=model_id_for_image_captioning
IMAGE_CLASSIFICATION_MODEL=model_id_for_image_classification
TEXT_TO_SPEECH_MODEL=model_id_for_text_to_speech
AUDIO_TRANSCRIPTION_MODEL=model_id_for_speech_recognition
CHAT_MODEL=model_id_for_chatbot
```

### Optional Environment Variables

```
# Request timeout in seconds (default: 45)
REQUEST_TIMEOUT=45
```
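An optional variable like this is typically read with a fallback default so the app still starts when it is unset. A minimal sketch (the variable name `REQUEST_TIMEOUT` comes from this README; the helper function is illustrative, not the app's actual code):

```python
import os

def get_request_timeout(default: float = 45.0) -> float:
    """Read REQUEST_TIMEOUT from the environment, falling back to a default."""
    raw = os.environ.get("REQUEST_TIMEOUT")
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        # Ignore malformed values rather than crashing at startup.
        return default

os.environ["REQUEST_TIMEOUT"] = "30"
print(get_request_timeout())  # 30.0
```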

### Example .env File

```
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Example model IDs (adjust based on your needs)
TEXT_TO_IMAGE_MODEL=black-forest-labs/FLUX.1-dev
IMAGE_CLASSIFICATION_MODEL=prithivMLmods/Trash-Net
IMAGE_TO_TEXT_MODEL=Salesforce/blip-image-captioning-large
TEXT_TO_SPEECH_MODEL=kakao-enterprise/vits-ljs
AUDIO_TRANSCRIPTION_MODEL=openai/whisper-large-v3
CHAT_MODEL=Qwen/Qwen2.5-1.5B-Instruct

REQUEST_TIMEOUT=45
```
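A `.env` file of this shape is just `KEY=value` lines. Most projects load it with the `python-dotenv` package; purely for illustration, a stdlib-only parser sketch:

```python
def load_dotenv_text(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

example = """
# comment
HF_TOKEN=hf_xxx
CHAT_MODEL=Qwen/Qwen2.5-1.5B-Instruct
"""
config = load_dotenv_text(example)
print(config["CHAT_MODEL"])  # Qwen/Qwen2.5-1.5B-Instruct
```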

**Note:** `.env` is already listed in `.gitignore`. Never force-add it (e.g. with `git add --force`), or you risk committing sensitive tokens.

## Running the Application

1. Activate your virtual environment (if not already activated):

   ```bash
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

2. Run the application:

   ```bash
   python app.py
   ```

3. Open your web browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`).

4. The Gradio interface will display multiple tabs, each corresponding to a different AI building block.

## Project Structure

```
ai-building-blocks/
├── app.py                              # Main application entry point
├── text_to_image.py                    # Text-to-image generation module
├── image_to_text.py                    # Image captioning module
├── image_classification.py             # Image classification module
├── text_to_speech.py                   # Text-to-speech module
├── automatic_speech_recognition.py     # Speech recognition module
├── chatbot.py                          # Chatbot module
├── utils.py                            # Utility functions
├── requirements.txt                    # Python dependencies
├── packages.txt                        # System dependencies (for Hugging Face Spaces)
├── .env                                # Environment variables (create this)
└── README.md                           # This file
```

## Hardware Acceleration

This application is designed to leverage hardware acceleration when available:

- **NVIDIA CUDA**: automatically detected and used if available
- **AMD ROCm**: supported via PyTorch's CUDA compatibility layer
- **Intel XPU**: automatically detected if available
- **Apple Silicon (MPS)**: automatically detected and used on Apple devices
- **CPU**: used as a fallback if no GPU acceleration is available

The application automatically selects the best available device. For optimal performance, especially with the locally run models (image-to-text, text-to-speech, chatbot), a CUDA-capable GPU is strongly recommended. Other accelerators are untested. 😉
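The fallback order described above can be sketched as a pure function, with boolean flags standing in for the actual availability checks such as `torch.cuda.is_available()` (the function name and flags are illustrative, not the app's actual code):

```python
def pick_device(cuda=False, xpu=False, mps=False) -> str:
    """Pick the best available accelerator, falling back to CPU.

    Each flag stands in for a torch availability check, e.g.
    torch.cuda.is_available(), torch.xpu.is_available(),
    torch.backends.mps.is_available().
    """
    if cuda:       # NVIDIA CUDA (ROCm builds also report as CUDA)
        return "cuda"
    if xpu:        # Intel XPU
        return "xpu"
    if mps:        # Apple Silicon
        return "mps"
    return "cpu"   # No accelerator found

print(pick_device(cuda=True, mps=True))  # cuda
print(pick_device())                     # cpu
```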

## Troubleshooting

### PyTorch Not Detecting GPU

If PyTorch is not detecting your GPU:

1. Verify the NVIDIA driver is installed and working: `nvidia-smi`
2. Ensure PyTorch was installed with CUDA support (see *PyTorch Installation* above)
3. Check PyTorch's CUDA availability: `python -c "import torch; print(torch.cuda.is_available())"`

### Missing Environment Variables

Ensure all required environment variables are set in your `.env` file. A missing variable will cause the application to fail when the corresponding feature is used.
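One way to catch this early is a startup check that reports all missing variables at once. A minimal sketch (the variable names come from this README; the helper is illustrative, not the app's actual code):

```python
import os

REQUIRED_VARS = [
    "HF_TOKEN",
    "TEXT_TO_IMAGE_MODEL",
    "IMAGE_TO_TEXT_MODEL",
    "IMAGE_CLASSIFICATION_MODEL",
    "TEXT_TO_SPEECH_MODEL",
    "AUDIO_TRANSCRIPTION_MODEL",
    "CHAT_MODEL",
]

def missing_vars(env) -> list:
    """Return the required variables that are absent or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# At startup, pass os.environ and fail fast with a complete report:
problems = missing_vars({"HF_TOKEN": "hf_xxx"})
print(problems)  # every required variable except HF_TOKEN
```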

### espeak Not Installed (Text-to-Speech)

If you encounter a `RuntimeError: espeak not installed on your system` error:

1. Install `espeak-ng` using your system package manager (see *Installation*, step 3).
2. On Hugging Face Spaces, ensure `packages.txt` exists and lists `espeak-ng` (Spaces installs the packages in this file automatically).
3. Verify the installation: `espeak --version` or `espeak-ng --version`

### Model Loading Errors

If you encounter errors loading models:

1. Verify that your `HF_TOKEN` is valid and has access to the models; some models are gated.
2. Check that the model IDs in your `.env` file are correct.
3. Ensure you have sufficient disk space for model downloads.
4. For local models, ensure you have sufficient RAM or VRAM.
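The disk-space check in step 3 can be done from Python with the standard library. A small sketch (the 20 GiB threshold is an illustrative guess; large models can need tens of gigabytes):

```python
import shutil

def free_gib(path=".") -> float:
    """Free disk space at `path`, in GiB."""
    return shutil.disk_usage(path).free / (1024 ** 3)

# Warn before downloading a large model if space looks tight.
if free_gib() < 20:
    print(f"Only {free_gib():.1f} GiB free; large model downloads may fail.")
```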