---
title: AI Building Blocks
emoji: πŸ‘€
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: wtfpl
short_description: A gallery of building blocks for building AI applications
---
# AI Building Blocks
A gallery of AI building blocks for assembling AI applications, presented as a Gradio web interface with a separate tab for each task.
## Features
This application provides the following AI building blocks:
- **Text-to-image Generation**: Generate images from text prompts using the Hugging Face Inference API (see the sketch after this list)
- **Image-to-text (Image Captioning)**: Generate text descriptions of images using BLIP models
- **Image Classification**: Classify recyclable items using the Trash-Net model
- **Text-to-speech (TTS)**: Convert text to speech audio
- **Automatic Speech Recognition (ASR)**: Transcribe audio to text using Whisper models
- **Chatbot**: Have conversations with AI chatbots supporting both modern chat models and seq2seq models
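As a rough illustration of the Inference API flow behind the text-to-image block, a minimal sketch is shown below; the real logic lives in `text_to_image.py` and handles errors, timeouts, and UI wiring, so treat this as illustrative only:
```python
import os
from huggingface_hub import InferenceClient

# Minimal sketch; the actual module in this repo may differ.
client = InferenceClient(token=os.environ["HF_TOKEN"])
image = client.text_to_image(
    "A watercolor painting of a lighthouse at dusk",
    model=os.environ["TEXT_TO_IMAGE_MODEL"],
)
image.save("lighthouse.png")  # text_to_image returns a PIL.Image
```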
## Prerequisites
- Python 3.8 or higher
- PyTorch with hardware acceleration (strongly recommended - see [PyTorch Installation](#pytorch-installation))
- CUDA-capable GPU (optional, but recommended for better performance)
## Installation
1. Clone this repository:
```bash
git clone <repository-url>
cd ai-building-blocks
```
2. Create a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
3. Install system dependencies (required for text-to-speech):
```bash
# On Ubuntu/Debian:
sudo apt-get update && sudo apt-get install -y espeak-ng
# On macOS:
brew install espeak-ng
# On Fedora/RHEL:
sudo dnf install espeak-ng
```
4. Install PyTorch with CUDA support (see [PyTorch Installation](#pytorch-installation) below).
5. Install the remaining dependencies:
```bash
pip install -r requirements.txt
```
## PyTorch Installation
PyTorch is not included in `requirements.txt` because installation varies based on your hardware and operating system. **It is strongly recommended to install PyTorch with hardware acceleration support** for optimal performance.
For official installation instructions with CUDA support, please visit:
- **Official PyTorch Installation Guide**: https://pytorch.org/get-started/locally/
Select your platform, package manager, Python version, and CUDA version to get the appropriate installation command. For example:
- **CUDA 12.1** (recommended for modern NVIDIA GPUs):
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
- **CUDA 11.8**:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
- **CPU only** (not recommended for production):
```bash
pip install torch torchvision torchaudio
```
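Whichever variant you install, a quick check confirms that PyTorch imports cleanly and sees your accelerator (prints the installed version and `True`/`False` for CUDA availability):
```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```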
## Configuration
Create a `.env` file in the project root directory with the following environment variables:
### Required Environment Variables
```env
# Hugging Face API Token (required for Inference API access)
# Get your token from: https://huggingface.co/settings/tokens
HF_TOKEN=your_huggingface_token_here
# Model IDs for each building block
TEXT_TO_IMAGE_MODEL=model_id_for_text_to_image
IMAGE_TO_TEXT_MODEL=model_id_for_image_captioning
IMAGE_CLASSIFICATION_MODEL=model_id_for_image_classification
TEXT_TO_SPEECH_MODEL=model_id_for_text_to_speech
AUDIO_TRANSCRIPTION_MODEL=model_id_for_speech_recognition
CHAT_MODEL=model_id_for_chatbot
```
### Optional Environment Variables
```env
# Request timeout in seconds (default: 45)
REQUEST_TIMEOUT=45
```
### Example `.env` File
```env
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Example model IDs (adjust based on your needs)
TEXT_TO_IMAGE_MODEL=black-forest-labs/FLUX.1-dev
IMAGE_CLASSIFICATION_MODEL=prithivMLmods/Trash-Net
IMAGE_TO_TEXT_MODEL=Salesforce/blip-image-captioning-large
TEXT_TO_SPEECH_MODEL=kakao-enterprise/vits-ljs
AUDIO_TRANSCRIPTION_MODEL=openai/whisper-large-v3
CHAT_MODEL=Qwen/Qwen2.5-1.5B-Instruct
REQUEST_TIMEOUT=45
```
**Note**: `.env` should already be listed in `.gitignore`. Never force-add it with `git add --force`, as that risks committing your Hugging Face token.
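For reference, a minimal sketch of how these variables might be read at startup, assuming `python-dotenv` is installed via `requirements.txt` (the exact code in `app.py` may differ):
```python
import os
from dotenv import load_dotenv

load_dotenv()  # loads .env from the project root into the process environment

HF_TOKEN = os.environ["HF_TOKEN"]                            # required; KeyError if absent
CHAT_MODEL = os.environ["CHAT_MODEL"]                        # one model ID per building block
REQUEST_TIMEOUT = float(os.getenv("REQUEST_TIMEOUT", "45"))  # optional, defaults to 45 seconds
```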
## Running the Application
1. Activate your virtual environment (if not already activated):
```bash
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
2. Run the application:
```bash
python app.py
```
3. Open your web browser and navigate to the URL shown in the terminal (typically `http://127.0.0.1:7860`).
4. The Gradio interface will display multiple tabs, each corresponding to a different AI building block (structurally similar to the sketch below).
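The tabbed layout is conceptually similar to the following sketch; it is illustrative only, since the real tabs are built from the per-task modules listed in the project structure below:
```python
import gradio as gr

def placeholder(text: str) -> str:  # stand-in for a real task function
    return text

# Each building block contributes one tab; these two are placeholders.
tts = gr.Interface(fn=placeholder, inputs="text", outputs="text")
chat = gr.Interface(fn=placeholder, inputs="text", outputs="text")

demo = gr.TabbedInterface([tts, chat], tab_names=["Text-to-speech", "Chatbot"])

if __name__ == "__main__":
    demo.launch()  # serves on http://127.0.0.1:7860 by default
```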
## Project Structure
```
ai-building-blocks/
β”œβ”€β”€ app.py # Main application entry point
β”œβ”€β”€ text_to_image.py # Text-to-image generation module
β”œβ”€β”€ image_to_text.py # Image captioning module
β”œβ”€β”€ image_classification.py # Image classification module
β”œβ”€β”€ text_to_speech.py # Text-to-speech module
β”œβ”€β”€ automatic_speech_recognition.py # Speech recognition module
β”œβ”€β”€ chatbot.py # Chatbot module
β”œβ”€β”€ utils.py # Utility functions
β”œβ”€β”€ requirements.txt # Python dependencies
β”œβ”€β”€ packages.txt # System dependencies (for Hugging Face Spaces)
β”œβ”€β”€ .env # Environment variables (create this)
└── README.md # This file
```
## Hardware Acceleration
This application is designed to leverage hardware acceleration when available:
- **NVIDIA CUDA**: Automatically detected and used if available
- **AMD ROCm**: Supported through PyTorch's ROCm builds, which expose the same CUDA device API
- **Intel XPU**: Automatically detected if available
- **Apple Silicon (MPS)**: Automatically detected and used on Apple devices
- **CPU**: Falls back to CPU if no GPU acceleration is available
The application automatically selects the best available device. For optimal performance, especially with local models (image-to-text, text-to-speech, chatbot), a CUDA-capable GPU is strongly recommended. This is _untested_ on other hardware. πŸ˜‰
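Device selection conceptually follows a priority order like the sketch below; the helper name is hypothetical, and the project's actual logic (likely in `utils.py`) may differ in detail:
```python
import torch

def pick_device() -> torch.device:
    """Return the best available accelerator, falling back to CPU."""
    if torch.cuda.is_available():                            # NVIDIA CUDA (and ROCm builds)
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():   # Intel XPU
        return torch.device("xpu")
    if torch.backends.mps.is_available():                    # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")
```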
## Troubleshooting
### PyTorch Not Detecting GPU
If PyTorch is not detecting your GPU:
1. Verify CUDA is installed: `nvidia-smi`
2. Ensure PyTorch was installed with CUDA support (see [PyTorch Installation](#pytorch-installation))
3. Check PyTorch CUDA availability: `python -c "import torch; print(torch.cuda.is_available())"`
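If the check in step 3 prints `False`, this longer one-liner also shows which CUDA version PyTorch was built against; `None` means a CPU-only wheel was installed:
```bash
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```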
### Missing Environment Variables
Ensure all required environment variables are set in your `.env` file. Missing variables will cause the application to fail when trying to use the corresponding feature.
### espeak Not Installed (Text-to-Speech)
If you encounter a `RuntimeError: espeak not installed on your system` error:
1. Install `espeak-ng` using your system package manager (see [Installation](#installation) step 3).
2. On Hugging Face Spaces, ensure `packages.txt` exists with `espeak-ng` listed (this file is automatically used by Spaces).
3. Verify installation: `espeak --version` or `espeak-ng --version`
### Model Loading Errors
If you encounter errors loading models:
1. Verify that your `HF_TOKEN` is valid and has access to the models; some models are gated (a quick check is sketched after this list).
2. Check that model IDs in your `.env` file are correct.
3. Ensure you have sufficient disk space for model downloads.
4. For local models, ensure you have sufficient RAM or VRAM.
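To check items 1 and 2 quickly, the `huggingface_hub` client (pulled in with the other dependencies) can validate both the token and access to a model. A minimal sketch:
```python
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
print(api.whoami()["name"])  # fails fast if the token is invalid

# Raises a descriptive error if the model ID is wrong, or if the model is
# gated and your account has not been granted access.
api.model_info(os.environ["CHAT_MODEL"])
```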