Spaces:

jbilcke-hf
/

ai-comic-factory

Running

App Files Files Community

1191

ai-comic-factory / CLAUDE.md

Julian Bilcke

Add CLAUDE.md

dbda359 2 months ago

preview code

raw

history blame contribute delete

9.4 kB

AI Comic Factory - Project Documentation

Project Overview

AI Comic Factory is a Next.js application that generates AI-powered comic strips using Large Language Models (LLMs) and image generation APIs. Users input a prompt, select a comic style, and the system generates a complete comic with panels, dialog, and artwork.

Key Features:

Generate complete comics from a single text prompt
Multiple comic art styles and fonts
Support for multiple LLM providers (OpenAI, Anthropic, Groq, Hugging Face)
Multiple image generation engines (SDXL, OpenAI DALL-E, Replicate)
Interactive comic editor with speech bubbles and captions
Export to CLAP format (Cinematic Language and Audio Protocol)
Community sharing features (optional)
OAuth integration with Hugging Face

Technology Stack

Frontend:

Next.js 14.2.7 with App Router
React 18.3.1 with TypeScript 5.4.5
Tailwind CSS 3.4.1 with custom comic fonts
shadcn/ui component library (Radix UI primitives)
Zustand for state management
React Konva for canvas-based comic editing
Framer Motion alternatives via Tailwind animations

Backend/API:

Next.js Server Actions (9 server functions identified)
Multiple LLM integrations: OpenAI, Anthropic Claude, Groq, Hugging Face
Multiple rendering engines: SDXL, Replicate, VideoChain API, OpenAI DALL-E
Image processing with Sharp, HTML2Canvas
Docker containerization

Key Dependencies:

@aitube/clap - CLAP format support for multimedia projects
@anthropic-ai/sdk - Claude AI integration
@huggingface/inference - Hugging Face model access
groq-sdk - Groq API integration
openai - OpenAI API integration
replicate - Replicate.com API integration
Custom font handling with 13 different comic fonts

Project Structure

src/
├── app/                      # Next.js app router
│   ├── engine/              # Core business logic
│   │   ├── presets.ts       # Comic style presets (678 lines, 4 main presets)
│   │   ├── render.ts        # Image generation engine
│   │   ├── caption.ts       # Caption generation
│   │   └── censorship.ts    # Content filtering
│   ├── interface/           # UI components (22 directories)
│   │   ├── page/           # Comic page layout
│   │   ├── panel/          # Individual comic panels
│   │   ├── bottom-bar/     # Controls and actions
│   │   ├── settings-dialog/ # Configuration UI
│   │   └── ...
│   ├── queries/            # Server-side data fetching (13 files)
│   │   ├── predict.ts      # LLM prediction orchestration
│   │   ├── predictNextPanels.ts # Panel generation logic
│   │   ├── predictWith*.ts # Provider-specific implementations
│   │   └── ...
│   ├── store/              # Zustand state management
│   │   └── index.ts        # Main app state (21KB)
│   ├── layouts/            # Comic layout definitions
│   └── main.tsx           # Main application component
├── components/
│   ├── ui/                 # shadcn/ui components (27 components)
│   └── icons/             # Custom icons
├── lib/                    # Utility functions (49 files)
│   ├── fonts.ts           # Comic font definitions
│   ├── bubble/            # Speech bubble utilities
│   └── [various utilities for image processing, parsing, etc.]
├── fonts/                  # 13 custom comic fonts
└── types.ts               # TypeScript type definitions (217 lines)

Development Commands

# Development
npm run dev          # Start development server
npm run build        # Production build
npm run start        # Start production server
npm run lint         # ESLint checking

# Node version
nvm use             # Uses Node v20.17.0 (specified in .nvmrc)

Environment Configuration

The application requires extensive environment configuration in .env.local:

Core Engines:

LLM_ENGINE: "INFERENCE_API" | "INFERENCE_ENDPOINT" | "OPENAI" | "GROQ" | "ANTHROPIC"
RENDERING_ENGINE: "INFERENCE_API" | "INFERENCE_ENDPOINT" | "REPLICATE" | "VIDEOCHAIN" | "OPENAI"

Authentication (configure only what you use):

AUTH_HF_API_TOKEN - Hugging Face API token
AUTH_OPENAI_API_KEY - OpenAI API key
AUTH_GROQ_API_KEY - Groq API key
AUTH_ANTHROPIC_API_KEY - Anthropic/Claude API key
AUTH_REPLICATE_API_TOKEN - Replicate.com token
AUTH_VIDEOCHAIN_API_TOKEN - VideoChain API token

LLM Configuration:

LLM_HF_INFERENCE_API_MODEL - Default: "HuggingFaceH4/zephyr-7b-beta"
LLM_OPENAI_API_MODEL - Default: "gpt-4-turbo"
LLM_GROQ_API_MODEL - Default: "mixtral-8x7b-32768"
LLM_ANTHROPIC_API_MODEL - Default: "claude-3-opus-20240229"

Rendering Configuration:

RENDERING_HF_INFERENCE_API_BASE_MODEL - Default: "stabilityai/stable-diffusion-xl-base-1.0"
RENDERING_REPLICATE_API_MODEL - Default: "stabilityai/sdxl"
MAX_NB_PAGES - Default: 6

Architecture Patterns

State Management:

Zustand store with typed selectors and actions
Complex state includes: panels, speeches, captions, renderedScenes, layouts
Real-time panel generation status tracking

LLM Integration Pattern:

Abstracted provider interface through predict() function
Provider-specific implementations in separate files
Standardized prompt templates and response parsing
Support for multiple prompt formats (Zephyr, Llama, etc.)

Image Generation Flow:

User provides prompt + selects preset
LLM generates panel descriptions, speech, and captions
Each panel description is sent to rendering engine
Images are generated and cached
User can edit speech bubbles and captions
Final comic can be exported as image or CLAP file

Server Actions Architecture:

9 server actions for LLM predictions and rendering
Clean separation between UI and server logic
Error handling and fallbacks for API failures

Comic Preset System:

4 main preset categories with 678 lines of configuration
Each preset defines: art style, color scheme, font, LLM prompts, image prompts
Extensible system for adding new comic styles

Font System:

13 custom comic fonts loaded as local fonts
Includes both Google Fonts (Indie Flower, The Girl Next Door) and custom fonts
Proper CSS variable integration for consistent typography

Key Business Logic

Panel Generation (predictNextPanels):

Generates multiple comic panels from a single prompt
Handles continuation of existing stories
Parses LLM responses into structured panel data (instructions, speech, captions)
Error recovery and retry logic

Rendering Pipeline (render.ts):

Multi-provider image generation (Replicate, HF, OpenAI, VideoChain)
Automatic fallbacks between providers
Image caching and optimization
Support for different aspect ratios and resolutions

State Persistence:

LocalStorage integration for user settings
CLAP file format support for project serialization
OAuth state management with Hugging Face

Development Patterns

Component Organization:

Feature-based component structure in app/interface/
Reusable UI components in components/ui/
Custom hooks in lib/ for complex logic

Type Safety:

Comprehensive TypeScript definitions in types.ts
Strict typing for LLM engines, rendering engines, and data flows
Generic interfaces for extensible provider support

Error Handling:

Graceful degradation for API failures
User feedback through toast notifications
Fallback content for missing images/data

Performance Considerations:

Image optimization with Sharp
Lazy loading of comic panels
Efficient state updates with Zustand
Canvas-based rendering for complex layouts

Testing & Quality

Linting: ESLint with Next.js configuration
No test files found - this is an area for improvement
Type checking: Strict TypeScript configuration
Docker: Production containerization available

Deployment

Designed for Hugging Face Spaces deployment
Docker containerization with Node.js Alpine
Standalone Next.js output for containerized deployment
Environment-based configuration for different deployment targets

Community & Contributions

Open source project on Hugging Face
Community contributions documented in CONTRIBUTORS.md
Optional community sharing features
OAuth integration for user management

Development Notes

No API routes found - uses Server Actions exclusively
Canvas-based editing with React Konva for interactive panels
Multi-provider architecture allows switching between AI services
Extensive font library for authentic comic typography
CLAP format integration for multimedia project export
Rate limiting configurable for production usage

Quick Start for Developers

Copy .env to .env.local and configure your API keys
Choose your LLM_ENGINE and RENDERING_ENGINE
Install dependencies: npm install
Run development server: npm run dev
The app will guide you through first-time setup

Most common development setup:

LLM_ENGINE: "OPENAI" with OpenAI API key
RENDERING_ENGINE: "REPLICATE" with Replicate token
This provides reliable, high-quality results for testing