AI Comic Factory - Project Documentation
Project Overview
AI Comic Factory is a Next.js application that generates AI-powered comic strips using Large Language Models (LLMs) and image generation APIs. Users input a prompt, select a comic style, and the system generates a complete comic with panels, dialog, and artwork.
Key Features:
- Generate complete comics from a single text prompt
- Multiple comic art styles and fonts
- Support for multiple LLM providers (OpenAI, Anthropic, Groq, Hugging Face)
- Multiple image generation engines (SDXL, OpenAI DALL-E, Replicate)
- Interactive comic editor with speech bubbles and captions
- Export to CLAP format (Cinematic Language and Audio Protocol)
- Community sharing features (optional)
- OAuth integration with Hugging Face
Technology Stack
Frontend:
- Next.js 14.2.7 with App Router
- React 18.3.1 with TypeScript 5.4.5
- Tailwind CSS 3.4.1 with custom comic fonts
- shadcn/ui component library (Radix UI primitives)
- Zustand for state management
- React Konva for canvas-based comic editing
- Tailwind CSS animations (used in place of Framer Motion)
Backend/API:
- Next.js Server Actions (9 server functions identified)
- Multiple LLM integrations: OpenAI, Anthropic Claude, Groq, Hugging Face
- Multiple rendering engines: SDXL, Replicate, VideoChain API, OpenAI DALL-E
- Image processing with Sharp, HTML2Canvas
- Docker containerization
Key Dependencies:
- @aitube/clap - CLAP format support for multimedia projects
- @anthropic-ai/sdk - Claude AI integration
- @huggingface/inference - Hugging Face model access
- groq-sdk - Groq API integration
- openai - OpenAI API integration
- replicate - Replicate.com API integration
- Custom font handling with 13 different comic fonts
Project Structure
src/
├── app/ # Next.js app router
│   ├── engine/ # Core business logic
│   │   ├── presets.ts # Comic style presets (678 lines, 4 main presets)
│   │   ├── render.ts # Image generation engine
│   │   ├── caption.ts # Caption generation
│   │   └── censorship.ts # Content filtering
│   ├── interface/ # UI components (22 directories)
│   │   ├── page/ # Comic page layout
│   │   ├── panel/ # Individual comic panels
│   │   ├── bottom-bar/ # Controls and actions
│   │   ├── settings-dialog/ # Configuration UI
│   │   └── ...
│   ├── queries/ # Server-side data fetching (13 files)
│   │   ├── predict.ts # LLM prediction orchestration
│   │   ├── predictNextPanels.ts # Panel generation logic
│   │   ├── predictWith*.ts # Provider-specific implementations
│   │   └── ...
│   ├── store/ # Zustand state management
│   │   └── index.ts # Main app state (21KB)
│   ├── layouts/ # Comic layout definitions
│   └── main.tsx # Main application component
├── components/
│   ├── ui/ # shadcn/ui components (27 components)
│   └── icons/ # Custom icons
├── lib/ # Utility functions (49 files)
│   ├── fonts.ts # Comic font definitions
│   ├── bubble/ # Speech bubble utilities
│   └── [various utilities for image processing, parsing, etc.]
├── fonts/ # 13 custom comic fonts
└── types.ts # TypeScript type definitions (217 lines)
Development Commands
# Development
npm run dev # Start development server
npm run build # Production build
npm run start # Start production server
npm run lint # ESLint checking
# Node version
nvm use # Uses Node v20.17.0 (specified in .nvmrc)
Environment Configuration
The application requires extensive environment configuration in .env.local:
Core Engines:
- LLM_ENGINE: "INFERENCE_API" | "INFERENCE_ENDPOINT" | "OPENAI" | "GROQ" | "ANTHROPIC"
- RENDERING_ENGINE: "INFERENCE_API" | "INFERENCE_ENDPOINT" | "REPLICATE" | "VIDEOCHAIN" | "OPENAI"
Authentication (configure only what you use):
- AUTH_HF_API_TOKEN - Hugging Face API token
- AUTH_OPENAI_API_KEY - OpenAI API key
- AUTH_GROQ_API_KEY - Groq API key
- AUTH_ANTHROPIC_API_KEY - Anthropic/Claude API key
- AUTH_REPLICATE_API_TOKEN - Replicate.com token
- AUTH_VIDEOCHAIN_API_TOKEN - VideoChain API token
LLM Configuration:
- LLM_HF_INFERENCE_API_MODEL - Default: "HuggingFaceH4/zephyr-7b-beta"
- LLM_OPENAI_API_MODEL - Default: "gpt-4-turbo"
- LLM_GROQ_API_MODEL - Default: "mixtral-8x7b-32768"
- LLM_ANTHROPIC_API_MODEL - Default: "claude-3-opus-20240229"
Rendering Configuration:
- RENDERING_HF_INFERENCE_API_BASE_MODEL - Default: "stabilityai/stable-diffusion-xl-base-1.0"
- RENDERING_REPLICATE_API_MODEL - Default: "stabilityai/sdxl"
- MAX_NB_PAGES - Default: 6
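As a rough illustration of how the engine settings above might be validated at startup, the sketch below parses LLM_ENGINE, RENDERING_ENGINE and MAX_NB_PAGES from an environment-like object. The names readEngineConfig, pickOne and EngineConfig are assumptions made for this example, not the project's actual code; only the variable names, the allowed values and the MAX_NB_PAGES default come from the configuration list above.

```typescript
// Allowed values, copied from the environment variable list above.
const LLM_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "OPENAI", "GROQ", "ANTHROPIC"] as const;
const RENDERING_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "REPLICATE", "VIDEOCHAIN", "OPENAI"] as const;

type LLMEngine = (typeof LLM_ENGINES)[number];
type RenderingEngine = (typeof RENDERING_ENGINES)[number];

// Hypothetical config shape, for illustration only.
interface EngineConfig {
  llmEngine: LLMEngine;
  renderingEngine: RenderingEngine;
  maxNbPages: number;
}

type Env = Record<string, string | undefined>;

// Returns the value if it is one of the allowed options, otherwise throws.
function pickOne<T extends string>(name: string, value: string | undefined, allowed: readonly T[]): T {
  const match = allowed.find((option) => option === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of: ${allowed.join(", ")} (got ${JSON.stringify(value)})`);
  }
  return match;
}

function readEngineConfig(env: Env): EngineConfig {
  // MAX_NB_PAGES falls back to the documented default of 6 when unset.
  const rawPages = env.MAX_NB_PAGES;
  const maxNbPages = rawPages === undefined ? 6 : Number.parseInt(rawPages, 10);
  if (!Number.isInteger(maxNbPages) || maxNbPages < 1) {
    throw new Error(`MAX_NB_PAGES must be a positive integer (got ${JSON.stringify(rawPages)})`);
  }
  return {
    llmEngine: pickOne("LLM_ENGINE", env.LLM_ENGINE, LLM_ENGINES),
    renderingEngine: pickOne("RENDERING_ENGINE", env.RENDERING_ENGINE, RENDERING_ENGINES),
    maxNbPages,
  };
}
```

In a Next.js server context this would typically be called as readEngineConfig(process.env). Failing fast on an unknown engine name tends to be easier to debug than a provider call that fails later.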
Architecture Patterns
State Management:
- Zustand store with typed selectors and actions
- Complex state includes: panels, speeches, captions, renderedScenes, layouts
- Real-time panel generation status tracking
LLM Integration Pattern:
- Abstracted provider interface through the predict() function
- Provider-specific implementations in separate files
- Standardized prompt templates and response parsing
- Support for multiple prompt formats (Zephyr, Llama, etc.)
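The provider abstraction described above can be pictured as a lookup table from engine name to implementation. The following is a minimal sketch under that assumption: PredictParams, stubPredictor and the predictors table are hypothetical names, and the stubs stand in for the real SDK calls that the provider-specific files would make.

```typescript
type LLMEngine = "INFERENCE_API" | "INFERENCE_ENDPOINT" | "OPENAI" | "GROQ" | "ANTHROPIC";

// Hypothetical request shape; the real parameters may differ.
interface PredictParams {
  systemPrompt: string;
  userPrompt: string;
  nbMaxNewTokens: number;
}

type Predictor = (params: PredictParams) => Promise<string>;

// Stub standing in for a provider-specific implementation.
// A real one would call the provider's SDK and return the generated text.
const stubPredictor = (label: string): Predictor => async (params) =>
  `[${label}] ${params.userPrompt}`;

// One entry per engine; the Record type forces every engine to be covered.
const predictors: Record<LLMEngine, Predictor> = {
  INFERENCE_API: stubPredictor("huggingface"),
  INFERENCE_ENDPOINT: stubPredictor("huggingface"),
  OPENAI: stubPredictor("openai"),
  GROQ: stubPredictor("groq"),
  ANTHROPIC: stubPredictor("anthropic"),
};

// Single entry point: callers never need to know which provider is active.
async function predict(engine: LLMEngine, params: PredictParams): Promise<string> {
  return predictors[engine](params);
}
```

With this shape, supporting a new provider means adding one member to the union and one entry to the table, and the compiler flags any engine left without an implementation.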
Image Generation Flow:
- User provides prompt + selects preset
- LLM generates panel descriptions, speech, and captions
- Each panel description is sent to rendering engine
- Images are generated and cached
- User can edit speech bubbles and captions
- Final comic can be exported as image or CLAP file
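The first four steps of this flow can be sketched as a small pipeline. This is an illustration only: generateComic, renderCached, ScriptWriter and Renderer are assumed names, the LLM and the rendering engine are passed in as plain functions, and the cache is a simple in-memory map keyed by the image prompt rather than whatever caching the project actually uses.

```typescript
interface PanelScript {
  instructions: string; // description of the image to draw
  speech: string;
  caption: string;
}

interface RenderedPanel extends PanelScript {
  imageUrl: string;
}

// Hypothetical signatures for the LLM step and the rendering step.
type ScriptWriter = (prompt: string, preset: string, nbPanels: number) => Promise<PanelScript[]>;
type Renderer = (imagePrompt: string) => Promise<string>;

// In-memory cache keyed by the exact image prompt.
const imageCache = new Map<string, string>();

async function renderCached(imagePrompt: string, render: Renderer): Promise<string> {
  const cached = imageCache.get(imagePrompt);
  if (cached !== undefined) return cached;
  const imageUrl = await render(imagePrompt);
  imageCache.set(imagePrompt, imageUrl);
  return imageUrl;
}

async function generateComic(
  prompt: string,
  preset: string,
  nbPanels: number,
  writeScript: ScriptWriter,
  render: Renderer
): Promise<RenderedPanel[]> {
  // Step 2: the LLM turns the user prompt into per-panel descriptions.
  const scripts = await writeScript(prompt, preset, nbPanels);
  const panels: RenderedPanel[] = [];
  // Steps 3 and 4: render each panel in order, reusing cached images.
  for (const script of scripts) {
    const imagePrompt = `${preset} style, ${script.instructions}`;
    panels.push({ ...script, imageUrl: await renderCached(imagePrompt, render) });
  }
  return panels;
}
```

Panels are rendered one after another here to keep the cache logic simple; rendering them concurrently would be faster but needs care so that identical prompts are not rendered twice.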
Server Actions Architecture:
- 9 server actions for LLM predictions and rendering
- Clean separation between UI and server logic
- Error handling and fallbacks for API failures
Comic Preset System:
- 4 main preset categories with 678 lines of configuration
- Each preset defines: art style, color scheme, font, LLM prompts, image prompts
- Extensible system for adding new comic styles
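To make the preset idea concrete, here is a hedged sketch of what a preset entry and its registry could look like. The Preset interface is modeled on the fields listed above, but its exact property names, the example_noir and example_pastel entries, and the helper functions registerPreset and buildImagePrompt are all invented for illustration; the real definitions live in presets.ts.

```typescript
// Hypothetical preset shape, modeled on the fields listed above.
interface Preset {
  id: string;
  label: string;
  font: string;
  color: "color" | "grayscale" | "monochrome";
  llmPrompt: string; // style hint given to the LLM
  imagePrompt: (sceneDescription: string) => string[]; // keywords sent to the renderer
}

// Example entry with made-up values.
const presets: Record<string, Preset> = {
  example_noir: {
    id: "example_noir",
    label: "Example Noir",
    font: "examplefont",
    color: "grayscale",
    llmPrompt: "1940s detective story",
    imagePrompt: (scene) => ["black and white noir comic panel", scene, "heavy ink shadows"],
  },
};

// Adding a comic style only means registering one more object.
function registerPreset(preset: Preset): void {
  presets[preset.id] = preset;
}

// Combines the preset's style keywords with the panel description.
function buildImagePrompt(presetId: string, sceneDescription: string): string {
  const preset = presets[presetId];
  if (preset === undefined) {
    throw new Error(`Unknown preset: ${presetId}`);
  }
  return preset.imagePrompt(sceneDescription).join(", ");
}
```

Keeping the image prompt as a function of the scene lets each style decide where the description goes relative to its own keywords.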
Font System:
- 13 custom comic fonts loaded as local fonts
- Includes both Google Fonts (Indie Flower, The Girl Next Door) and custom fonts
- Proper CSS variable integration for consistent typography
Key Business Logic
Panel Generation (predictNextPanels):
- Generates multiple comic panels from a single prompt
- Handles continuation of existing stories
- Parses LLM responses into structured panel data (instructions, speech, captions)
- Error recovery and retry logic
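The response-parsing step might look roughly like the sketch below, which assumes the LLM was asked to reply with a JSON array of panels. The function name parsePanels and the GeneratedPanel shape are assumptions based on the fields named above (instructions, speech, captions), not the project's actual parser.

```typescript
interface GeneratedPanel {
  instructions: string;
  speech: string;
  caption: string;
}

function parsePanels(llmOutput: string): GeneratedPanel[] {
  // LLMs often wrap JSON in prose or code fences, so keep only the outermost [...] span.
  const start = llmOutput.indexOf("[");
  const end = llmOutput.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("No JSON array found in LLM output");
  }
  // JSON.parse throws a SyntaxError on malformed input, which the caller can treat as "retry".
  const parsed: unknown = JSON.parse(llmOutput.slice(start, end + 1));
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of panels");
  }
  return parsed
    .map((item): GeneratedPanel => {
      const record = (typeof item === "object" && item !== null ? item : {}) as Record<string, unknown>;
      // Missing or non-string fields become empty strings instead of crashing.
      const text = (key: string): string => {
        const value = record[key];
        return typeof value === "string" ? value.trim() : "";
      };
      return { instructions: text("instructions"), speech: text("speech"), caption: text("caption") };
    })
    // A panel without drawing instructions cannot be rendered, so drop it.
    .filter((panel) => panel.instructions.length > 0);
}
```

Because the function throws on unusable output, a caller can wrap it in a retry loop that asks the LLM again when parsing fails or too few panels come back.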
Rendering Pipeline (render.ts):
- Multi-provider image generation (Replicate, HF, OpenAI, VideoChain)
- Automatic fallbacks between providers
- Image caching and optimization
- Support for different aspect ratios and resolutions
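Fallback between providers can be expressed as trying each renderer in order until one succeeds. The sketch below is one way to do that; renderWithFallback and NamedRenderer are hypothetical names, and the renderers are plain async functions rather than the real provider clients in render.ts.

```typescript
type Renderer = (prompt: string) => Promise<string>;

interface NamedRenderer {
  name: string;
  render: Renderer;
}

interface RenderResult {
  provider: string;
  imageUrl: string;
}

// Tries each provider in the given order and returns the first success.
async function renderWithFallback(prompt: string, renderers: NamedRenderer[]): Promise<RenderResult> {
  const errors: string[] = [];
  for (const { name, render } of renderers) {
    try {
      return { provider: name, imageUrl: await render(prompt) };
    } catch (err) {
      // Remember why this provider failed, then move on to the next one.
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All rendering providers failed (${errors.join("; ")})`);
}
```

Collecting the individual error messages keeps the final failure informative, which helps when every provider is misconfigured for a different reason.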
State Persistence:
- LocalStorage integration for user settings
- CLAP file format support for project serialization
- OAuth state management with Hugging Face
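For the LocalStorage part of this list, a minimal sketch of saving and restoring user settings is shown below. It is written against a small KeyValueStore interface that matches the getItem and setItem methods of the browser's Web Storage API, so window.localStorage can be passed in directly. The UserSettings fields, the storage key and the helper names are assumptions for illustration.

```typescript
// Subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical settings shape and storage key.
interface UserSettings {
  preset: string;
  font: string;
}

const SETTINGS_KEY = "example_comic_settings";

function saveSettings(store: KeyValueStore, settings: UserSettings): void {
  store.setItem(SETTINGS_KEY, JSON.stringify(settings));
}

// Falls back to the defaults when nothing is stored or the stored value is corrupt.
function loadSettings(store: KeyValueStore, defaults: UserSettings): UserSettings {
  const raw = store.getItem(SETTINGS_KEY);
  if (raw === null) return defaults;
  try {
    const parsed = JSON.parse(raw) as Partial<UserSettings>;
    // Stored fields override defaults; fields added later keep their default value.
    return { ...defaults, ...parsed };
  } catch (err) {
    return defaults;
  }
}

// In-memory stand-in for localStorage, useful in tests or on the server.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

Merging stored values over defaults means that a settings object saved by an older version of the app still loads after new fields are introduced.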
Development Patterns
Component Organization:
- Feature-based component structure in app/interface/
- Reusable UI components in components/ui/
- Custom hooks in lib/ for complex logic
Type Safety:
- Comprehensive TypeScript definitions in types.ts
- Strict typing for LLM engines, rendering engines, and data flows
- Generic interfaces for extensible provider support
Error Handling:
- Graceful degradation for API failures
- User feedback through toast notifications
- Fallback content for missing images/data
Performance Considerations:
- Image optimization with Sharp
- Lazy loading of comic panels
- Efficient state updates with Zustand
- Canvas-based rendering for complex layouts
Testing & Quality
- Linting: ESLint with Next.js configuration
- No test files found - this is an area for improvement
- Type checking: Strict TypeScript configuration
- Docker: Production containerization available
Deployment
- Designed for Hugging Face Spaces deployment
- Docker containerization with Node.js Alpine
- Standalone Next.js output for containerized deployment
- Environment-based configuration for different deployment targets
Community & Contributions
- Open source project on Hugging Face
- Community contributions documented in CONTRIBUTORS.md
- Optional community sharing features
- OAuth integration for user management
Development Notes
- No API routes found - uses Server Actions exclusively
- Canvas-based editing with React Konva for interactive panels
- Multi-provider architecture allows switching between AI services
- Extensive font library for authentic comic typography
- CLAP format integration for multimedia project export
- Rate limiting configurable for production usage
Quick Start for Developers
- Copy .env to .env.local and configure your API keys
- Choose your LLM_ENGINE and RENDERING_ENGINE
- Install dependencies: npm install
- Run development server: npm run dev
- The app will guide you through first-time setup
Most common development setup:
- LLM_ENGINE: "OPENAI" with OpenAI API key
- RENDERING_ENGINE: "REPLICATE" with Replicate token
- This provides reliable, high-quality results for testing