Julian Bilcke committed
Commit dbda359 · 1 Parent(s): 375ffae

Add CLAUDE.md

Files changed (1)
  1. CLAUDE.md +239 -0
CLAUDE.md ADDED
@@ -0,0 +1,239 @@
+ # AI Comic Factory - Project Documentation
+
+ ## Project Overview
+
+ **AI Comic Factory** is a Next.js application that generates AI-powered comic strips using Large Language Models (LLMs) and image generation APIs. Users input a prompt, select a comic style, and the system generates a complete comic with panels, dialog, and artwork.
+
+ **Key Features:**
+ - Generate complete comics from a single text prompt
+ - Multiple comic art styles and fonts
+ - Support for multiple LLM providers (OpenAI, Anthropic, Groq, Hugging Face)
+ - Multiple image generation engines (SDXL, OpenAI DALL-E, Replicate)
+ - Interactive comic editor with speech bubbles and captions
+ - Export to CLAP format (Cinematic Language and Audio Protocol)
+ - Community sharing features (optional)
+ - OAuth integration with Hugging Face
+
+ ## Technology Stack
+
+ **Frontend:**
+ - Next.js 14.2.7 with App Router
+ - React 18.3.1 with TypeScript 5.4.5
+ - Tailwind CSS 3.4.1 with custom comic fonts
+ - shadcn/ui component library (Radix UI primitives)
+ - Zustand for state management
+ - React Konva for canvas-based comic editing
+ - Tailwind CSS animations as an alternative to Framer Motion
+
+ **Backend/API:**
+ - Next.js Server Actions (9 server functions identified)
+ - Multiple LLM integrations: OpenAI, Anthropic Claude, Groq, Hugging Face
+ - Multiple rendering engines: SDXL, Replicate, VideoChain API, OpenAI DALL-E
+ - Image processing with Sharp, HTML2Canvas
+ - Docker containerization
+
+ **Key Dependencies:**
+ - `@aitube/clap` - CLAP format support for multimedia projects
+ - `@anthropic-ai/sdk` - Claude AI integration
+ - `@huggingface/inference` - Hugging Face model access
+ - `groq-sdk` - Groq API integration
+ - `openai` - OpenAI API integration
+ - `replicate` - Replicate.com API integration
+ - Custom font handling with 13 different comic fonts
+
+ ## Project Structure
+
+ ```
+ src/
+ ├── app/                          # Next.js app router
+ │   ├── engine/                   # Core business logic
+ │   │   ├── presets.ts            # Comic style presets (678 lines, 4 main presets)
+ │   │   ├── render.ts             # Image generation engine
+ │   │   ├── caption.ts            # Caption generation
+ │   │   └── censorship.ts         # Content filtering
+ │   ├── interface/                # UI components (22 directories)
+ │   │   ├── page/                 # Comic page layout
+ │   │   ├── panel/                # Individual comic panels
+ │   │   ├── bottom-bar/           # Controls and actions
+ │   │   ├── settings-dialog/      # Configuration UI
+ │   │   └── ...
+ │   ├── queries/                  # Server-side data fetching (13 files)
+ │   │   ├── predict.ts            # LLM prediction orchestration
+ │   │   ├── predictNextPanels.ts  # Panel generation logic
+ │   │   ├── predictWith*.ts       # Provider-specific implementations
+ │   │   └── ...
+ │   ├── store/                    # Zustand state management
+ │   │   └── index.ts              # Main app state (21KB)
+ │   ├── layouts/                  # Comic layout definitions
+ │   └── main.tsx                  # Main application component
+ ├── components/
+ │   ├── ui/                       # shadcn/ui components (27 components)
+ │   └── icons/                    # Custom icons
+ ├── lib/                          # Utility functions (49 files)
+ │   ├── fonts.ts                  # Comic font definitions
+ │   ├── bubble/                   # Speech bubble utilities
+ │   └── [various utilities for image processing, parsing, etc.]
+ ├── fonts/                        # 13 custom comic fonts
+ └── types.ts                      # TypeScript type definitions (217 lines)
+ ```
+
+ ## Development Commands
+
+ ```bash
+ # Development
+ npm run dev      # Start development server
+ npm run build    # Production build
+ npm run start    # Start production server
+ npm run lint     # ESLint checking
+
+ # Node version
+ nvm use          # Uses Node v20.17.0 (specified in .nvmrc)
+ ```
+
+ ## Environment Configuration
+
+ The application requires extensive environment configuration in `.env.local`:
+
+ **Core Engines:**
+ - `LLM_ENGINE`: "INFERENCE_API" | "INFERENCE_ENDPOINT" | "OPENAI" | "GROQ" | "ANTHROPIC"
+ - `RENDERING_ENGINE`: "INFERENCE_API" | "INFERENCE_ENDPOINT" | "REPLICATE" | "VIDEOCHAIN" | "OPENAI"
+
+ **Authentication (configure only what you use):**
+ - `AUTH_HF_API_TOKEN` - Hugging Face API token
+ - `AUTH_OPENAI_API_KEY` - OpenAI API key
+ - `AUTH_GROQ_API_KEY` - Groq API key
+ - `AUTH_ANTHROPIC_API_KEY` - Anthropic/Claude API key
+ - `AUTH_REPLICATE_API_TOKEN` - Replicate.com token
+ - `AUTH_VIDEOCHAIN_API_TOKEN` - VideoChain API token
+
+ **LLM Configuration:**
+ - `LLM_HF_INFERENCE_API_MODEL` - Default: "HuggingFaceH4/zephyr-7b-beta"
+ - `LLM_OPENAI_API_MODEL` - Default: "gpt-4-turbo"
+ - `LLM_GROQ_API_MODEL` - Default: "mixtral-8x7b-32768"
+ - `LLM_ANTHROPIC_API_MODEL` - Default: "claude-3-opus-20240229"
+
+ **Rendering Configuration:**
+ - `RENDERING_HF_INFERENCE_API_BASE_MODEL` - Default: "stabilityai/stable-diffusion-xl-base-1.0"
+ - `RENDERING_REPLICATE_API_MODEL` - Default: "stabilityai/sdxl"
+ - `MAX_NB_PAGES` - Default: 6
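
As a rough illustration of how the variables above could be read and validated at startup, the sketch below parses a plain key/value object. `EngineConfig`, `parseChoice`, and `loadEngineConfig` are hypothetical names that do not come from the project; only the variable names, allowed values, and defaults are taken from the lists above.

```typescript
// Allowed values as documented for LLM_ENGINE and RENDERING_ENGINE.
const LLM_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "OPENAI", "GROQ", "ANTHROPIC"] as const;
const RENDERING_ENGINES = ["INFERENCE_API", "INFERENCE_ENDPOINT", "REPLICATE", "VIDEOCHAIN", "OPENAI"] as const;

type LLMEngine = (typeof LLM_ENGINES)[number];
type RenderingEngine = (typeof RENDERING_ENGINES)[number];

// Hypothetical config shape; the real project may organize settings differently.
interface EngineConfig {
  llmEngine: LLMEngine;
  renderingEngine: RenderingEngine;
  openaiModel: string;
  maxNbPages: number;
}

type Env = Record<string, string | undefined>;

// Narrow a raw string to one of the allowed values, or throw a readable error.
function parseChoice<T extends string>(name: string, raw: string | undefined, allowed: readonly T[]): T {
  const match = allowed.find((value) => value === raw);
  if (match === undefined) {
    throw new Error(`${name} must be one of ${allowed.join(", ")} (got ${String(raw)})`);
  }
  return match;
}

// Read the engine settings, falling back to the documented defaults.
function loadEngineConfig(env: Env): EngineConfig {
  const maxNbPages = Number.parseInt(env.MAX_NB_PAGES ?? "6", 10);
  return {
    llmEngine: parseChoice("LLM_ENGINE", env.LLM_ENGINE, LLM_ENGINES),
    renderingEngine: parseChoice("RENDERING_ENGINE", env.RENDERING_ENGINE, RENDERING_ENGINES),
    openaiModel: env.LLM_OPENAI_API_MODEL ?? "gpt-4-turbo",
    maxNbPages: Number.isNaN(maxNbPages) ? 6 : maxNbPages,
  };
}
```

In a Node.js process this could be called as `loadEngineConfig(process.env)`; passing the environment in as an argument keeps the function easy to exercise without touching real settings.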
+
+ ## Architecture Patterns
+
+ **State Management:**
+ - Zustand store with typed selectors and actions
+ - Complex state includes: panels, speeches, captions, renderedScenes, layouts
+ - Real-time panel generation status tracking
+
+ **LLM Integration Pattern:**
+ - Abstracted provider interface through `predict()` function
+ - Provider-specific implementations in separate files
+ - Standardized prompt templates and response parsing
+ - Support for multiple prompt formats (Zephyr, Llama, etc.)
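
The provider abstraction described above can be pictured as a lookup table keyed by engine name. This is only a sketch: `PredictionParams`, `Predictor`, and `fakeProvider` are hypothetical stand-ins, and the actual `predict()` signature in the project may differ.

```typescript
type LLMEngine = "INFERENCE_API" | "INFERENCE_ENDPOINT" | "OPENAI" | "GROQ" | "ANTHROPIC";

// Hypothetical request shape shared by every provider.
interface PredictionParams {
  systemPrompt: string;
  userPrompt: string;
  nbMaxNewTokens: number;
}

type Predictor = (params: PredictionParams) => Promise<string>;

// Stand-in for a provider-specific module; a real one would call the vendor SDK.
const fakeProvider = (label: string): Predictor =>
  async ({ userPrompt }) => `[${label}] ${userPrompt}`;

// One entry per engine, so adding a provider means adding one line here.
const providers: Record<LLMEngine, Predictor> = {
  INFERENCE_API: fakeProvider("hf-api"),
  INFERENCE_ENDPOINT: fakeProvider("hf-endpoint"),
  OPENAI: fakeProvider("openai"),
  GROQ: fakeProvider("groq"),
  ANTHROPIC: fakeProvider("anthropic"),
};

// Single entry point: callers never need to know which provider is active.
async function predict(engine: LLMEngine, params: PredictionParams): Promise<string> {
  return providers[engine](params);
}
```

Typing the table as `Record<LLMEngine, Predictor>` makes the compiler flag any engine that lacks an implementation.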
+
+ **Image Generation Flow:**
+ 1. User provides prompt + selects preset
+ 2. LLM generates panel descriptions, speech, and captions
+ 3. Each panel description is sent to rendering engine
+ 4. Images are generated and cached
+ 5. User can edit speech bubbles and captions
+ 6. Final comic can be exported as image or CLAP file
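
Steps 1 to 4 of the flow above might be sketched as follows. The LLM and the renderer are injected as plain functions so the flow can run without any API; `ComicDeps`, `generateComic`, and the in-memory `Map` cache are assumptions made for illustration, not the project's actual code.

```typescript
interface PanelPlan {
  instructions: string;
  speech: string;
  caption: string;
}

interface RenderedPanel extends PanelPlan {
  imageUrl: string;
}

// Hypothetical dependencies: one call to the LLM, one call to the rendering engine.
interface ComicDeps {
  planPanels: (prompt: string, preset: string, nbPanels: number) => Promise<PanelPlan[]>;
  renderImage: (imagePrompt: string) => Promise<string>;
}

async function generateComic(
  prompt: string,
  preset: string,
  nbPanels: number,
  deps: ComicDeps,
  cache: Map<string, string> = new Map(),
): Promise<RenderedPanel[]> {
  // Step 2: the LLM turns the user prompt into per-panel descriptions.
  const plans = await deps.planPanels(prompt, preset, nbPanels);

  // Steps 3 and 4: render each panel, reusing a cached image when the prompt repeats.
  return Promise.all(
    plans.map(async (plan) => {
      const imagePrompt = `${preset} style, ${plan.instructions}`;
      let imageUrl = cache.get(imagePrompt);
      if (imageUrl === undefined) {
        imageUrl = await deps.renderImage(imagePrompt);
        cache.set(imagePrompt, imageUrl);
      }
      return { ...plan, imageUrl };
    }),
  );
}
```

Editing and export (steps 5 and 6) happen in the UI and are left out of this sketch.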
+
+ **Server Actions Architecture:**
+ - 9 server actions for LLM predictions and rendering
+ - Clean separation between UI and server logic
+ - Error handling and fallbacks for API failures
+
+ **Comic Preset System:**
+ - 4 main preset categories with 678 lines of configuration
+ - Each preset defines: art style, color scheme, font, LLM prompts, image prompts
+ - Extensible system for adding new comic styles
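
To make the preset idea concrete, here is a minimal sketch of what a preset record and registry could look like. The `ComicPreset` fields mirror the list above, but the field names, the registry functions, and the example preset are all hypothetical and are not copied from `presets.ts`.

```typescript
// Hypothetical preset shape based on the fields listed above.
interface ComicPreset {
  id: string;
  label: string;
  artStyle: string;
  colorScheme: string;
  font: string;
  llmPrompt: string;
  buildImagePrompt: (panelDescription: string) => string;
}

const presetRegistry = new Map<string, ComicPreset>();

// Adding a new comic style is a single registration call.
function registerPreset(preset: ComicPreset): void {
  if (presetRegistry.has(preset.id)) {
    throw new Error(`Preset "${preset.id}" is already registered`);
  }
  presetRegistry.set(preset.id, preset);
}

function getPreset(id: string): ComicPreset {
  const preset = presetRegistry.get(id);
  if (preset === undefined) {
    throw new Error(`Unknown preset "${id}"`);
  }
  return preset;
}

// Example preset, invented for this sketch.
registerPreset({
  id: "noir_example",
  label: "Noir (example)",
  artStyle: "high-contrast ink drawing",
  colorScheme: "black and white",
  font: "example-font",
  llmPrompt: "Write a moody detective story.",
  buildImagePrompt: (panelDescription) =>
    `high-contrast ink drawing, black and white, ${panelDescription}`,
});
```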
+
+ **Font System:**
+ - 13 custom comic fonts loaded as local fonts
+ - Includes both Google Fonts (Indie Flower, The Girl Next Door) and custom fonts
+ - Proper CSS variable integration for consistent typography
+
+ ## Key Business Logic
+
+ **Panel Generation (`predictNextPanels`):**
+ - Generates multiple comic panels from a single prompt
+ - Handles continuation of existing stories
+ - Parses LLM responses into structured panel data (instructions, speech, captions)
+ - Error recovery and retry logic
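
The parsing and retry behavior described above could look roughly like the sketch below, which assumes the LLM is asked to reply with a JSON array of panels. `GeneratedPanel`, `parsePanels`, and `predictPanelsWithRetry` are illustrative names; the real `predictNextPanels` may use a different response format and recovery strategy.

```typescript
interface GeneratedPanel {
  panel: number;
  instructions: string;
  speech: string;
  caption: string;
}

// Pull the first JSON array out of a noisy LLM reply and validate each entry.
function parsePanels(raw: string): GeneratedPanel[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) {
    throw new Error("No JSON array found in LLM output");
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of panels");
  }
  return parsed.map((item, index) => {
    const record = (typeof item === "object" && item !== null ? item : {}) as Record<string, unknown>;
    // Missing or non-string fields become empty strings.
    const text = (key: string): string => {
      const value = record[key];
      return typeof value === "string" ? value.trim() : "";
    };
    const instructions = text("instructions");
    if (instructions === "") {
      throw new Error(`Panel ${index + 1} has no instructions`);
    }
    return { panel: index + 1, instructions, speech: text("speech"), caption: text("caption") };
  });
}

// Ask the LLM again when the reply cannot be parsed, up to maxAttempts times.
async function predictPanelsWithRetry(
  ask: () => Promise<string>,
  maxAttempts = 3,
): Promise<GeneratedPanel[]> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return parsePanels(await ask());
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}
```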
+
+ **Rendering Pipeline (`render.ts`):**
+ - Multi-provider image generation (Replicate, HF, OpenAI, VideoChain)
+ - Automatic fallbacks between providers
+ - Image caching and optimization
+ - Support for different aspect ratios and resolutions
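
One way to picture the fallback behavior mentioned above is to try a list of providers in order and return the first image that succeeds. This is a sketch under that assumption; `RenderProvider` and `renderWithFallback` are hypothetical names, and `render.ts` may select and order providers differently.

```typescript
// Hypothetical provider contract: returns an image URL (or data URI) for a prompt.
type RenderFn = (prompt: string, width: number, height: number) => Promise<string>;

interface RenderProvider {
  name: string;
  render: RenderFn;
}

// Try each provider in order; report every failure if none of them succeeds.
async function renderWithFallback(
  providers: RenderProvider[],
  prompt: string,
  width: number,
  height: number,
): Promise<{ provider: string; imageUrl: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const imageUrl = await provider.render(prompt, width, height);
      return { provider: provider.name, imageUrl };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All rendering providers failed (${failures.join("; ")})`);
}
```

Collecting the failure reasons keeps the final error useful when every provider is misconfigured, which is a common situation during first-time setup.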
+
+ **State Persistence:**
+ - LocalStorage integration for user settings
+ - CLAP file format support for project serialization
+ - OAuth state management with Hugging Face
+
+ ## Development Patterns
+
+ **Component Organization:**
+ - Feature-based component structure in `app/interface/`
+ - Reusable UI components in `components/ui/`
+ - Custom hooks in `lib/` for complex logic
+
+ **Type Safety:**
+ - Comprehensive TypeScript definitions in `types.ts`
+ - Strict typing for LLM engines, rendering engines, and data flows
+ - Generic interfaces for extensible provider support
+
+ **Error Handling:**
+ - Graceful degradation for API failures
+ - User feedback through toast notifications
+ - Fallback content for missing images/data
+
+ **Performance Considerations:**
+ - Image optimization with Sharp
+ - Lazy loading of comic panels
+ - Efficient state updates with Zustand
+ - Canvas-based rendering for complex layouts
+
+ ## Testing & Quality
+
+ - **Linting**: ESLint with Next.js configuration
+ - **No test files found** - this is an area for improvement
+ - **Type checking**: Strict TypeScript configuration
+ - **Docker**: Production containerization available
+
+ ## Deployment
+
+ - Designed for Hugging Face Spaces deployment
+ - Docker containerization with Node.js Alpine
+ - Standalone Next.js output for containerized deployment
+ - Environment-based configuration for different deployment targets
+
+ ## Community & Contributions
+
+ - Open source project on Hugging Face
+ - Community contributions documented in `CONTRIBUTORS.md`
+ - Optional community sharing features
+ - OAuth integration for user management
+
+ ## Development Notes
+
+ - **No API routes found** - uses Server Actions exclusively
+ - **Canvas-based editing** with React Konva for interactive panels
+ - **Multi-provider architecture** allows switching between AI services
+ - **Extensive font library** for authentic comic typography
+ - **CLAP format integration** for multimedia project export
+ - **Rate limiting** configurable for production usage
+
+ ## Quick Start for Developers
+
+ 1. Copy `.env` to `.env.local` and configure your API keys
+ 2. Set `LLM_ENGINE` and `RENDERING_ENGINE` to the providers you want to use
+ 3. Install dependencies: `npm install`
+ 4. Run development server: `npm run dev`
+ 5. The app will guide you through first-time setup
+
+ Most common development setup:
+ - `LLM_ENGINE`: "OPENAI" with an OpenAI API key
+ - `RENDERING_ENGINE`: "REPLICATE" with a Replicate token
+ - This provides reliable, high-quality results for testing