---
title: PuLID based FLUX FaceID
emoji: 🤗
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: apache-2.0
---
## PuLID for FLUX: Portrait-Guided Image Generation

This Space implements **PuLID (Pure and Lightning ID customization)** for FLUX.1-dev. It generates personalized images guided by an ID (identity) image, combining the FLUX diffusion model with identity preservation.

### Key Features

**1. Identity-Guided Generation**
- Upload an ID image (portrait photo) to guide the generation process
- Control identity strength with adjustable ID weight (0.0-3.0)
- Preserve facial features while applying various artistic styles

**2. Advanced Configuration Options**
- **Resolution Control**: Adjustable width (256-1536px) and height (256-1536px)
- **Generation Steps**: 1-20 steps for quality vs speed tradeoff
- **Guidance Scale**: Fine-tune adherence to prompts (1.0-10.0)
- **Seed Control**: Reproducible results with manual seed input
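
The ranges listed above can be captured in a small settings object that rejects out-of-range values before a generation run starts. This is a minimal sketch, not the code in `app.py`: the class name, the field names, the default values, and the convention that a negative seed means "random" are all assumptions made for illustration. Only the numeric ranges come from this README.

```python
from dataclasses import dataclass

# Ranges copied from the feature list in this README.
# The keys are illustrative names, not identifiers from app.py.
RANGES = {
    "width": (256, 1536),
    "height": (256, 1536),
    "num_steps": (1, 20),
    "guidance": (1.0, 10.0),
    "id_weight": (0.0, 3.0),
}


@dataclass
class GenerationSettings:
    # Placeholder defaults chosen for the example, not taken from the app.
    width: int = 1024
    height: int = 1024
    num_steps: int = 20
    guidance: float = 4.0
    id_weight: float = 1.0
    seed: int = -1  # assumption: a negative seed means "choose one at random"

    def validate(self) -> None:
        """Raise ValueError if any setting falls outside its documented range."""
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")


settings = GenerationSettings(width=768, height=1024, id_weight=1.5, seed=42)
settings.validate()  # passes silently; an out-of-range value would raise
```

Validating once up front keeps a bad slider value from surfacing as a failure midway through sampling.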

**3. True CFG (Classifier-Free Guidance)**
- Fake CFG mode (scale=1): Faster generation with basic guidance
- True CFG mode (scale>1): Enhanced quality with negative prompt support
- Configurable timestep at which true CFG starts
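
The two modes can be illustrated with the standard classifier-free guidance formula, `neg + scale * (pos - neg)`. The sketch below is hedged: `guided_prediction`, `predict`, and `start_step` are hypothetical names, the model is replaced by a stub that returns short lists of floats, and the actual sampling loop in this Space may differ. It shows only why a scale of 1 needs no negative-prompt pass, and how a start step delays the extra pass.

```python
from typing import Callable, List


def guided_prediction(
    predict: Callable[[bool], List[float]],
    step: int,
    true_cfg_scale: float,
    start_step: int,
) -> List[float]:
    """Combine positive and negative predictions for one sampling step.

    predict(True) returns the prediction for the prompt and
    predict(False) the prediction for the negative prompt (hypothetical API).
    """
    pos = predict(True)
    use_true_cfg = true_cfg_scale > 1.0 and step >= start_step
    if not use_true_cfg:
        # Scale of 1 (or too early in sampling): one model call per step.
        return list(pos)
    neg = predict(False)  # second model call, hence the slower generation
    return [n + true_cfg_scale * (p - n) for p, n in zip(pos, neg)]


def stub_model(positive: bool) -> List[float]:
    """Stand-in for the diffusion model, returning fixed toy values."""
    return [1.0, 2.0] if positive else [0.0, 1.0]


fast = guided_prediction(stub_model, step=3, true_cfg_scale=1.0, start_step=1)
guided = guided_prediction(stub_model, step=3, true_cfg_scale=2.0, start_step=1)
```

With these toy values, `fast` equals the positive prediction unchanged, while `guided` moves each element further from the negative prediction by the chosen scale.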

**4. Technical Architecture**
- Built on FLUX.1-dev diffusion model
- Utilizes T5 text encoder for prompt understanding
- CLIP model for image-text alignment
- Autoencoder for latent space operations
- GPU acceleration with CUDA support

### How It Works

1. **Text Prompt Input**: Describe the desired image style (e.g., "portrait, pixar")
2. **ID Image Upload**: Provide a reference portrait for identity guidance
3. **Parameter Tuning**: Adjust generation settings for optimal results
4. **Image Generation**: The model creates an image matching the prompt while preserving the identity

### Example Use Cases
- Transform portraits into different artistic styles (ice sculpture, Pixar animation)
- Create personalized avatars maintaining facial identity
- Generate creative variations of portraits with text prompts
- Produce consistent character designs across different scenarios

The web interface is built with Gradio, so the generator can be used without writing any code.
