fantaxy committed
Commit b81b62a · verified · 1 Parent(s): 9187071

Update README.md

Files changed (1)
  1. README.md +89 -0
README.md CHANGED
@@ -9,3 +9,92 @@ app_file: app.py
  pinned: false
  license: apache-2.0
  ---
+ ## PuLID for FLUX: Portrait-Guided Image Generation
+
+ This code implements **PuLID (Pure and Lightning ID customization)** for FLUX.1-dev. It generates personalized images guided by an ID (identity) image, combining the FLUX diffusion model with identity preservation.
+
+ ### Key Features
+
+ **1. Identity-Guided Generation**
+ - Upload an ID image (portrait photo) to guide the generation process
+ - Control identity strength with an adjustable ID weight (0.0-3.0)
+ - Preserve facial features while applying various artistic styles
+
+ **2. Advanced Configuration Options**
+ - **Resolution Control**: Adjustable width (256-1536px) and height (256-1536px)
+ - **Generation Steps**: 1-20 steps, trading quality against speed
+ - **Guidance Scale**: Fine-tune adherence to the prompt (1.0-10.0)
+ - **Seed Control**: Reproducible results with manual seed input
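
The ranges listed above can be checked before a generation request is made. The following is a minimal sketch, not code from this repository: the class `GenerationSettings`, the helper names, the default values, and the convention that a seed of -1 means "choose a random seed" are all assumptions made for illustration. Only the numeric ranges come from the README.

```python
import random
from dataclasses import dataclass


def _check_range(name, value, low, high):
    """Raise ValueError if value lies outside the closed interval [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the supported range {low}-{high}")


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings container; the defaults are illustrative assumptions."""
    width: int = 1024
    height: int = 1024
    num_steps: int = 20
    guidance: float = 4.0
    id_weight: float = 1.0
    seed: int = -1  # assumption: -1 requests a random seed

    def __post_init__(self):
        # The ranges below are the ones stated in the README.
        _check_range("width", self.width, 256, 1536)
        _check_range("height", self.height, 256, 1536)
        _check_range("num_steps", self.num_steps, 1, 20)
        _check_range("guidance", self.guidance, 1.0, 10.0)
        _check_range("id_weight", self.id_weight, 0.0, 3.0)


def resolve_seed(seed):
    """Return a manual seed unchanged, or draw a fresh one when seed is -1."""
    if seed == -1:
        return random.randrange(2**32)
    return seed
```

Reusing a resolved seed together with identical settings is what the "reproducible results" bullet refers to; an out-of-range value such as `GenerationSettings(width=2048)` raises `ValueError` in this sketch instead of being silently clamped.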
+
+ **3. True CFG (Classifier-Free Guidance)**
+ - Fake CFG mode (scale=1): Faster generation with basic guidance
+ - True CFG mode (scale>1): Enhanced quality with negative prompt support
+ - Configurable timestep at which CFG is activated
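
The difference between the two modes can be illustrated with the standard classifier-free guidance formula, which starts from the negative-prompt prediction and moves toward the prompt-conditioned one. This is a sketch under stated assumptions, not the repository's sampler: the function names, the tolerance used to decide whether the scale counts as 1, and the use of a step index for the activation point are hypothetical, and plain Python lists stand in for model output tensors.

```python
def use_true_cfg(true_cfg_scale, tol=1e-2):
    """Treat any scale close to 1 as 'fake CFG' (assumed tolerance)."""
    return abs(true_cfg_scale - 1.0) > tol


def combine_predictions(cond, uncond, true_cfg_scale):
    """Standard CFG: uncond + scale * (cond - uncond), element by element."""
    return [u + true_cfg_scale * (c - u) for c, u in zip(cond, uncond)]


def guided_prediction(cond, uncond, step_index, true_cfg_scale, start_step):
    """Apply true CFG only from start_step onward; otherwise return cond as is."""
    if use_true_cfg(true_cfg_scale) and step_index >= start_step:
        return combine_predictions(cond, uncond, true_cfg_scale)
    return list(cond)


# Toy values standing in for the prompt and negative-prompt predictions.
cond = [1.0, 2.0]
uncond = [0.0, 1.0]
early = guided_prediction(cond, uncond, step_index=0, true_cfg_scale=3.0, start_step=1)
late = guided_prediction(cond, uncond, step_index=1, true_cfg_scale=3.0, start_step=1)
```

With a scale of 1 the formula reduces to the conditional prediction, so a sampler would not need to evaluate the negative prompt at all, which is a plausible reason the fake CFG mode is described as faster.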
+
+ **4. Technical Architecture**
+ - Built on the FLUX.1-dev diffusion model
+ - Utilizes a T5 text encoder for prompt understanding
+ - CLIP model for image-text alignment
+ - Autoencoder for latent space operations
+ - GPU acceleration with CUDA support
+
+ ### How It Works
+
+ 1. **Text Prompt Input**: Describe the desired image style (e.g., "portrait, pixar")
+ 2. **ID Image Upload**: Provide a reference portrait for identity guidance
+ 3. **Parameter Tuning**: Adjust the generation settings
+ 4. **Image Generation**: The model creates an image that matches the prompt while preserving the identity
+
+ ### Example Use Cases
+ - Transform portraits into different artistic styles (ice sculpture, Pixar animation)
+ - Create personalized avatars that maintain facial identity
+ - Generate creative variations of portraits with text prompts
+ - Produce consistent character designs across different scenarios
+
+ The system uses Gradio to provide a web interface, making image generation accessible to users without technical expertise.
+
+ ---
+
+ ## PuLID for FLUX: Portrait-Based Image Generation System
+
+ This code implements the **PuLID (Pure and Lightning ID customization)** system for FLUX.1-dev, an image generation system that creates personalized images using an ID (identity) image as a guide. It combines the FLUX diffusion model with identity preservation.
+
+ ### Key Features
+
+ **1. Identity-Based Image Generation**
+ - Upload an ID image (portrait photo) to guide the generation process
+ - Control identity strength with an adjustable ID weight (0.0-3.0)
+ - Preserve facial features while applying various artistic styles
+
+ **2. Advanced Settings**
+ - **Resolution Control**: Adjustable width (256-1536px) and height (256-1536px)
+ - **Generation Steps**: 1-20 steps to balance quality against speed
+ - **Guidance Scale**: Fine-tune adherence to the prompt (1.0-10.0)
+ - **Seed Control**: Reproducible results with manual seed input
+
+ **3. True CFG (Classifier-Free Guidance)**
+ - Fake CFG mode (scale=1): Fast generation with basic guidance
+ - True CFG mode (scale>1): Improved quality with negative prompt support
+ - Configurable point at which CFG is activated
+
+ **4. Technical Structure**
+ - Based on the FLUX.1-dev diffusion model
+ - T5 text encoder for prompt understanding
+ - CLIP model for image-text alignment
+ - Autoencoder for latent space operations
+ - GPU acceleration with CUDA support
+
+ ### How It Works
+
+ 1. **Text Prompt Input**: Describe the desired image style (e.g., "portrait, pixar")
+ 2. **ID Image Upload**: Provide a reference portrait for identity guidance
+ 3. **Parameter Adjustment**: Adjust the generation settings
+ 4. **Image Generation**: The model generates an image that matches the prompt while preserving the identity
+
+ ### Example Uses
+ - Convert portrait photos into various artistic styles (ice sculpture, Pixar animation)
+ - Create personalized avatars that keep the facial identity
+ - Generate creative variations of a person with text prompts
+ - Produce consistent character designs across different scenarios
+
+ The system provides a web interface built with Gradio, so that users without technical expertise can also use its image generation features.