Shuttlet committed on
Commit
16edda4
·
verified ·
1 Parent(s): 9785b3c

Upload 14 files

.gitattributes CHANGED
@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ example/cloth/cloth_1.png filter=lfs diff=lfs merge=lfs -text
37
+ example/cloth/cloth_3.jpg filter=lfs diff=lfs merge=lfs -text
38
+ example/pose/pose_1.jpg filter=lfs diff=lfs merge=lfs -text
39
+ example/tryon/keep_image_2.png filter=lfs diff=lfs merge=lfs -text
40
+ static/images/overview.png filter=lfs diff=lfs merge=lfs -text
41
+ static/showcase/garment_with_pose.png filter=lfs diff=lfs merge=lfs -text
42
+ static/showcase/single_garment.png filter=lfs diff=lfs merge=lfs -text
43
+ static/showcase/tryon.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,237 @@
1
- ---
2
- license: mit
3
- ---
1
+ <div align="center">
2
+
3
+ <h1 align="center">[AAAI2025] DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder</h1>
4
+
5
+ Ente Lin&dagger;, Xujie Zhang&dagger;, Fuwei Zhao, Yuxuan Luo, Xin Dong, Long Zeng*, Xiaodan Liang*
6
+
7
+
8
+ ### [[`arXiv`](https://arxiv.org/abs/2412.17644)] [[`Paper`](https://arxiv.org/pdf/2412.17644)]
9
+ </div>
10
+
11
+
12
+ ## Abstract
13
+ <p>
14
+ Diffusion models for garment-centric human generation from text or image prompts have attracted growing attention for their broad application potential. However, existing methods face a dilemma: lightweight approaches, such as adapters, are prone to generating inconsistent textures, while finetune-based methods incur high training costs and struggle to preserve the generalization capabilities of pretrained diffusion models, limiting their performance across diverse scenarios. To address these challenges, we propose <strong>DreamFit</strong>, which incorporates a lightweight Anything-Dressing Encoder specifically tailored for garment-centric human generation.
15
+ </p>
16
+ <p>
17
+ DreamFit has three key advantages:
18
+ </p>
19
+ <ol>
20
+ <li><strong>Lightweight training</strong>: With the proposed adaptive attention and LoRA modules, DreamFit reduces model complexity to just 83.4M trainable parameters.</li>
21
+ <li><strong>Anything-Dressing</strong>: Our model generalizes surprisingly well to a wide range of (non-)garments, creative styles, and prompt instructions, consistently delivering high-quality results across diverse scenarios.</li>
22
+ <li><strong>Plug-and-play</strong>: DreamFit is engineered for smooth integration with any community control plugins for diffusion models, ensuring easy compatibility and minimizing adoption barriers.</li>
23
+ </ol>
24
+ <p>
25
+ To further enhance generation quality, DreamFit leverages pretrained large multi-modal models (LMMs) to enrich the prompt with fine-grained garment descriptions, thereby reducing the prompt gap between training and inference. We conduct comprehensive experiments on both 768 × 512 high-resolution benchmarks and in-the-wild images. DreamFit surpasses all existing methods, highlighting its state-of-the-art capabilities of garment-centric human generation.
26
+ </p>
27
+
28
+ ## Overview
29
+ <p align="center">
30
+ <img src="./static/images/overview.png" width=100% height=100%
31
+ class="center">
32
+ </p>
33
+
34
+ <p>
35
+ Our method constructs an <strong>Anything-Dressing Encoder</strong> utilizing <strong>LoRA</strong> layers. The reference image features are extracted by the Anything-Dressing Encoder and then passed into the denoising <strong>UNet</strong> via adaptive attention.
36
+ </p>
37
+ <p>
38
+ Furthermore, we incorporate <strong>Large Multimodal Models (LMMs)</strong> into the inference process to reduce the text-prompt gap between training and inference.
39
+ </p>
40
+
41
+
42
+ ## Installation Guide
43
+ 1. Clone our repo:
44
+ ```bash
45
+ git clone https://github.com/bytedance/DreamFit.git
46
+ ```
47
+ 2. Create new virtual environment:
48
+ ```bash
49
+ conda create -n dreamfit python==3.10
50
+ conda activate dreamfit
51
+ ```
52
+ 3. Install our dependencies by running the following command:
53
+ ```bash
54
+ pip install -r requirements.txt
55
+ pip install flash-attn --no-build-isolation --use-pep517
56
+ ```
57
+
58
+ ## Models
59
+ 1. Download the pretrained models [here](https://huggingface.co/bytedance-research/Dreamfit) and place the checkpoints in the `pretrained_models` folder.
60
+ 2. To run inference with the Stable Diffusion 1.5 version, download [stable-diffusion-v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) and [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse) to `pretrained_models`. To generate images in different styles, you can also download a stylized model, such as [RealisticVision](https://huggingface.co/SG161222/Realistic_Vision_V6.0_B1_noVAE), to `pretrained_models`.
61
+ 3. To run inference with the Flux version, download [flux-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) to the `pretrained_models` folder.
62
+ 4. To run inference with pose control, download the [Annotators](https://huggingface.co/lllyasviel/Annotators) to the `pretrained_models` folder.
63
+
64
+ The folder structure should look like this:
65
+
66
+ ```
67
+ ├── pretrained_models/
68
+ | ├── flux_i2i_with_pose.bin
69
+ │ ├── flux_i2i.bin
70
+ │ ├── flux_tryon.bin
71
+ │ ├── sd15_i2i.ckpt
72
+ | ├── stable-diffusion-v1-5/
73
+ | | ├── ...
74
+ | ├── sd-vae-ft-mse/
75
+ | | ├── diffusion_pytorch_model.bin
76
+ | | ├── ...
77
+ | ├── Realistic_Vision_V6.0_B1_noVAE(or other stylized model)/
78
+ | | ├── unet/
79
+ | | | ├── diffusion_pytorch_model.bin
80
+ | | | ├── ...
81
+ | | ├── ...
82
+ | ├── Annotators/
83
+ | | ├── body_pose_model.pth
84
+ | | ├── facenet.pth
85
+ | | ├── hand_pose_model.pth
86
+ | ├── FLUX.1-dev/
87
+ | | ├── flux1-dev.safetensors
88
+ | | ├── ae.safetensors
89
+ | | ├── tokenizer
90
+ | | ├── tokenizer_2
91
+ | | ├── text_encoder
92
+ | | ├── text_encoder_2
93
+ | | ├── ...
94
+ ```
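Missing downloads are easier to catch before launching a long inference run. The snippet below is a minimal sketch (the file list simply mirrors the core checkpoints in the tree above) that reports any missing files:

```python
from pathlib import Path

# Core checkpoints expected under pretrained_models/ (mirrors the tree above).
EXPECTED = [
    "flux_i2i_with_pose.bin",
    "flux_i2i.bin",
    "flux_tryon.bin",
    "sd15_i2i.ckpt",
]

def missing_checkpoints(root="pretrained_models"):
    """Return the expected checkpoint files not present under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All core checkpoints found.")
```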
95
+
96
+ ## Inference
97
+
98
+ ### Garment-Centric Generation
99
+
100
+ ``` bash
101
+ # inference with FLUX version
102
+ bash run_inference_dreamfit_flux_i2i.sh \
103
+ --cloth_path example/cloth/cloth_1.png \
104
+ --image_text "A woman wearing a white Bape T-shirt with a colorful ape graphic and bold text." \
105
+ --save_dir "." \
106
+ --seed 164143088151
107
+
108
+ # inference with StableDiffusion1.5 version
109
+ bash run_inference_dreamfit_sd15_i2i.sh \
110
+ --cloth_path example/cloth/cloth_3.jpg \
111
+ --image_text "A woman with curly hair wears a pink t-shirt with a logo and white stripes on the sleeves, paired with white trousers, against a plain white background." \
112
+ --ref_scale 1.0 \
113
+ --base_model pretrained_models/Realistic_Vision_V6.0_B1_noVAE/unet/diffusion_pytorch_model.bin \
114
+ --base_model_load_method diffusers \
115
+ --save_dir "." \
116
+ --seed 28
117
+ ```
118
+
119
+ Tips:
120
+ 1. If you have multiple pieces of clothing, you can stitch them into a single reference image, as shown in the second row.
121
+ 2. Use `--help` to check the meaning of each argument.
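The stitching in tip 1 can be scripted. The sketch below (file names are placeholders, and Pillow is assumed to be available) concatenates several garment images side by side so the result can be passed to `--cloth_path`:

```python
from PIL import Image

def splice_garments(paths, out_path="spliced_cloth.png"):
    """Stitch several garment images horizontally into one reference image."""
    images = [Image.open(p).convert("RGB") for p in paths]
    height = max(im.height for im in images)
    # Resize each image to a common height, preserving aspect ratio.
    resized = [im.resize((int(im.width * height / im.height), height))
               for im in images]
    canvas = Image.new("RGB", (sum(im.width for im in resized), height), "white")
    x = 0
    for im in resized:
        canvas.paste(im, (x, 0))
        x += im.width
    canvas.save(out_path)
    return canvas

# Example (paths are placeholders):
# splice_garments(["top.png", "skirt.png"], "spliced_cloth.png")
```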
122
+
123
+ <table style="width:100%; text-align:center;">
124
+ <tr>
125
+ <th style="text-align:center;">Image Text</th>
126
+ <th style="text-align:center;">Cloth</th>
127
+ <th style="text-align:center;">Output</th>
128
+ </tr>
129
+ <tr>
130
+ <td style="text-align:center;">A woman wearing a white Bape T-shirt with a colorful ape graphic and bold text.</td>
131
+ <td style="text-align:center;">
132
+ <img src="./example/cloth/cloth_1.png" alt="alt text" style="max-height:800px;">
133
+ </td>
134
+ <td style="text-align:center;">
135
+ <img src="./static/showcase/single_garment.png" alt="alt text" style="max-height:400px;">
136
+ </td>
137
+ </tr>
138
+ <tr>
139
+ <td style="text-align:center;">A young woman with a casual yet stylish look, wearing a blue top, black skirt, and comfortable cream slip-on shoes.</td>
140
+ <td style="text-align:center;">
141
+ <img src="./example/cloth/cloth_2.jpg" alt="alt text" style="max-height:600px;">
142
+ </td>
143
+ <td style="text-align:center;">
144
+ <img src="./static/showcase/multi_garments.jpg" alt="alt text" style="max-height:1100px;">
145
+ </td>
146
+ </tr>
147
+ </table>
148
+
149
+ ### Garment-Centric Generation with Pose Control
150
+
151
+ ``` bash
152
+ bash run_inference_dreamfit_flux_i2i_with_pose.sh \
153
+ --cloth_path example/cloth/cloth_1.png \
154
+ --pose_path example/pose/pose_1.jpg \
155
+ --image_text "A woman wearing a white Bape T-shirt with a colorful ape graphic and bold text." \
156
+ --save_dir "." \
157
+ --seed 16414308815
158
+ ```
159
+
160
+ <table style="width:100%; text-align:center;">
161
+ <tr>
162
+ <th style="text-align:center;">Image Text</th>
163
+ <th style="text-align:center;">Cloth</th>
164
+ <th style="text-align:center;">Pose Image</th>
165
+ <th style="text-align:center;">Output</th>
166
+ </tr>
167
+ <tr>
168
+ <td style="text-align:center;">A woman wearing a white Bape T-shirt with a colorful ape graphic and bold text.</td>
169
+ <td style="text-align:center;">
170
+ <img src="./example/cloth/cloth_1.png" alt="alt text" style="max-height:600px;">
171
+ </td>
172
+ <td style="text-align:center;">
173
+ <img src="./example/pose/pose_1.jpg" alt="alt text" style="max-height:600px;">
174
+ </td>
175
+ <td style="text-align:center;">
176
+ <img src="./static/showcase/garment_with_pose.png" alt="alt text" style="max-height:600px;">
177
+ </td>
178
+ </tr>
179
+ </table>
180
+
181
+ ### Tryon
182
+
183
+
184
+ ``` bash
185
+ bash run_inference_dreamfit_flux_tryon.sh \
186
+ --cloth_path example/cloth/cloth_1.png \
187
+ --keep_image_path example/tryon/keep_image_4.png \
188
+ --image_text "A woman wearing a white Bape T-shirt with a colorful ape graphic and bold text and blue jeans." \
189
+ --save_dir "." \
190
+ --seed 16414308815
191
+ ```
192
+
193
+ Tips:
194
+ 1. The keep image is obtained by drawing the OpenPose skeleton over the garment-agnostic region.
195
+ 2. The generation code for keep image cannot be open-sourced for the time being. As an alternative, we have provided several keep images for testing.
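Although the official keep-image generator is not released, the idea in tip 1 can be sketched roughly with Pillow. This is an illustration only, not the authors' implementation; the person image, pose rendering, and garment mask are hypothetical inputs:

```python
from PIL import Image

def make_keep_image(person, pose, garment_mask):
    """Rough sketch: keep the garment-agnostic region of the person image,
    and show the pose rendering where the garment used to be.
    garment_mask is white (255) over the garment region."""
    person = person.convert("RGB")
    pose = pose.convert("RGB").resize(person.size)
    mask = garment_mask.convert("L").resize(person.size)
    # Where the mask is white, take the pose image; elsewhere keep the person.
    return Image.composite(pose, person, mask)
```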
196
+
197
+ <table style="width:100%; text-align:center;">
198
+ <tr>
199
+ <th style="text-align:center;">Image Text</th>
200
+ <th style="text-align:center;">Cloth</th>
201
+ <th style="text-align:center;">Keep Image</th>
202
+ <th style="text-align:center;">Output</th>
203
+ </tr>
204
+ <tr>
205
+ <td style="text-align:center;">A woman wearing a white Bape T-shirt with a colorful ape graphic and bold text and blue jeans.</td>
206
+ <td style="text-align:center;">
207
+ <img src="./example/cloth/cloth_1.png" alt="alt text" style="max-height:600px;">
208
+ </td>
209
+ <td style="text-align:center;">
210
+ <img src="./example/tryon/keep_image_1.png" alt="alt text" style="max-height:600px;">
211
+ </td>
212
+ <td style="text-align:center;">
213
+ <img src="./static/showcase/tryon.png" alt="alt text" style="max-height:600px;">
214
+ </td>
215
+ </tr>
216
+ </table>
217
+
218
+ ## Disclaimer
219
+ Most images used in this repository are sourced from the Internet. These images are solely intended to demonstrate the capabilities of our research. If you have any concerns, please contact us, and we will promptly remove any inappropriate content.
220
+
221
+ This project aims to make a positive impact on the field of AI-driven image generation. Users are free to create images using this tool, but they must comply with local laws and use it responsibly. The developers do not assume any responsibility for potential misuse by users.
222
+
223
+ ## Citation
224
+ ```
225
+ @article{lin2024dreamfit,
226
+ title={DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder},
227
+ author={Lin, Ente and Zhang, Xujie and Zhao, Fuwei and Luo, Yuxuan and Dong, Xin and Zeng, Long and Liang, Xiaodan},
228
+ journal={arXiv preprint arXiv:2412.17644},
229
+ year={2024}
230
+ }
231
+ ```
232
+
233
+ ## Acknowledgements
234
+ Thanks to the [x-flux](https://github.com/XLabs-AI/x-flux) and [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone/pulse) repositories for their open research and exploration.
235
+
236
+ ## Contact
237
+ If you have any comments or questions, please open a new issue or contact [Ente Lin]([email protected]) and [Xin Dong]([email protected]).
example/cloth/cloth_1.png ADDED

Git LFS Details

  • SHA256: afa0e358a11f0e77321239729435e45d406dbb1db184fba8e26f4d776184c498
  • Pointer size: 131 Bytes
  • Size of remote file: 222 kB
example/cloth/cloth_2.jpg ADDED
example/cloth/cloth_3.jpg ADDED

Git LFS Details

  • SHA256: 5f9250903aa2221e23b63ff3e7bb4dadcb2e907b4845802b1f755ad2e8ffd234
  • Pointer size: 131 Bytes
  • Size of remote file: 104 kB
example/pose/pose_1.jpg ADDED

Git LFS Details

  • SHA256: 8f91bacd6301095aa1d21e809e0bab937d2e44c6e8354e89bf69537ed0f3ab55
  • Pointer size: 131 Bytes
  • Size of remote file: 158 kB
example/tryon/keep_image_1.png ADDED
example/tryon/keep_image_2.png ADDED

Git LFS Details

  • SHA256: 0c1306703773a8496ef2551ea188151b3c2c39e03d302c460c8216aceaa3c523
  • Pointer size: 131 Bytes
  • Size of remote file: 159 kB
example/tryon/keep_image_3.png ADDED
example/tryon/keep_image_4.png ADDED
static/images/overview.png ADDED

Git LFS Details

  • SHA256: 7903b05191462fdc0520fa42a0f21c9d8bdae5ad690a1e0949ffc7a563f2787b
  • Pointer size: 131 Bytes
  • Size of remote file: 628 kB
static/showcase/garment_with_pose.png ADDED

Git LFS Details

  • SHA256: 391ab3b179150de2ab0b036086e5012c2968b84e30bba7bf5fcc249523539393
  • Pointer size: 131 Bytes
  • Size of remote file: 541 kB
static/showcase/multi_garments.jpg ADDED
static/showcase/single_garment.png ADDED

Git LFS Details

  • SHA256: 423d7f11bb3f98547a146632f7e9bab91edbf82d9fdc7777ed40ab8d39f72bae
  • Pointer size: 131 Bytes
  • Size of remote file: 490 kB
static/showcase/tryon.png ADDED

Git LFS Details

  • SHA256: 6755f003533c3c8b27344eda6f221d71b478c6a7639dbbbbb37b31a009fc6e34
  • Pointer size: 131 Bytes
  • Size of remote file: 996 kB