InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Abstract
Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
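The abstract says InfuseNet injects identity features into the DiT base model via residual connections. The toy sketch below illustrates that general pattern only; it is not the authors' implementation. Every name in it (`base_block`, `identity_residual`, `run_with_injection`, `injection_scale`) is a hypothetical stand-in, and plain Python lists replace real tensors and transformer blocks. It assumes a side branch computes a correction from an identity embedding, and that the correction is added to each base block's output.

```python
from typing import List, Sequence

Vector = List[float]


def base_block(h: Sequence[float], weight: float) -> Vector:
    # Hypothetical stand-in for one base-model block: an elementwise affine map.
    return [weight * x + 1.0 for x in h]


def identity_residual(h: Sequence[float], id_embedding: Sequence[float], gain: float) -> Vector:
    # Hypothetical stand-in for a side-branch block: it produces a correction
    # that pulls the hidden state toward the identity embedding.
    return [gain * (e - x) for x, e in zip(h, id_embedding)]


def run_with_injection(
    h: Sequence[float],
    id_embedding: Sequence[float],
    block_weights: Sequence[float],
    gains: Sequence[float],
    injection_scale: float,
) -> Vector:
    # After each base block, add the scaled identity residual to its output.
    state = list(h)
    for weight, gain in zip(block_weights, gains):
        residual = identity_residual(state, id_embedding, gain)
        base_out = base_block(state, weight)
        state = [b + injection_scale * r for b, r in zip(base_out, residual)]
    return state


def run_base_only(h: Sequence[float], block_weights: Sequence[float]) -> Vector:
    # The same base blocks with no side branch at all.
    state = list(h)
    for weight in block_weights:
        state = base_block(state, weight)
    return state


if __name__ == "__main__":
    hidden = [1.0, 2.0]
    identity = [0.5, -0.5]
    weights = [2.0, 0.5]
    gains = [0.1, 0.2]
    print(run_base_only(hidden, weights))
    print(run_with_injection(hidden, identity, weights, gains, injection_scale=1.0))
```

In this sketch, setting `injection_scale` to zero reproduces the base model's output exactly. That is one way to read the abstract's claim that residual injection can raise identity similarity while leaving the base model's generation capabilities intact.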
Community
Project Page: https://bytedance.github.io/InfiniteYou
Code: https://github.com/bytedance/InfiniteYou
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion (2025)
- Personalize Anything for Free with Diffusion Transformer (2025)
- DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability (2025)
- DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation (2025)
- CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers (2025)
- Conceptrol: Concept Control of Zero-shot Personalized Image Generation (2025)
- Concat-ID: Towards Universal Identity-Preserving Video Synthesis (2025)