WithAnyone: Towards Controllable and ID Consistent Image Generation
Abstract
A diffusion-based model addresses copy-paste artifacts in text-to-image generation by using a large-scale paired dataset and a contrastive identity loss to balance identity fidelity and variation.
Identity-consistent generation has become an important focus in text-to-image research, with recent models achieving notable success in producing images aligned with a reference identity. Yet, the scarcity of large-scale paired datasets containing multiple images of the same individual forces most approaches to adopt reconstruction-based training. This reliance often leads to a failure mode we term copy-paste, where the model directly replicates the reference face rather than preserving identity across natural variations in pose, expression, or lighting. Such over-similarity undermines controllability and limits the expressive power of generation. To address these limitations, we (1) construct a large-scale paired dataset MultiID-2M, tailored for multi-person scenarios, providing diverse references for each identity; (2) introduce a benchmark that quantifies both copy-paste artifacts and the trade-off between identity fidelity and variation; and (3) propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity. These contributions culminate in WithAnyone, a diffusion-based model that effectively mitigates copy-paste while preserving high identity similarity. Extensive qualitative and quantitative experiments demonstrate that WithAnyone significantly reduces copy-paste artifacts, improves controllability over pose and expression, and maintains strong perceptual quality. User studies further validate that our method achieves high identity fidelity while enabling expressive controllable generation.
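To make the idea of the contrastive identity loss concrete, here is a minimal PyTorch-style sketch of one plausible instantiation: the generated face embedding is pulled toward embeddings of other photos of the same identity (positives) and pushed away from other identities (negatives), so the model is rewarded for matching the identity rather than copying the exact reference pixels. The function name, InfoNCE-style form, and 512-d ArcFace-like embeddings are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_identity_loss(gen_emb, pos_embs, neg_embs, temperature=0.07):
    """Hypothetical contrastive identity loss over paired data.

    gen_emb:  (D,)   embedding of the face in the generated image
    pos_embs: (P, D) embeddings of *other* photos of the same identity
    neg_embs: (N, D) embeddings of different identities (e.g., rest of the batch)
    """
    gen = F.normalize(gen_emb, dim=-1)
    pos = F.normalize(pos_embs, dim=-1)
    neg = F.normalize(neg_embs, dim=-1)
    # Cosine similarities to positives and negatives, scaled by temperature
    logits = torch.cat([pos @ gen, neg @ gen]) / temperature  # (P + N,)
    log_prob = logits - torch.logsumexp(logits, dim=0)
    # Maximize probability mass on the positive set (InfoNCE over positives)
    return -log_prob[: pos.shape[0]].mean()

# Toy usage with random embeddings
gen = torch.randn(512)
positives = torch.randn(4, 512)   # other photos of the same person
negatives = torch.randn(16, 512)  # other identities
loss = contrastive_identity_loss(gen, positives, negatives)
```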
Community
- Models: https://huggingface.co/WithAnyone/WithAnyone
- Dataset (MultiID-2M): https://huggingface.co/datasets/WithAnyone/MultiID-2M
- Benchmark (MultiID-Bench): https://huggingface.co/datasets/WithAnyone/MultiID-Bench
- Demo: https://huggingface.co/spaces/WithAnyone/WithAnyone_demo
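For convenience, a minimal sketch for fetching the released assets with the standard `huggingface_hub` client (repo IDs taken from the links above); the inference pipeline itself is not shown here and is best taken from the official demo Space.

```python
from huggingface_hub import snapshot_download

# Download the model checkpoint, paired dataset, and benchmark locally
model_dir = snapshot_download(repo_id="WithAnyone/WithAnyone")
data_dir = snapshot_download(repo_id="WithAnyone/MultiID-2M", repo_type="dataset")
bench_dir = snapshot_download(repo_id="WithAnyone/MultiID-Bench", repo_type="dataset")
```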
Related papers recommended by the Semantic Scholar API:
- Lynx: Towards High-Fidelity Personalized Video Generation (2025)
- ComposeMe: Attribute-Specific Image Prompts for Controllable Human Image Generation (2025)
- UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward (2025)
- ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion (2025)
- MultiCrafter: High-Fidelity Multi-Subject Generation via Spatially Disentangled Attention and Identity-Aware Reinforcement Learning (2025)
- SpotDiff: Spotting and Disentangling Interference in Feature Space for Subject-Preserving Image Generation (2025)
- Identity-Preserving Image-to-Video Generation via Reward-Guided Optimization (2025)