MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement Paper • 2509.01977 • Published 8 days ago • 11
AnyI2V: Animating Any Conditional Image with Motion Control Paper • 2507.02857 • Published Jul 3 • 12
OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion Paper • 2507.06165 • Published Jul 8 • 56
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Paper • 2506.21416 • Published Jun 26 • 28
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16 • 44
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11 • 30
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework Paper • 2506.10741 • Published Jun 12 • 28
MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks Paper • 2506.05982 • Published Jun 6 • 2
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11 • 54
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5 • 79
Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack Paper • 2506.01011 • Published Jun 1 • 9
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers Paper • 2505.21541 • Published May 24 • 7