Show-o2: Improved Native Unified Multimodal Models Paper • 2506.15564 • Published 8 days ago • 27
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published 10 days ago • 41
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published 15 days ago • 26
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework Paper • 2506.10741 • Published 14 days ago • 27
MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks Paper • 2506.05982 • Published 21 days ago • 2
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published 15 days ago • 51
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published 21 days ago • 67
Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack Paper • 2506.01011 • Published 25 days ago • 9
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers Paper • 2505.21541 • Published May 24 • 7
RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers Paper • 2506.02528 • Published 24 days ago • 15
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published 28 days ago • 13