From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery Paper • 2508.14111 • Published 6 days ago • 25
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization Paper • 2508.14811 • Published 4 days ago • 32
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer Paper • 2508.09131 • Published 12 days ago • 14
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts Paper • 2507.20939 • Published 27 days ago • 56
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 71
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14 • 49
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published Jul 11 • 61
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Paper • 2506.07977 • Published Jun 9 • 41
FlexPainter: Flexible and Multi-View Consistent Texture Generation Paper • 2506.02620 • Published Jun 3 • 14
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Paper • 2506.03065 • Published Jun 3 • 27
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning Paper • 2505.23504 • Published May 29 • 7
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization Paper • 2505.24862 • Published May 30 • 31
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Paper • 2505.16707 • Published May 22 • 46
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets Paper • 2505.07747 • Published May 12 • 61
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 93
StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians Paper • 2504.15281 • Published Apr 21 • 24