ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts • Paper • 2507.20939 • Published 5 days ago • 56
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning • Paper • 2507.05255 • Published 26 days ago • 68
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation • Paper • 2507.09862 • Published 19 days ago • 48
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation • Paper • 2507.08441 • Published 22 days ago • 59
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation • Paper • 2506.07977 • Published Jun 9 • 41
FlexPainter: Flexible and Multi-View Consistent Texture Generation • Paper • 2506.02620 • Published Jun 3 • 14
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers • Paper • 2506.03065 • Published Jun 3 • 27
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning • Paper • 2505.23504 • Published May 29 • 7
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization • Paper • 2505.24862 • Published May 30 • 31
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models • Paper • 2505.16707 • Published May 22 • 45
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets • Paper • 2505.07747 • Published May 12 • 61
Step1X-Edit: A Practical Framework for General Image Editing • Paper • 2504.17761 • Published Apr 24 • 93
StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians • Paper • 2504.15281 • Published Apr 21 • 24
OmniSVG: A Unified Scalable Vector Graphics Generation Model • Paper • 2504.06263 • Published Apr 8 • 178
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D • Paper • 2411.02336 • Published Nov 4, 2024 • 25