ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts Paper • 2507.20939 • Published 5 days ago • 56
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published 26 days ago • 68
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published 19 days ago • 48
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published 22 days ago • 59
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Paper • 2506.07977 • Published Jun 9 • 41
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Paper • 2506.07977 • Published Jun 9 • 41
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation Paper • 2506.07977 • Published Jun 9 • 41 • 2
FlexPainter: Flexible and Multi-View Consistent Texture Generation Paper • 2506.02620 • Published Jun 3 • 14
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Paper • 2506.03065 • Published Jun 3 • 27
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Paper • 2506.03065 • Published Jun 3 • 27 • 2
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Paper • 2506.03065 • Published Jun 3 • 27
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning Paper • 2505.23504 • Published May 29 • 7
ViStoryBench: Comprehensive Benchmark Suite for Story Visualization Paper • 2505.24862 • Published May 30 • 31