Reconstructing 4D Spatial Intelligence: A Survey • Paper • 2507.21045 • Published 5 days ago • 33
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding • Paper • 2507.15028 • Published 13 days ago • 20
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning • Paper • 2507.05920 • Published 25 days ago • 11
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model • Paper • 2507.01953 • Published about 1 month ago • 19
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models • Paper • 2506.21356 • Published Jun 26 • 22
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning • Paper • 2506.13654 • Published Jun 16 • 43
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior • Paper • 2506.08012 • Published Jun 9 • 7
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers • Paper • 2506.07986 • Published Jun 9 • 19
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation • Paper • 2506.03123 • Published Jun 3 • 14
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM • Paper • 2505.15816 • Published May 21 • 3
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency • Paper • 2503.20785 • Published Mar 26 • 22
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness • Paper • 2503.21755 • Published Mar 27 • 34
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models • Paper • 2503.18886 • Published Mar 24 • 22
EgoLife • Collection • CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/ • 10 items • Updated Mar 7 • 19