VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 47
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Paper • 2501.03841 • Published Jan 7 • 55
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7 • 27
Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware Paper • 2505.09601 • Published May 14 • 5
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published Jul 31 • 23
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11 • 43
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Paper • 2508.05635 • Published Aug 7 • 73
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era Paper • 2509.12989 • Published 10 days ago • 27
FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies Paper • 2509.04996 • Published 21 days ago • 13
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts Paper • 2509.10813 • Published 13 days ago • 30