Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control • Paper • 2506.01943 • Published Jun 2 • 24
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks • Paper • 2506.00411 • Published May 31 • 30
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics • Paper • 2506.01844 • Published Jun 2 • 113