RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Paper • 2506.04308 • Published Jun 4 • 43
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation Paper • 2403.13352 • Published Mar 20, 2024 • 1
Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models Paper • 2405.20775 • Published May 26, 2024
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper • 2412.04455 • Published Dec 5, 2024 • 39
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published Oct 23, 2024 • 20
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control Paper • 2403.12037 • Published Mar 18, 2024 • 1
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception Paper • 2312.07472 • Published Dec 12, 2023 • 2