VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge Paper • 2504.10342 • Published 12 days ago • 11
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Paper • 2410.14669 • Published Oct 18, 2024 • 40
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru Paper • 2503.07587 • Published Mar 10 • 11
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published about 1 month ago • 61