Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Paper • 2411.06559 • Published Nov 10, 2024 • 15
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models Paper • 2502.06755 • Published Feb 10, 2025 • 7
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents Paper • 2502.11357 • Published Feb 17, 2025 • 10
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20, 2025 • 44
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802 • Published Feb 20, 2025 • 13
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Paper • 2411.16537 • Published Nov 25, 2024
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31, 2025 • 300
An Illusion of Progress? Assessing the Current State of Web Agents Paper • 2504.01382 • Published Apr 2, 2025 • 4
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis Paper • 2501.09333 • Published Jan 16, 2025 • 1
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills Paper • 2504.07079 • Published Apr 9, 2025 • 12
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools Paper • 2504.20168 • Published Apr 28, 2025 • 1
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments Paper • 2505.21936 • Published May 28, 2025 • 1
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning Paper • 2505.23883 • Published May 29, 2025 • 1
Is Extending Modality The Right Path Towards Omni-Modality? Paper • 2506.01872 • Published Jun 2, 2025 • 23
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge Paper • 2506.21506 • Published Jun 26, 2025 • 51
OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation Paper • 2506.05606 • Published Jun 5, 2025
Watch and Learn: Learning to Use Computers from Online Videos Paper • 2510.04673 • Published 7 days ago • 9