VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? Paper • 2404.05955 • Published Apr 9, 2024
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published Jul 15, 2024
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories Paper • 2410.07706 • Published Oct 10, 2024
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published 12 days ago
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents Paper • 2403.02502 • Published Mar 4, 2024