AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published Apr 11 • 27