AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Paper • 2504.08942 • Published • 18
McGill-NLP/agent-reward-bench
Viewer • Updated • 1.41k • 7 • 2
Agent Reward Bench Demo 💻 • 1
Visualize agent interactions with WebArena tasks
Agent Reward Bench Leaderboard 🥇
Leaderboard for AgentRewardBench

McGill NLP Group
university
AI & ML interests: computational linguistics, natural language processing
Collections (11)
spaces (5)
WebLINX Explorer 😻 • Running • 15 • Browse and visualize web demonstration recordings
Agent Reward Bench Leaderboard 🥇 • Running • Leaderboard for AgentRewardBench
Agent Reward Bench Demo 💻 • Running • 1 • Visualize agent interactions with WebArena tasks
Safearena Leaderboard 🏃 • Running • 2 • SafeArena Leaderboard
AURORA 🌖 • Runtime error • 5
models (58)

McGill-NLP/nano-aha-moment-3b • Text Generation • Updated • 97 • 2
McGill-NLP/AURORA • Image-to-Image • Updated • 123 • 4
McGill-NLP/pix2act-large-weblinx • Text Generation • Updated • 20 • 1
McGill-NLP/LLM2Vec-Meta-Llama-31-8B-Instruct-mntp • Sentence Similarity • Updated • 228 • 2
McGill-NLP/LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-supervised • Sentence Similarity • Updated • 309 • 4
McGill-NLP/LLM2Vec-Meta-Llama-31-8B-Instruct-mntp-unsup-simcse • Sentence Similarity • Updated • 172 • 2
McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp • Sentence Similarity • Updated • 2.33k • 10
McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp • Sentence Similarity • Updated • 91
McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp • Sentence Similarity • Updated • 2.68k • 4
McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp • Sentence Similarity • Updated • 8.86k • 16
datasets (23)
McGill-NLP/agent-reward-bench • Viewer • Updated • 1.41k • 7 • 2
McGill-NLP/MultiDigit-20 • Viewer • Updated • 16k • 79
McGill-NLP/AdvBench-IR • Viewer • Updated • 520 • 126 • 3
McGill-NLP/safearena • Updated • 45 • 2
McGill-NLP/WebLINX-full • Updated • 103k • 6
McGill-NLP/CHASE-Code • Viewer • Updated • 500 • 99
McGill-NLP/CHASE-Math • Viewer • Updated • 500 • 95
McGill-NLP/CHASE-QA • Viewer • Updated • 671 • 137
McGill-NLP/weblinx-browsergym • Updated • 3.17k • 3
McGill-NLP/WebLINX • Viewer • Updated • 79.8k • 596 • 61