Arkil Patel

arkilpatel

https://arkilpatel.github.io/

AI & ML interests

NLP

arkilpatel's activity

authored a paper 1 day ago

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published 5 days ago • 19

upvoted a paper 1 day ago

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published 5 days ago • 19

authored a paper 5 days ago

SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6 • 19

upvoted a paper 5 days ago

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published 14 days ago • 72

upvoted 2 papers about 1 month ago

Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Paper • 2503.08644 • Published Mar 11 • 16

SafeArena: Evaluating the Safety of Autonomous Web Agents

Paper • 2503.04957 • Published Mar 6 • 19

upvoted a collection about 1 month ago

CHASE

Generate challenging synthetic data to evaluate LLMs • 5 items • Updated Feb 21 • 4

upvoted a paper about 1 month ago

Societal Alignment Frameworks Can Improve LLM Alignment

Paper • 2503.00069 • Published Feb 27 • 16

liked a dataset about 1 month ago

McGill-NLP/AdvBench-IR

Viewer • Updated Mar 12 • 520 • 126 • 3

authored 2 papers about 2 months ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 72

How to Get Your LLM to Generate Challenging Problems for Evaluation

Paper • 2502.14678 • Published Feb 20 • 17

updated 3 datasets about 2 months ago

McGill-NLP/CHASE-Code

Viewer • Updated Feb 21 • 500 • 99

McGill-NLP/CHASE-Math

Viewer • Updated Feb 21 • 500 • 95

McGill-NLP/CHASE-QA

Viewer • Updated Feb 21 • 671 • 137

updated a collection about 2 months ago

CHASE

Generate challenging synthetic data to evaluate LLMs • 5 items • Updated Feb 21 • 4

upvoted a paper about 2 months ago

How to Get Your LLM to Generate Challenging Problems for Evaluation

Paper • 2502.14678 • Published Feb 20 • 17

commented on a paper about 2 months ago

How to Get Your LLM to Generate Challenging Problems for Evaluation

Paper • 2502.14678 • Published Feb 20 • 17

updated a collection about 2 months ago

CHASE

Generate challenging synthetic data to evaluate LLMs • 5 items • Updated Feb 21 • 4

published a dataset about 2 months ago

McGill-NLP/CHASE-Math

Viewer • Updated Feb 21 • 500 • 95

updated a collection about 2 months ago

CHASE

Generate challenging synthetic data to evaluate LLMs • 5 items • Updated Feb 21 • 4