Boyuan Zheng

boyuanzheng010

https://boyuanzheng010.github.io/

AI & ML interests

Language Agents, Multilinguality

Recent Activity

upvoted a paper 5 days ago

LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

Paper • 2509.00676 • Published 8 days ago • 75

updated a dataset about 1 month ago

osunlp/WebGuard

Viewer • Updated Jul 28 • 6k • 31

published a dataset about 1 month ago

osunlp/WebGuard

Viewer • Updated Jul 28 • 6k • 31

updated a dataset about 2 months ago

boyuanzheng010/webguard_test

Viewer • Updated Jul 24 • 6.49k • 11

published a dataset about 2 months ago

boyuanzheng010/webguard_test

Viewer • Updated Jul 24 • 6.49k • 11

upvoted a paper 2 months ago

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

Paper • 2506.21506 • Published Jun 26 • 51

updated a dataset 4 months ago

boyuanzheng010/webguard

Viewer • Updated May 16 • 6.49k • 4 • 1

published a dataset 4 months ago

boyuanzheng010/webguard

Viewer • Updated May 16 • 6.49k • 4 • 1

upvoted a paper 5 months ago

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

Paper • 2504.08942 • Published Apr 11 • 27

liked a Space 5 months ago

Agent Reward Bench Demo

💻

Visualize agent interactions with WebArena tasks

upvoted a paper 5 months ago

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

Paper • 2504.07079 • Published Apr 9 • 11

commented on a paper 5 months ago

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

Paper • 2504.07079 • Published Apr 9 • 11

published a model 5 months ago

boyuanzheng010/DeepSeek-R1-Distill-Qwen-1.5B-GRPO

Updated Apr 6

updated a model 5 months ago

boyuanzheng010/Qwen2.5-1.5B-Open-R1-Distill

Text Generation • 2B • Updated Apr 2 • 6

published a model 5 months ago

boyuanzheng010/Qwen2.5-1.5B-Open-R1-Distill

Text Generation • 2B • Updated Apr 2 • 6

liked a Space 6 months ago

Online-Mind2Web Leaderboard

🌐

Display and analyze evaluation results for agents

upvoted an article 6 months ago

Open R1: Update #3

Article • and 9 others • Mar 11 • 295

liked a Space 6 months ago

Safearena Leaderboard

🏃

SafeArena Leaderboard

authored a paper 8 months ago

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Paper • 2411.06559 • Published Nov 10, 2024 • 15

liked a dataset 8 months ago

xlangai/aguvis-stage2

Preview • Updated Jul 30 • 362 • 24
