Process Reward Models that Think -- https://arxiv.org/abs/2504.16828
AI & ML interests
Factuality, reasoning, alignment, LLM applications
Spaces (5)
FactRBench • Running • View and analyze long-form factuality leaderboard
ExpertLongBench • Running • 2 • Leaderboard for ExpertLongBench
ManyICLBench • Sleeping • 1 • Leaderboard for ManyICLBench
MLRC-BENCH • Running • Display model performance rankings
Factbench • Sleeping • 3 • View and compare language model factuality scores
Datasets (12)
launch/ExpertLongBench • Preview • Updated • 361 • 10
launch/thinkprm-1K-verification-cots • Viewer • Updated • 1k • 64 • 6
launch/ManyICLBench • Viewer • Updated • 66 • 640 • 1
launch/CMV • Viewer • Updated • 133 • 35
launch/FactRBench • Viewer • Updated • 1.06k • 74 • 1
launch/FactBench • Viewer • Updated • 1k • 91 • 3
launch/CLASH • Viewer • Updated • 345 • 66 • 2
launch/gov_report • Viewer • Updated • 58.4k • 956 • 7
launch/gov_report_qs • Viewer • Updated • 7.87k • 648 • 4
launch/open_question_type • Viewer • Updated • 4.96k • 759 • 6