ceval

community

https://cevalbenchmark.com


AI & ML interests

We focus on evaluating foundation models in Chinese.

Recent Activity

yuzhen17 authored a paper about 1 month ago

SWE-RM: Execution-free Feedback For Software Engineering Agents

yuzhen17 authored a paper 3 months ago

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

yuzhen17 updated a dataset 7 months ago

ceval/ceval-exam


models 0

None public yet

datasets 1

ceval/ceval-exam

Updated Jul 27, 2025 • 13.9k rows • 22.3k downloads • 293 likes