ceval

community

https://cevalbenchmark.com

AI & ML interests

We focus on evaluating foundation models in Chinese.

Recent Activity

yuzhen17

authored a paper about 1 month ago

SWE-RM: Execution-free Feedback For Software Engineering Agents

Paper • 2512.21919 • Published Dec 26, 2025 • 10

yuzhen17

authored a paper 3 months ago

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29, 2025 • 46

yuzhen17

updated a dataset 6 months ago

ceval/ceval-exam

Viewer • Updated Jul 27, 2025 • 13.9k • 23.3k • 293

yuzhen17

authored a paper 8 months ago

Pitfalls of Rule- and Model-based Verifiers -- A Case Study on Mathematical Reasoning

Paper • 2505.22203 • Published May 28, 2025 • 6

jxhe

authored a paper 8 months ago

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Paper • 2505.19641 • Published May 26, 2025 • 68

yuzhen17

authored a paper 8 months ago

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Paper • 2505.15612 • Published May 21, 2025 • 34

jxhe

authored 2 papers 10 months ago

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Paper • 2504.10127 • Published Apr 14, 2025 • 17

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Paper • 2503.21614 • Published Mar 27, 2025 • 42

yuzhen17

in ceval/ceval-exam 10 months ago

[bot] Conversion to Parquet

#7 opened 11 months ago by parquet-converter

yuzhen17

in ceval/ceval-exam 11 months ago

Convert dataset to Parquet

#6 opened 11 months ago by lhoestq

jxhe

authored a paper 11 months ago

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Paper • 2503.18892 • Published Mar 24, 2025 • 31

yuzhen17

authored 2 papers 11 months ago

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Paper • 2503.18892 • Published Mar 24, 2025 • 31

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Paper • 2503.00808 • Published Mar 2, 2025 • 56

yuzhen17

authored a paper about 1 year ago

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

jxhe

authored a paper almost 2 years ago

Compression Represents Intelligence Linearly

Paper • 2404.09937 • Published Apr 15, 2024 • 28

yuzhen17

authored a paper almost 2 years ago

Compression Represents Intelligence Linearly

Paper • 2404.09937 • Published Apr 15, 2024 • 28

yuzhen17

authored a paper about 2 years ago

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

Paper • 2305.08322 • Published May 15, 2023

jxhe

authored a paper over 2 years ago

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

Paper • 2305.08322 • Published May 15, 2023

Team members 2
