Process Reinforcement Through Implicit Rewards
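PRIME optimizes the policy with dense, token-level rewards from an implicit process reward model (PRM): instead of collecting step labels, process rewards are recovered as log-likelihood ratios against a frozen reference model (see the ImplicitPRM paper cited below). A minimal sketch of that reward computation, assuming per-token log-probabilities have already been gathered; the `beta` value here is illustrative, not the coefficient used in PRIME:

```python
from typing import List

def implicit_process_rewards(logp_policy: List[float],
                             logp_ref: List[float],
                             beta: float = 0.05) -> List[float]:
    # Following the ImplicitPRM parameterization:
    #   r_t = beta * log( pi(y_t | y_<t) / pi_ref(y_t | y_<t) )
    # i.e. the per-token log-likelihood ratio between the (implicit) PRM
    # and a frozen reference model, scaled by beta.
    # beta = 0.05 is an illustrative placeholder.
    return [beta * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
```

In the ImplicitPRM formulation, these per-token ratios sum to an outcome-level reward over the full response, which is why the PRM can be trained from outcome labels alone.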


Evaluation

Through PRIME, we achieve substantial improvements over the SFT version of the model on key reasoning benchmarks: a 16.7% improvement on average, and over 20% on the AMC and AIME competitions. Our final model, Eurus-2-7B-PRIME, based on Qwen-2.5-Math-7B-Base, surpasses its instruct version on five key reasoning benchmarks. The final results are presented below:

|               | Eurus-2-7B-PRIME | Eurus-2-7B-SFT | Qwen-2.5-Math-7B-Instruct | Llama-3.1-70B-Instruct | GPT-4o |
|---------------|------------------|----------------|---------------------------|------------------------|--------|
| AIME 2024     | 26.7 (+23.3)     | 3.3            | 13.3                      | 16.7                   | 9.3    |
| MATH-500      | 79.2 (+14.1)     | 65.1           | 79.8                      | 64.6                   | 76.4   |
| AMC           | 57.8 (+27.7)     | 30.1           | 50.6                      | 30.1                   | 45.8   |
| Minerva Math  | 38.6 (+5.9)      | 32.7           | 34.6                      | 35.3                   | 36.8   |
| OlympiadBench | 42.1 (+12.3)     | 29.8           | 40.7                      | 31.9                   | 43.3   |
| Avg.          | 48.9 (+16.7)     | 32.2           | 43.8                      | 35.7                   | 43.3   |
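As a quick sanity check, the reported averages and the headline +16.7 delta follow directly from the per-benchmark scores in the table (an illustrative computation; the table values are the source of truth):

```python
# Per-benchmark scores from the table above: Eurus-2-7B-PRIME vs. Eurus-2-7B-SFT.
prime = [26.7, 79.2, 57.8, 38.6, 42.1]
sft = [3.3, 65.1, 30.1, 32.7, 29.8]

avg_prime = sum(prime) / len(prime)  # 48.88 -> reported as 48.9
avg_sft = sum(sft) / len(sft)        # 32.2
print(round(avg_prime, 1), round(avg_sft, 1), round(avg_prime - avg_sft, 1))
# 48.9 32.2 16.7
```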

We achieved this with only 1/10 of the data and model resources used by Qwen-Math:

|            | Eurus-2-7B-PRIME           | Qwen2.5-Math-7B-Instruct        |
|------------|----------------------------|---------------------------------|
| Base Model | Qwen2.5-Math-7B            | Qwen2.5-Math-7B                 |
| SFT Data   | 230K (open-source)         | 2.5M (open-source and in-house) |
| RM Data    | 0                          | 618K (in-house)                 |
| RM         | Eurus-2-7B-SFT             | Qwen2.5-Math-RM (72B)           |
| RL Data    | 150K queries × 4 samples   | 66K queries × 32 samples        |
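To make the resource gap concrete, a rough back-of-the-envelope from the table above (treating RL data as queries × samples; figures are illustrative ratios, not a full cost accounting):

```python
# Figures from the comparison table above.
sft_ratio = 2_500_000 / 230_000  # SFT data: 2.5M vs. 230K -> ~10.9x
rl_prime = 150_000 * 4           # 600K RL samples
rl_qwen = 66_000 * 32            # ~2.1M RL samples
rl_ratio = rl_qwen / rl_prime    # ~3.5x
rm_ratio = 72 / 7                # reward model size: 72B vs. 7B -> ~10.3x
print(f"SFT data: {sft_ratio:.1f}x  RL samples: {rl_ratio:.1f}x  RM params: {rm_ratio:.1f}x")
```

The SFT-data and reward-model-size gaps are each roughly an order of magnitude, and PRIME additionally uses no dedicated reward-model data at all.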

Citation

If you find PRIME or ImplicitPRM helpful, please cite our papers:

@article{cui2025process,
  title={Process reinforcement through implicit rewards},
  author={Cui, Ganqu and Yuan, Lifan and Wang, Zefan and Wang, Hanbin and Li, Wendi and He, Bingxiang and Fan, Yuchen and Yu, Tianyu and Xu, Qixin and Chen, Weize and others},
  journal={arXiv preprint arXiv:2502.01456},
  year={2025}
}

@article{yuan2024implicitprm,
  title={Free Process Rewards without Process Labels},
  author={Lifan Yuan and Wendi Li and Huayu Chen and Ganqu Cui and Ning Ding and Kaiyan Zhang and Bowen Zhou and Zhiyuan Liu and Hao Peng},
  journal={arXiv preprint arXiv:2412.01981},
  year={2024}
}