UserBench: An Interactive Gym Environment for User-Centric Agents Paper • 2507.22034 • Published 26 days ago • 29
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 22 items • Updated 17 days ago • 72
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28 • 129
Article Process Reinforcement through Implicit Rewards By ganqu and 1 other • Jan 3 • 29
Eurus Collection Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated 17 days ago • 25