Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11 • 55
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 137
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published Apr 15 • 62