- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 257
- LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
  Paper • 2507.15758 • Published • 34
- Hierarchical Budget Policy Optimization for Adaptive Reasoning
  Paper • 2507.15844 • Published • 16
- Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
  Paper • 2507.16814 • Published • 22