LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization • Paper • 2507.15758
Hierarchical Budget Policy Optimization for Adaptive Reasoning • Paper • 2507.15844
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning • Paper • 2507.16814
Perception-Aware Policy Optimization for Multimodal Reasoning • Paper • 2507.06448
EXPO: Stable Reinforcement Learning with Expressive Policies • Paper • 2507.07986