CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models • Paper • 2505.12504 • Published 4 days ago • 22
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents • Paper • 2505.15277 • Published 2 days ago • 80
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT • Paper • 2505.00703 • Published 21 days ago • 41
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning • Paper • 2505.08617 • Published 10 days ago • 36
Optimizing Anytime Reasoning via Budget Relative Policy Optimization • Paper • 2505.13438 • Published 3 days ago • 31
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning • Paper • 2505.02835 • Published 17 days ago • 23