Reinforcement Learning Foundations for Deep Research Systems: A Survey • Paper • 2509.06733 • Published 14 days ago • 31
Understanding R1-Zero-Like Training: A Critical Perspective • Paper • 2503.20783 • Published Mar 26 • 56