Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation • Paper • arXiv:2509.25849 • Published 22 days ago
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO • Paper • arXiv:2505.11595 • Published May 16