Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 102
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published Feb 25 • 75
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published Apr 22 • 21
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 96
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models Paper • 2505.11711 • Published May 16 • 11
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published May 20 • 62
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning Paper • 2505.14625 • Published May 20 • 13
ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning Paper • 2505.15776 • Published May 21 • 10
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning Paper • 2505.16483 • Published May 22 • 10
One RL to See Them All: Visual Triple Unified Reinforcement Learning Paper • 2505.18129 • Published May 23 • 59
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 138
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 184
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17 • 42
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents Paper • 2507.03112 • Published Jul 3 • 31
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7 • 39
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning Paper • 2507.05920 • Published Jul 8 • 11
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Paper • 2507.13158 • Published Jul 17 • 24
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 131
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following Paper • 2508.02150 • Published Aug 4 • 36
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 47
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments Paper • 2508.08791 • Published Aug 12 • 16
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published Aug 14 • 27
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published Aug 20 • 82
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning Paper • 2508.16949 • Published Aug 23 • 22
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 24 • 80
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28 • 89
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published Aug 28 • 109
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2 • 217
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR Paper • 2509.02522 • Published Sep 2 • 25
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers Paper • 2509.06493 • Published Sep 8 • 11
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8 • 56
Reinforcement Learning Foundations for Deep Research Systems: A Survey Paper • 2509.06733 • Published Sep 8 • 31
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 183
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published 21 days ago • 129
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published 18 days ago • 52
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published 21 days ago • 29
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published 18 days ago • 46
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models Paper • 2509.26628 • Published 17 days ago • 12
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published 18 days ago • 129
BroRL: Scaling Reinforcement Learning via Broadened Exploration Paper • 2510.01180 • Published 16 days ago • 17
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training Paper • 2510.04996 • Published 11 days ago • 15
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published 14 days ago • 43
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published 4 days ago • 27