DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 133
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Paper • 2504.05118 • Published Apr 7 • 25
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Paper • 2504.08600 • Published Apr 11 • 29
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Paper • 2504.11343 • Published Apr 15 • 19
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models Paper • 2504.15716 • Published Apr 22 • 10
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published Apr 30 • 57
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published May 1 • 53
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12 • 81
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14 • 65
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18 • 24
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17 • 58
One RL to See Them All: Visual Triple Unified Reinforcement Learning Paper • 2505.18129 • Published May 23 • 60
RL with KL penalties is better viewed as Bayesian inference Paper • 2205.11275 • Published May 23, 2022 • 1
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization Paper • 2505.19000 • Published May 25 • 43
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space Paper • 2505.15778 • Published May 21 • 17
ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published May 29 • 46
Table-R1: Inference-Time Scaling for Table Reasoning Paper • 2505.23621 • Published May 29 • 92
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Paper • 2506.09250 • Published 29 days ago • 28
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published 9 days ago • 62