Reasoning Models - a Stalin16 Collection

Stalin16 's Collections

Edu

Agents

Model Evaluation

Reasoning Models

Data and other things

Gen AI Diffusion

Reasoning Models

updated 24 days ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 62
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!

Paper • 2502.07374 • Published Feb 11 • 41
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published Feb 10 • 154
S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published Feb 20 • 63
START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 114
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

Paper • 2503.05379 • Published Mar 7 • 39
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

Paper • 2503.05132 • Published Mar 7 • 58
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 47
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Paper • 2503.07365 • Published Mar 10 • 62
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10 • 89
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Paper • 2503.12605 • Published Mar 16 • 36
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Paper • 2503.12937 • Published Mar 17 • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 139
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Paper • 2503.12797 • Published Mar 17 • 32
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

Paper • 2503.20641 • Published Mar 26 • 9
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Paper • 2503.24290 • Published Mar 31 • 63
Effectively Controlling Reasoning Models through Thinking Intervention

Paper • 2503.24370 • Published Mar 31 • 20
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Paper • 2503.21614 • Published Mar 27 • 42
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1

Paper • 2503.24376 • Published Mar 31 • 39
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

Paper • 2504.00294 • Published Mar 31 • 11
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 180
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Paper • 2506.07527 • Published Jun 9 • 3
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper • 2506.06941 • Published Jun 7 • 14
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 260
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Paper • 2506.07976 • Published Jun 9 • 6
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Paper • 2506.18896 • Published Jun 23 • 29
Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2 • 130
RLPR: Extrapolating RLVR to General Domains without Verifiers

Paper • 2506.18254 • Published Jun 23 • 32
Perception-Aware Policy Optimization for Multimodal Reasoning

Paper • 2507.06448 • Published Jul 8 • 47
Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2 • 106
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Paper • 2507.06261 • Published Jul 7 • 60
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

Paper • 2507.05255 • Published Jul 7 • 73
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Paper • 2507.09477 • Published Jul 13 • 80
The Invisible Leash: Why RLVR May Not Escape Its Origin

Paper • 2507.14843 • Published Jul 20 • 84
Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 294
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

Paper • 2507.15758 • Published Jul 21 • 34
THU-KEG/LongWriter-Zero-32B

Text Generation • 33B • Updated Jul 3 • 1.19k • • 107
MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Paper • 2507.14958 • Published Jul 20 • 46
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 148
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published 28 days ago • 45
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Paper • 2508.10433 • Published 25 days ago • 143