SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published 3 days ago • 36
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Paper • 2506.17930 • Published 12 days ago • 17
LettinGo: Explore User Profile Generation for Recommendation System Paper • 2506.18309 • Published 11 days ago • 10
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published 10 days ago • 54
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published 18 days ago • 249
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published 22 days ago • 65
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published 21 days ago • 61
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning Paper • 2506.09513 • Published 23 days ago • 95
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks Paper • 2506.10954 • Published 21 days ago • 51
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 129
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070 • Published May 29 • 28
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 46
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation Paper • 2506.02397 • Published about 1 month ago • 36
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Paper • 2506.04207 • Published 29 days ago • 46
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis Paper • 2506.02096 • Published Jun 2 • 51
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30 • 95
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 165