-
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 25 -
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper • 2503.04808 • Published • 17 -
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper • 2503.12937 • Published • 27
Collections
Discover the best community collections!
Collections including paper arxiv:2503.12937
-
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper • 2502.14768 • Published • 47 -
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Paper • 2502.12853 • Published • 29 -
Diverse Inference and Verification for Advanced Reasoning
Paper • 2502.09955 • Published • 17 -
Distillation Scaling Laws
Paper • 2502.08606 • Published • 46
-
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper • 2501.06186 • Published • 64 -
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper • 2501.01904 • Published • 33 -
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 112 -
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 73
-
ReLearn: Unlearning via Learning for Large Language Models
Paper • 2502.11190 • Published • 29 -
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper • 2502.11089 • Published • 150 -
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Paper • 2502.11357 • Published • 10 -
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Paper • 2503.12797 • Published • 29
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 59 -
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
Paper • 2412.18279 • Published -
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper • 2501.10799 • Published • 15 -
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper • 2501.19324 • Published • 38
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 59 -
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Paper • 2502.07374 • Published • 38 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 148 -
S*: Test Time Scaling for Code Generation
Paper • 2502.14382 • Published • 61
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 27 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 28 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 117 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
Paper • 2412.14860 • Published • 2 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 35 -
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
Paper • 2412.15797 • Published • 18
-
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper • 2503.12937 • Published • 27 -
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper • 2503.12605 • Published • 30 -
Transformer^2: Self-adaptive LLMs
Paper • 2501.06252 • Published • 54 -
s1: Simple test-time scaling
Paper • 2501.19393 • Published • 114
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 37 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 51 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 67 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 49