Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published Mar 12 • 35
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30 • 96
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 89
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 133
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning Paper • 2505.24298 • Published May 30 • 27
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning Paper • 2505.20355 • Published May 26 • 36
Interleaved Reasoning for Large Language Models via Reinforcement Learning Paper • 2505.19640 • Published May 26 • 13
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow Paper • 2505.17399 • Published May 23 • 14
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26 • 44
One RL to See Them All: Visual Triple Unified Reinforcement Learning Paper • 2505.18129 • Published May 23 • 60
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published May 20 • 63
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published May 22 • 57
Optimizing Anytime Reasoning via Budget Relative Policy Optimization Paper • 2505.13438 • Published May 19 • 36
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18 • 24
Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models Paper • 2506.01320 • Published Jun 2 • 16
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070 • Published May 29 • 28
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published Jun 3 • 33
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Paper • 2506.01674 • Published Jun 2 • 28
CodeContests+: High-Quality Test Case Generation for Competitive Programming Paper • 2506.05817 • Published Jun 6 • 9
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published Jun 1 • 30
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior Paper • 2506.08012 • Published Jun 9 • 7
Dreamland: Controllable World Creation with Simulator and Generative Models Paper • 2506.08006 • Published Jun 9 • 7
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance Paper • 2506.06444 • Published Jun 6 • 74
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Paper • 2506.07530 • Published Jun 9 • 20
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published Jun 10 • 35
Through the Valley: Path to Effective Long CoT Training for Small Language Models Paper • 2506.07712 • Published Jun 9 • 18
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework Paper • 2506.02454 • Published Jun 3 • 5
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published Jun 5 • 16
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Paper • 2506.06205 • Published Jun 6 • 29
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team Paper • 2506.14234 • Published Jun 17 • 40
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers Paper • 2506.14702 • Published Jun 17 • 4
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation Paper • 2506.06962 • Published Jun 8 • 29
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Paper • 2506.10082 • Published Jun 11 • 9
General-Reasoner: Advancing LLM Reasoning Across All Domains Paper • 2505.14652 • Published May 20 • 23
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation Paper • 2506.17202 • Published Jun 20 • 10
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published Jun 23 • 28
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19 • 86
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion Paper • 2507.02813 • Published Jul 3 • 59
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model Paper • 2507.01953 • Published Jul 2 • 19
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective Paper • 2506.17930 • Published Jun 22 • 19
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30 • 46
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky Paper • 2507.03336 • Published Jul 4 • 5
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published 28 days ago • 40
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs Paper • 2507.05687 • Published 28 days ago • 26
Coding Triangle: How Does Large Language Model Understand Code? Paper • 2507.06138 • Published 28 days ago • 20
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning Paper • 2507.05920 • Published 28 days ago • 11
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs Paper • 2507.03253 • Published Jul 4 • 18
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Paper • 2507.07996 • Published 26 days ago • 31
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective Paper • 2507.08801 • Published 25 days ago • 29
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published 19 days ago • 230
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization Paper • 2507.15061 • Published 16 days ago • 47
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published 19 days ago • 39
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published 14 days ago • 113
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published 16 days ago • 45
Does More Inference-Time Compute Really Help Robustness? Paper • 2507.15974 • Published 15 days ago • 6
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback Paper • 2507.15024 • Published 16 days ago • 13
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting Paper • 2507.15454 • Published 15 days ago • 7
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models Paper • 2507.14241 • Published 19 days ago • 16
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation Paper • 2507.18537 • Published 12 days ago • 17
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Paper • 2507.15597 • Published 15 days ago • 33
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning Paper • 2507.14295 • Published 18 days ago • 13
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Paper • 2507.15852 • Published 15 days ago • 37
FLEXITOKENS: Flexible Tokenization for Evolving Language Models Paper • 2507.12720 • Published 19 days ago • 8
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Paper • 2507.12142 • Published 20 days ago • 36
Replacing thinking with tool usage enables reasoning in small language models Paper • 2507.05065 • Published 29 days ago • 14
Lizard: An Efficient Linearization Framework for Large Language Models Paper • 2507.09025 • Published 24 days ago • 16
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment Paper • 2507.20984 • Published 8 days ago • 51
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published 11 days ago • 28
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities Paper • 2507.19766 • Published 10 days ago • 13
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published 6 days ago • 37
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published 4 days ago • 44