Agent
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper
• 2411.03562
• Published • 69
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper
• 2502.06060
• Published • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper
• 2502.14499
• Published • 195
SurveyX: Academic Survey Automation via Large Language Models
Paper
• 2502.14776
• Published • 100
Why Do Multi-Agent LLM Systems Fail?
Paper
• 2503.13657
• Published • 48
Scaling Test-time Compute for LLM Agents
Paper
• 2506.12928
• Published • 63
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Paper
• 2507.08616
• Published • 15
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper
• 2507.09477
• Published • 88
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published • 160
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
• 2508.03680
• Published • 138
Efficient Agents: Building Effective Agents While Reducing Cost
Paper
• 2508.02694
• Published • 86
WideSearch: Benchmarking Agentic Broad Info-Seeking
Paper
• 2508.07999
• Published • 111
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper
• 2508.05748
• Published • 142
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper
• 2508.13167
• Published • 129
Provable Benefits of In-Tool Learning for Large Language Models
Paper
• 2508.20755
• Published • 11
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published • 235
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published • 91
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Paper
• 2509.26354
• Published • 18
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
• 2510.05592
• Published • 109
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published • 31
Don't Just Fine-tune the Agent, Tune the Environment
Paper
• 2510.10197
• Published • 30
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
• 2510.09577
• Published • 8
Agentic Entropy-Balanced Policy Optimization
Paper
• 2510.14545
• Published • 107
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Paper
• 2510.18821
• Published • 19
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Paper
• 2510.24699
• Published • 72
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
Paper
• 2511.11373
• Published • 14
Latent Collaboration in Multi-Agent Systems
Paper
• 2511.20639
• Published • 126
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Paper
• 2511.21678
• Published • 12
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Paper
• 2512.04324
• Published • 157
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
Paper
• 2512.06749
• Published • 28
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
Paper
• 2512.17008
• Published • 11
Nested Browser-Use Learning for Agentic Information Seeking
Paper
• 2512.23647
• Published • 19
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
Paper
• 2601.06021
• Published • 47
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
Paper
• 2601.07264
• Published • 24
MAXS: Meta-Adaptive Exploration with LLM Agents
Paper
• 2601.09259
• Published • 96
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
Paper
• 2601.11077
• Published • 66
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
• 2601.16206
• Published • 86
Behavior Knowledge Merge in Reinforced Agentic Models
Paper
• 2601.13572
• Published • 27
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
Paper
• 2601.15876
• Published • 92
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
Paper
• 2601.20975
• Published • 10
AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Paper
• 2602.03786
• Published • 89
LatentMem: Customizing Latent Memory for Multi-Agent Systems
Paper
• 2602.03036
• Published • 14
Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems
Paper
• 2602.08847
• Published • 28
Multi-agent cooperation through in-context co-player inference
Paper
• 2602.16301
• Published • 24
Towards a Science of AI Agent Reliability
Paper
• 2602.16666
• Published • 15
ResearchGym: Evaluating Language Model Agents on Real-World AI Research
Paper
• 2602.15112
• Published • 21
Discovering Multiagent Learning Algorithms with Large Language Models
Paper
• 2602.16928
• Published • 16
PyVision-RL: Forging Open Agentic Vision Models via RL
Paper
• 2602.20739
• Published • 31
Paper
• 2603.01896
• Published • 9
Heterogeneous Agent Collaborative Reinforcement Learning
Paper
• 2603.02604
• Published • 185