Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key Paper • 2605.06638 • Published 17 days ago • 14 • 5
Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces Paper • 2605.02801 • Published 20 days ago • 7 • 3
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning Paper • 2605.02913 • Published Apr 8 • 9 • 3
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published Apr 6 • 203 • 12
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published Mar 13 • 149 • 4
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data Paper • 2603.15594 • Published Mar 16 • 149 • 6
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 268 • 6
The Denario project: Deep knowledge AI agents for scientific discovery Paper • 2510.26887 • Published Oct 30, 2025 • 8 • 2
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published Oct 29, 2025 • 80 • 6
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model Paper • 2510.18855 • Published Oct 21, 2025 • 73 • 3
AgentFold: Long-Horizon Web Agents with Proactive Context Management Paper • 2510.24699 • Published Oct 28, 2025 • 72 • 4
Multi-Agent Evolve: LLM Self-Improve through Co-evolution Paper • 2510.23595 • Published Oct 27, 2025 • 13 • 2
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning Paper • 2510.15262 • Published Oct 17, 2025 • 6 • 3