EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL • Paper • 2605.18703 • Published 4 days ago • 44
OpenComputer: Verifiable Software Worlds for Computer-Use Agents • Paper • 2605.19769 • Published 3 days ago • 54
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration • Paper • 2605.20025 • Published 3 days ago • 59
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists • Paper • 2605.20668 • Published 1 day ago • 9
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale • Paper • 2605.14445 • Published 8 days ago • 20
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation • Paper • 2605.10912 • Published 11 days ago • 45
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling • Paper • 2605.13301 • Published 9 days ago • 154
CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models • Paper • 2605.08735 • Published 13 days ago • 68
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs • Paper • 2605.09063 • Published 13 days ago • 78
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling • Paper • 2605.08083 • Published 14 days ago • 66
AcademiClaw: When Students Set Challenges for AI Agents • Paper • 2605.02661 • Published 18 days ago • 16
From Context to Skills: Can Language Models Learn from Context Skillfully? • Paper • 2604.27660 • Published 19 days ago • 161
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI • Paper • 2605.06651 • Published 15 days ago • 15
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key • Paper • 2605.06638 • Published 15 days ago • 14
SkillOS: Learning Skill Curation for Self-Evolving Agents • Paper • 2605.06614 • Published 15 days ago • 45
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery • Paper • 2604.25256 • Published 24 days ago • 29
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation • Paper • 2604.24764 • Published 25 days ago • 118