Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning Paper • 2606.24428 • Published 9 days ago • 52
Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems Paper • 2606.18837 • Published 15 days ago • 57
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts Paper • 2606.05922 • Published 28 days ago • 69
Benchmarking AI Agents for Addressing Scientific Challenges Across Scales Paper • 2606.12736 • Published 22 days ago • 5
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 16 days ago • 208
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance Paper • 2606.19195 • Published 15 days ago • 139
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces Paper • 2606.09426 • Published 24 days ago • 104
Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding? Paper • 2606.08063 • Published 26 days ago • 82
From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning Paper • 2606.07190 • Published 27 days ago • 35
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Paper • 2605.30611 • Published May 28 • 250