Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published Aug 14 • 27
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20 • 42
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22 • 149
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications Paper • 2508.16279 • Published Aug 22 • 51
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation Paper • 2410.20774 • Published Oct 28, 2024
Provable Benefits of In-Tool Learning for Large Language Models Paper • 2508.20755 • Published Aug 28 • 11
Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents Paper • 2509.06917 • Published 30 days ago • 38