CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation Paper • 2305.14318 • Published May 23, 2023
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents Paper • 2402.09205 • Published Feb 14, 2024
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance Paper • 2410.12361 • Published Oct 16, 2024
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Paper • 2502.09560 • Published Feb 13 • 36
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3 • 27
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset Paper • 2504.03612 • Published Apr 4 • 2
The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination Paper • 2502.16143 • Published Feb 22 • 1
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published Feb 25 • 74
Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning Paper • 2410.03103 • Published Oct 4, 2024 • 8
SciCode: A Research Coding Benchmark Curated by Scientists Paper • 2407.13168 • Published Jul 18, 2024 • 14
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024 • 63
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts Paper • 2404.15247 • Published Apr 23, 2024 • 3