VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications Paper • 2509.26490 • Published 12 days ago • 17
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning Paper • 2505.16483 • Published May 22 • 10