τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment Paper • 2506.07982 • Published 5 days ago • 4
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains Paper • 2406.12045 • Published Jun 17, 2024 • 8