gitlost-murali's Collections
Agentic & Multi-turn Chat

updated about 3 hours ago

Benchmarks & datasets for evaluating agents and multi-turn chat

  • CodeACT: Code Adaptive Compute-efficient Tuning Framework for Code LLMs

    Paper • 2408.02193 • Published Aug 5, 2024 • 1

  • google/frames-benchmark

    Viewer • Updated Oct 15, 2024 • 824 • 4.5k • 216

  • gaia-benchmark/GAIA

    Updated Feb 13 • 9.41k • 389

  • callanwu/WebWalkerQA

    Viewer • Updated Jan 14 • 14.3k • 5.83k • 21

  • WebSailor: Navigating Super-human Reasoning for Web Agent

    Paper • 2507.02592 • Published 15 days ago • 94

  • Establishing Best Practices for Building Rigorous Agentic Benchmarks

    Paper • 2507.02825 • Published 15 days ago

  • promptfoo/CCP-sensitive-prompts

    Viewer • Updated Jan 28 • 1.36k • 325 • 48