tmarechaux's Collections
LLM Eval

updated Jun 21, 2024

  • Levels of AGI for Operationalizing Progress on the Path to AGI

    Paper • 2311.02462 • Published Nov 4, 2023 • 37

  • Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Paper • 2206.04615 • Published Jun 9, 2022 • 5

  • A Survey on Evaluation of Large Language Models

    Paper • 2307.03109 • Published Jul 6, 2023 • 42

  • Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

    Paper • 2306.13651 • Published Jun 23, 2023 • 15

  • GAIA: a benchmark for General AI Assistants

    Paper • 2311.12983 • Published Nov 21, 2023 • 222

  • Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

    Paper • 2403.04132 • Published Mar 7, 2024 • 41

  • τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

    Paper • 2406.12045 • Published Jun 17, 2024 • 9