davidberenstein1957's Collections

LLM evals and benchmark datasets

updated Jan 22

  • allenai/reward-bench

    Viewer • Updated Sep 9, 2024 • 8.11k • 7.43k • 94

  • openai/openai_humaneval

    Viewer • Updated Jan 4, 2024 • 164 • 72.2k • 315

  • google/IFEval

    Viewer • Updated Aug 14, 2024 • 541 • 22.1k • 66

  • allenai/ai2_arc

    Viewer • Updated Dec 21, 2023 • 7.79k • 326k • 192

  • allenai/winogrande

    Updated Jan 18, 2024 • 282k • 61

  • TIGER-Lab/MMLU-Pro

    Viewer • Updated Apr 6 • 12.1k • 49k • 351

  • cais/mmlu

    Viewer • Updated Mar 8, 2024 • 231k • 125k • 462

  • truthfulqa/truthful_qa

    Viewer • Updated Jan 4, 2024 • 1.63k • 38.3k • 243

  • openai/gsm8k

    Viewer • Updated Jan 4, 2024 • 17.6k • 543k • 720

  • Rowan/hellaswag

    Viewer • Updated Sep 28, 2023 • 60k • 296k • 119

  • tatsu-lab/alpaca_eval

    Updated Aug 16, 2024 • 32.8k • 54

  • HuggingFaceH4/mt_bench_prompts

    Viewer • Updated Jul 3, 2023 • 80 • 464 • 17

  • nvidia/ChatRAG-Bench

    Viewer • Updated May 24, 2024 • 34.6k • 2.85k • 110

  • rungalileo/ragbench

    Viewer • Updated Jun 11, 2024 • 95.4k • 3.68k • 49