allenai's Collections

WildBench

updated 9 days ago

  • AI2 WildBench Leaderboard (V2) 🦁

    Running • 223

    Display and explore model leaderboards and chat history

    Note: The leaderboard for visualizing the results and collecting human feedback.


  • allenai/WildBench

    Viewer • Updated Mar 4 • 2.3k • 2.45k • 34

    Note: Examples for evaluating LLMs.


  • allenai/WildBench-V2-Model-Outputs

    Viewer • Updated Aug 1, 2024 • 62.5k • 1.89k • 2

    Note: The model outputs for verified LLMs on the leaderboard.


  • WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

    Paper • 2406.04770 • Published Jun 7, 2024 • 31