WildBench - an allenai Collection

allenai's Collections

IFBench

OLMo 2

olmOCR

OLMoE (January 2025)

PixMo

Tulu 3 Datasets

Molmo

OLMoE (November 2024)

Tulu V2.5 Suite

Paloma

SciRIFF

AI2 Safety Toolkit

Zebra Logic Bench

OLMo 2 Preview Post-trained Models

ACE

WildBench

Updated Apr 30

AI2 WildBench Leaderboard (V2)

🦁

Display and explore model leaderboards and chat history

Note: The leaderboard for visualizing the results and collecting human feedback.

allenai/WildBench

Viewer • Updated Mar 4 • 2.3k • 1.5k • 35

Note: Examples for evaluating LLMs.

allenai/WildBench-V2-Model-Outputs

Viewer • Updated Aug 1, 2024 • 62.5k • 684 • 2

Note: The model outputs for verified LLMs on the leaderboard.

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Paper • 2406.04770 • Published Jun 7, 2024 • 31