Eval Leaderboards - an andrewrreed Collection

Eval Leaderboards

updated 9 days ago

Running

4.06k

Chatbot Arena Leaderboard

🏆

Display chatbot leaderboard statistics
Running on CPU Upgrade

12.6k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running on CPU Upgrade

4.86k

MTEB Leaderboard

🥇

Select benchmarks and languages for text embeddings evaluation
Running

428

LLM-Perf Leaderboard

🏆

Explore hardware performance for language models
Running on CPU Upgrade

621

Open ASR Leaderboard

🏆

Request evaluation for speech models
Running

1.16k

Big Code Models Leaderboard

📈

Submit code models for evaluation on benchmarks
Running on CPU Upgrade

131

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Runtime error

104

Enterprise Scenarios Leaderboard

🥇
Running on CPU Upgrade

89

LLM Safety Leaderboard

🥇

View and submit machine learning model evaluations
Running

222

AI2 WildBench Leaderboard (V2)

🦁

Display and explore model leaderboards and chat history
Running

155

Open Object Detection Leaderboard

🏆

Request model evaluation on COCO val 2017 dataset
Running

41

Redteaming Resistance Leaderboard

💻

Display model benchmark results
Runtime error

30

Contextual Leaderboard

🐨
Running

186

Yet Another LLM Leaderboard

🌖

Run a Streamlit web app
Running on CPU Upgrade

628

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Running

542

Vision Arena (Testing VLMs side-by-side)

🖼

Analyze images to detect and label objects
Configuration error

34

Leaderboard

🐠
Running on CPU Upgrade

344

Open Medical-LLM Leaderboard

🥇

Browse and submit LLM evaluations
Running on CPU Upgrade

50

Open CoT Leaderboard

🥇

Track, rank and evaluate open LLMs' CoT quality
Running

23

MM-UPD Leaderboard

🥇

Submit and evaluate model results for the MM-AAD leaderboard
Running

182

BigCodeBench Leaderboard

🥇

Explore and analyze code evaluation data
Running

10

MJ Bench Leaderboard

🥇

Display and filter multimodal model leaderboard results
Running

332

Reward Bench Leaderboard

📐

Explore and analyze RewardBench leaderboard data
Running on CPU Upgrade

192

Agent Leaderboard

💬

Ranking of LLMs for agentic tasks