leaderboards - a MoritzLaurer Collection

leaderboards

updated Apr 2

Runtime error

4.55k

Chatbot Arena Leaderboard

🏆

Display chatbot performance leaderboard
Running on CPU Upgrade

13.3k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running on CPU Upgrade

6.12k

MTEB Leaderboard

🥇

Embedding Leaderboard
Running on CPU Upgrade

986

Open ASR Leaderboard

🏆

Request evaluation for a speech model
Running

539

LLM-Perf Leaderboard

🏆

Explore LLM performance across hardware
Running

1.38k

Big Code Models Leaderboard

📈

Search and submit code models for evaluation
Runtime error

78

Human & GPT-4 Evaluation of LLMs Leaderboard

👩
Running

445

Can Ai Code Results

🏆

Can AI Code? An LLM leaderboard including quantized models.
Running on CPU Upgrade

143

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Runtime error

105

Enterprise Scenarios Leaderboard

🥇
Running on CPU Upgrade

93

LLM Safety Leaderboard

🥇

View and submit machine learning model evaluations
Running

551

Vision Arena (Testing VLMs side-by-side)

🖼

Analyze images to detect and label objects
Running

67

CyberSecEvalTest

📈

Evaluate LLM cybersecurity risks
Running

356

LLM Performance Leaderboard

🐨

View LLM performance rankings
Running on CPU Upgrade

73

AIR-Bench Leaderboard

🥇

Explore and compare QA and long doc benchmarks
Running on CPU Upgrade

837

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Running

389

Reward Bench Leaderboard

📐

Display and filter model evaluation results
Running

218

BigCodeBench Leaderboard

🥇

Explore and analyze code evaluation data
Running

10

MJ Bench Leaderboard

🥇

Display and filter multimodal model leaderboard results
Running

114

MTEB Arena

⚔

Display text-to-text translation interface
Runtime error

151

Open LLM Progress Tracker

🔬

Visualize Open vs. Proprietary LLM Progress
Running

105

Judge Arena

💻

Compare AI models by voting on their responses
Running

408

TTS Spaces Arena

🤗

Blind vote on HF TTS models!
Running

136

smolagents LLM leaderboard

🏆

A leaderboard for LLMs powering smolagents