leaderboards - a Praise2112 Collection

updated Jan 25

Running

4.4k

Chatbot Arena Leaderboard

🏆

Generate and display chatbot performance leaderboard
Running on CPU Upgrade

13.1k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running on CPU Upgrade

5.68k

MTEB Leaderboard

🥇

Embedding Leaderboard
Running on CPU Upgrade

831

Open ASR Leaderboard

🏆

Request evaluation for new speech models
Running

497

LLM-Perf Leaderboard

🏆

Explore LLM performance across hardware
Running

1.31k

Big Code Models Leaderboard

📈

Submit code models for evaluation on benchmarks
Runtime error

78

Human & GPT-4 Evaluation of LLMs Leaderboard

👩
Running

442

Can Ai Code Results

🏆

Can AI Code? An LLM leaderboard including quantized models.
Runtime error

140

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Runtime error

105

Enterprise Scenarios Leaderboard

🥇
Running on CPU Upgrade

92

LLM Safety Leaderboard

🥇

View and submit machine learning model evaluations
Running

547

Vision Arena (Testing VLMs side-by-side)

🖼

Analyze images to detect and label objects
Running

66

CyberSecEvalTest

📈

Evaluate LLM cybersecurity risks
Running

322

LLM Performance Leaderboard

🐨

View LLM Performance Leaderboard
Running on CPU Upgrade

70

AIR-Bench Leaderboard

🥇

Explore benchmark results for QA and long doc models
Running on CPU Upgrade

764

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Running

368

Reward Bench Leaderboard

📐

Explore and analyze RewardBench leaderboard data
Running

206

BigCodeBench Leaderboard

🥇

Explore and analyze code evaluation data
Running

10

MJ Bench Leaderboard

🥇

Display and filter multimodal model leaderboard results
Running

107

MTEB Arena

⚔

Display a machine translation evaluation interface
Runtime error

153

Open LLM Progress Tracker

🔬

Visualize Open vs. Proprietary LLM Progress
Running

101

Judge Arena

💻

Vote on AI responses to rank models
Running on CPU Upgrade

20

Leaderboard 2 Demo

📉

Demo of the new, massively multilingual leaderboard