🥇 MMLU-Pro Leaderboard
More advanced and challenging multi-task evaluation
Other evaluation Spaces:
- Benchmarking LLMs on the stability of simulated populations
- Embed and use ZeroEval for evaluation tasks (see the sketch after this list)
- Display model leaderboard evaluations
- Browse and submit LLM evaluations
- VLMEvalKit evaluation results on video understanding benchmarks
- Track, rank and evaluate open LLMs and chatbots
- Blind vote on HF TTS models!
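The entries above are Hugging Face Spaces, and Gradio-based Spaces like these can also be queried programmatically instead of through the web UI. A minimal sketch, assuming the `gradio_client` package and a placeholder Space ID and endpoint (the actual ZeroEval or leaderboard Space IDs are not given here):

```python
# Minimal sketch: calling a Gradio-based Hugging Face Space programmatically.
# "owner/space-name" and "/predict" are placeholders, not a real Space or endpoint.
from gradio_client import Client

client = Client("owner/space-name")  # placeholder Space ID (assumption)
client.view_api()                    # prints the Space's callable endpoints

# Once you know the endpoint name and its inputs, call it directly:
# result = client.predict("your input here", api_name="/predict")  # placeholder endpoint
# print(result)
```

Checking `view_api()` first avoids guessing at argument order, since each Space defines its own endpoints and input types.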