Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Embedding Leaderboard
Explore LLM performance across hardware
Request evaluation of a speech recognition model
Submit code models for evaluation on benchmarks
Display chatbot leaderboard and statistics
Request model evaluation on COCO val 2017 dataset
Display ToolBench model performance results
Display a web page
Browse and compare AI model evaluations
View and submit LLM evaluations
Submit model evaluation and view leaderboard
Explore GenAI model efficiency on ML.ENERGY leaderboard
Explore and compare LLMs through a leaderboard
Upload and evaluate video models
Run a Streamlit web app
Evaluate LLM cybersecurity risks
Search for model performance across languages and benchmarks
Browse and filter leaderboard of language models
VLMEvalKit Evaluation Results Collection
Explore and analyze RewardBench leaderboard data
Jailbreak the LLM and privacy guardrails
Filter data for contamination in datasets or models
Track, rank and evaluate open Arabic LLMs and chatbots
Explore benchmark results for QA and long doc models
Submit and evaluate model results for the MM-AAD leaderboard
Explore and analyze code evaluation data
Evaluate open LLMs in the languages of LATAM and Spain