Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Embedding Leaderboard
Explore LLM performance across hardware
Request evaluation for a speech model
Search and submit code models for evaluation
Display chatbot leaderboard and stats
Request model evaluation on COCO val 2017 dataset
Display ToolBench model performance results
Display a web page
Browse and compare AI model evaluations
View and submit LLM evaluations
Submit model evaluation and view leaderboard
Explore energy consumption of GenAI models
Explore and compare LLMs through a leaderboard
Upload and analyze video model evaluation data
Run a Streamlit web app
Evaluate LLM cybersecurity risks
Search for model performance across languages and benchmarks
Browse and filter leaderboard of language models
VLMEvalKit Evaluation Results Collection
Display and filter reward model evaluation data
Jailbreak the LLM and privacy guardrails
Filter data for contamination in datasets or models
Track, rank and evaluate open Arabic LLMs and chatbots
Explore and compare QA and long doc benchmarks
Submit and evaluate model results for the MM-AAD leaderboard
Explore and analyze code evaluation data
Evaluate open LLMs in the languages of LATAM and Spain