Chatbot Arena Leaderboard
Display chatbot leaderboard statistics
Track, rank and evaluate open LLMs and chatbots
Select benchmarks and languages for text embeddings evaluation
Explore hardware performance for language models
Request evaluation for speech models
Submit code models for evaluation on benchmarks
View and submit LLM evaluations
View and submit machine learning model evaluations
Display and explore model leaderboards and chat history
Request model evaluation on COCO val 2017 dataset
Display model benchmark results
Run a Streamlit web app
VLMEvalKit Evaluation Results Collection
Analyze images to detect and label objects
Browse and submit LLM evaluations
Track, rank and evaluate open LLMs' CoT quality
Submit and evaluate model results for the MM-AAD leaderboard
Explore and analyze code evaluation data
Display and filter multimodal model leaderboard results
Explore and analyze RewardBench leaderboard data
Ranking of LLMs for agentic tasks