MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3 • 43
LiveBench: A Challenging, Contamination-Free LLM Benchmark Paper • 2406.19314 • Published Jun 27 • 19