MMLU-Pro Leaderboard: More advanced and challenging multi-task evaluation