ai-progress-charts / arc_agi_semi_private_eval_leaderboard.jsonl
{"model": "o3", "score": 75.7}
{"model": "o1-2024-12-17", "score": 32}
{"model": "o1-preview-2024-09-12", "score": 18}
{"model": "claude-3-5-sonnet-20240620", "score": 14}
{"model": "gpt-4o-2024-05-13", "score": 5}
{"model": "gemini-1.5-pro-001", "score": 4.5}