AlexCuadron/SWE-Bench-Verified-O1-native-tool-calling-reasoning-high-results • Updated Jan 14 • 500 • 525 • 2
Open LLM Leaderboard 🏆 • Running on CPU Upgrade • 13.4k • Track, rank and evaluate open LLMs and chatbots