model,elo,wins,losses,ties,games_played,confidence_interval
icecream-3b,1640.0,11,0,0,12,226.3
IBM Granite-3.3-2b-instruct,1580.5,14,6,2,26,153.8
Cogito-v1-preview-llama-3b,1576.1,8,2,0,11,236.4
EXAONE-3.5-2.4B-instruct,1567.2,11,5,1,18,184.8
Phi-4-mini-instruct,1533.6,13,8,1,30,143.1
Qwen2.5-3b-Instruct,1502.4,7,8,0,29,145.6
Gemma-3-4b-it,1498.3,3,3,1,8,277.2
SmolLM2-1.7b-Instruct,1493.2,7,9,0,18,184.8
Gemma-2-2b-it,1491.9,6,7,0,23,163.5
Qwen3-4b,1483.2,4,6,1,13,217.4
OLMo-2-1B-Instruct,1471.5,4,7,0,12,226.3
Llama-3.2-1b-Instruct,1469.4,9,12,3,30,143.1
Qwen3-0.6b,1455.9,3,7,0,15,202.4
Qwen2.5-1.5b-Instruct,1455.1,7,11,3,29,145.6
Qwen3-1.7b,1454.8,2,6,2,11,236.4
Llama-3.2-3b-Instruct,1435.8,6,10,1,26,153.8
Gemma-3-1b-it,1421.5,2,8,1,15,202.4