Commit History
Upload from GitHub Actions: Get more results, compute average based on all tasks
98c6811
Upload from GitHub Actions: Translate MMLU and evaluate
4c5c136
Upload from GitHub Actions: Correlation plot
b0aa389
Upload from GitHub Actions: Evaluate on autotranslated GSM dataset
f3a09a2
Upload from GitHub Actions: Add math benchmarks
549360a
Upload from GitHub Actions: Use FLORES+ via Huggingface
913253a
Upload from GitHub Actions: Fix vibecoding
75010c2
Pass through kwargs
5fa433f
David Pomerenke
Fix dataset loading
c990cb9
David Pomerenke
Fix import paths
c567aee
David Pomerenke
added download function and edited INFO
f529b7b
Only run tasks for which there is no result yet
2f9dee1
David Pomerenke
Run on 40 languages, additional models
260c1a3
David Pomerenke
Move functions for sharing them
55406ba
David Pomerenke
Implement MMLU task
a683732
David Pomerenke
MMLU data loader for 3 parallel datasets
47170a5
David Pomerenke
Analyze MMLU datasets
031925d
David Pomerenke
Refactor eval code into files
da6e1bc
David Pomerenke