evals-for-every-language / results.json

Commit History

Only run tasks for which there is no result yet
2f9dee1

David Pomerenke committed on

Run on 40 languages, additional models
260c1a3

David Pomerenke committed on

Run evals
b0c61ed

David Pomerenke committed on

Run on 15 languages
f8a3dad

David Pomerenke committed on

Add model history plot
f52ec6e

David Pomerenke committed on

Implement MMLU task
a683732

David Pomerenke committed on

Add Global MMLU benchmark
ce2acb0

David Pomerenke committed on

Translation both from and to
731eddd

David Pomerenke committed on

Add OpenRouter metadata to models
9002fc2

David Pomerenke committed on

Run on 100 languages, adjust display
8274634

David Pomerenke committed on

Add Dockerfile
4d13673

David Pomerenke committed on

Language selection checkboxes & filtering in backend
d91b022

David Pomerenke committed on

Basic backend setup with FastAPI but without actual filtering
2c21cf7

David Pomerenke committed on

spBLEU tokenizer, run on more languages
eaf2d97

David Pomerenke committed on

Better map tooltip
92b2164

David Pomerenke committed on

Process data for country map
723f963

David Pomerenke committed on

Autonyms and cooler dataset search display
33469f2

David Pomerenke committed on

More models
c5278dd

David Pomerenke committed on

Basic language table
d1a7111

David Pomerenke committed on

Refactor eval code into files
da6e1bc

David Pomerenke committed on

Model table using React
ecf4195

David Pomerenke committed on

Better results format (flatten + aggregate 3x), push results to hub
7a9c651

David Pomerenke committed on

Run on 50 languages
80a0827

David Pomerenke committed on

Rerun
0638620

David Pomerenke committed on

Separate overall scores for T2T / S2T
e9a19be

David Pomerenke committed on

Put all languages into results.json, replace pyglottolog
040dc35

David Pomerenke committed on

Add ASR ChrF scores
4973af4

David Pomerenke committed on

More evals
8633921

David Pomerenke committed on

Better separation of ttt/stt in results format
e223525

David Pomerenke committed on

Evaluate transcription
3d9cde9

David Pomerenke committed on

Basic FLEURS transcription setup
1ab3999

David Pomerenke committed on

Add language families
08735bb

David Pomerenke committed on

Metrics selector & refactoring
4f572a5

David Pomerenke committed on

Add masked language modeling (MLM) task
e92634d

David Pomerenke committed on

For classification use number + few-shot
1b634f3

David Pomerenke committed on

Show classification and overall score in app
1167b2d

David Pomerenke committed on

Classification evaluation
7fc657e

David Pomerenke committed on

Discuss translation metric biases and add chrF scores
086a421

David Pomerenke committed on

Run on all languages
edcfb8f

David Pomerenke committed on

Make a map
29c8ef6

David Pomerenke committed on

Parallelize everything, select most populous script
56081d8

David Pomerenke committed on

Add links to add CommonVoice recordings
8190782

David Pomerenke committed on

Use langcodes for language matching
d5fc8b3

David Pomerenke committed on

Add CommonVoice stats
8beab26

David Pomerenke committed on

Nice tables and plots
a65282b

David Pomerenke committed on

Newer models, run on 20 languages
175993f

David Pomerenke committed on

Basic Gradio setup
63202a2

David Pomerenke committed on

Basic Observable Framework setup
698d104

David Pomerenke committed on

Don't translate a language to itself
07dcc45

David Pomerenke committed on

Display all languages and translate from multiple languages
6b6f157

David Pomerenke committed on