Possible future contamination problem
If there are only a handful of example questions which can easily be answered by humans, and you have released them to the public, what is stopping ranking seekers from contaminating their models by manually writing answers to them and then training their models on that data?
Nothing.
However, manually answering the questions is (i) conceptually easy but extremely tedious; (ii) difficult to hide (we ask model owners to provide reasoning traces, scores might look suspicious, etc.); and (iii) not robust, since we plan to renew the test set in case of contamination.
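As a rough illustration of point (iii), here is a minimal sketch (not the leaderboard's actual code, all names and the threshold are made up) of how a renewed, never-released subset could be used to flag suspicious submissions: a model trained on hand-written answers to the public questions should not generalise to the renewed ones, so a large accuracy gap between the two is a signal worth investigating.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    model_name: str
    # question_id -> whether the model answered it correctly
    results: dict[str, bool]


def accuracy(results: dict[str, bool], question_ids: set[str]) -> float:
    scored = [results[q] for q in question_ids if q in results]
    return sum(scored) / len(scored) if scored else 0.0


def contamination_flag(
    submission: Submission,
    public_ids: set[str],    # questions whose answers have been public for a while
    renewed_ids: set[str],   # freshly written, never-released questions
    max_gap: float = 0.25,   # illustrative threshold, not a real policy value
) -> bool:
    """Return True if the public/renewed accuracy gap looks suspicious."""
    public_acc = accuracy(submission.results, public_ids)
    renewed_acc = accuracy(submission.results, renewed_ids)
    return (public_acc - renewed_acc) > max_gap


if __name__ == "__main__":
    sub = Submission(
        model_name="example-model",
        results={"q1": True, "q2": True, "q3": True, "q4": False, "q5": False},
    )
    # Perfect on the public questions, zero on the renewed ones -> flagged
    print(contamination_flag(sub, public_ids={"q1", "q2", "q3"}, renewed_ids={"q4", "q5"}))
```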
I was thinking that you would have a higher number of questions: you could mention that you only have 300 and even "leak" some questions, but strictly guard the other questions and answers and not even mention how many there are.
To complement @gregmialz 's very good answer: we actually need people to know what the questions from the test set are, so they can run their models on them and give us their answers :)