evals (PT vs IT)

#30
by erichartford - opened

Hello,

The evals in the model card are for the "PT" version, but this is the "IT" version.

image.png

Presumably the "IT" version will have better scores than the "PT" version, right?

Do you have the scores for the "IT" version to publish here?

Maybe they're here, on page 23 of the report: https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf

However, I'm having trouble reproducing the scores (e.g., gsm8k, humaneval, mbpp) with lm-evaluation-harness, and I don't know where the gap comes from :(
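One thing that may be worth ruling out: by default lm-evaluation-harness sends raw prompts, so an "IT" checkpoint is evaluated without its chat template unless `--apply_chat_template` is passed. Below is a minimal sketch of how I'd build the command. The model id, task list, and few-shot count are placeholders (assumptions, not the settings from the report), and `build_lm_eval_command` is a hypothetical helper, not part of the harness.

```python
import shlex


def build_lm_eval_command(model_id, tasks, num_fewshot, use_chat_template=True):
    """Build an lm_eval CLI invocation as a list of arguments.

    model_id, tasks and num_fewshot are placeholders chosen by the caller;
    they are NOT the settings used in the Gemma 3 report.
    """
    cmd = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--batch_size", "auto",
    ]
    if use_chat_template:
        # Wrap each prompt in the tokenizer's chat template, which
        # instruction-tuned checkpoints usually expect.
        cmd.append("--apply_chat_template")
    return cmd


if __name__ == "__main__":
    # Hypothetical model id and few-shot count, for illustration only.
    command = build_lm_eval_command("your-org/your-it-model", ["gsm8k"], 8)
    print(shlex.join(command))
```

Running the same task with and without the flag would at least show whether prompt formatting explains part of the gap. Differences in few-shot count and answer extraction could account for the rest, but that is a guess on my side.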

image.png
