evals (PT vs IT)
#30, opened by erichartford
Maybe it's here, on page 23 of the report: https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
However, I'm having a hard time reproducing the scores (i.e., gsm8k, humaneval, mbpp) with lm-evaluation-harness, and I don't know where the gap comes from :(
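
For anyone trying to narrow down a gap like this, it may help to write out the exact harness settings being used, since the few-shot count and whether the chat template is applied can both shift scores between PT and IT checkpoints. Below is a minimal sketch that only builds the `lm_eval` command line and does not run it. The model id `google/gemma-3-4b-it` and the few-shot count of 8 are placeholder assumptions, not values taken from the report, and `build_lm_eval_command` is a hypothetical helper name.

```python
import shlex


def build_lm_eval_command(model_id, tasks, num_fewshot, use_chat_template):
    """Build an lm_eval CLI invocation as a list of arguments (nothing is executed)."""
    cmd = [
        "lm_eval",
        "--model", "hf",                           # Hugging Face transformers backend
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),                # comma-separated task names
        "--num_fewshot", str(num_fewshot),
        "--batch_size", "auto",
    ]
    if use_chat_template:
        # Usually relevant for IT checkpoints; PT checkpoints are typically run without it.
        cmd.append("--apply_chat_template")
    return cmd


# Placeholder values (assumptions): adjust to match the setup you want to compare against.
cmd = build_lm_eval_command(
    model_id="google/gemma-3-4b-it",
    tasks=["gsm8k"],
    num_fewshot=8,
    use_chat_template=True,
)
print(shlex.join(cmd))
```

Posting the printed command next to the score you got makes it easier for others to spot a mismatch in settings. The same helper can be called with `use_chat_template=False` for a PT checkpoint.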