GSM8K eval

#23
by entfane - opened

Are the GSM8K results reported with 0-shot or few-shot prompting?

Yandex org

BBH is 3-shot, HUMAN_EVAL and MBPP are 0-shot, and all other benchmarks, including GSM8K, are 5-shot.
All measurements were taken with HF transformers.
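
For anyone unsure what the shot count means in practice, here is a minimal sketch of k-shot prompt construction. The `build_prompt` helper, the "Question:/Answer:" template, and the toy examples are all hypothetical and only illustrate the idea; they are not the evaluation code or prompts used for the reported numbers.

```python
def build_prompt(examples, question, k):
    """Prepend the first k solved (question, answer) pairs to the target question.

    Hypothetical template for illustration only; real harnesses use their own.
    """
    if k > len(examples):
        raise ValueError(f"need at least {k} solved examples, got {len(examples)}")
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]
    # The target question comes last, with the answer left for the model to fill in.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Made-up solved examples (not taken from GSM8K).
solved = [
    ("What is 2 + 3?", "5"),
    ("What is 4 * 6?", "24"),
    ("What is 10 - 7?", "3"),
    ("What is 9 / 3?", "3"),
    ("What is 8 + 1?", "9"),
]

five_shot = build_prompt(solved, "What is 7 + 5?", k=5)  # 5 solved examples, then the question
zero_shot = build_prompt(solved, "What is 7 + 5?", k=0)  # the question alone
```

With `k=0` the model sees only the target question; with `k=5` it first sees five worked examples in the same format.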
