
Cannot verify benchmark results

#4
by Lexski - opened

On the model card it says the model gets

AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98

and

Llama-3.1-Nemotron-70B-Instruct performs best on Arena Hard, AlpacaEval 2 LC (verified tab) and MT Bench (GPT-4-Turbo)

I tried following the links, but I cannot verify the results. The AlpacaEval 2.0 link does show the leaderboard, but the Nemotron model does not appear on it. The MT-Bench link takes me to a GitHub PR that doesn't mention GPT-4-Turbo or the Nemotron model.

Do you understand Bulgarian?

NVIDIA org

Those benchmarks were run internally, so it's expected that you can't find those numbers online:

  • The AlpacaEval 2.0 link is provided so people can compare against the official leaderboard
  • The MT-Bench link is for people who may want to run the benchmark themselves, since doing so requires the changes from that PR