Spaces: open-llm-leaderboard/open_llm_leaderboard

leaderboard should be more curated

#908

by ehartford - opened Aug 30

Discussion

ehartford

Aug 30

why is there no meta-llama/Meta-Llama-3.1-405B

why is there no mistralai/Mistral-Large-Instruct-2407

these are incredibly important open weights models that ought to be in the leaderboard on day 1.

alozowski changed discussion title from leaderboard should be more curated to the leaderboard is missing large models Aug 30

alozowski

Open LLM Leaderboard org Aug 30

Hi @ehartford ,

First, let me change the title of this discussion to one more relevant to your question.

We are aware that we currently don't have meta-llama/Meta-Llama-3.1-405B and mistralai/Mistral-Large-Instruct-2407, as well as other large models that require evaluation in multi-node mode.

We are working on adding Meta-Llama-3.1-405B to the Leaderboard. In the meantime, if you want us to evaluate any other large model, please open a discussion about it and wait for the community's reaction. If there's enough interest from the community, we're open to manually evaluating models that require more than one node! For example, we evaluated alpindale/WizardLM-2-8x22B thanks to this discussion.

alozowski changed discussion status to closed Aug 30

ehartford changed discussion title from the leaderboard is missing large models to leaderboard should be more curated Aug 30

ehartford

Aug 30 • edited Aug 30

I meant the title, not your interpretation.

The examples I gave happened to be large.

But the leaderboard should be curated.

405b should have been on the leaderboard on the day it was released.

The same goes for any model released by Mistral, Facebook, Google, etc., big or small.

ehartford

Aug 30

"we hear your suggestion and disregard it" is a fine response

alozowski

Open LLM Leaderboard org Aug 30

Thank you for your feedback. We understand your desire for a more curated approach focusing on models from key organisations like Meta, Mistral, and Google. We strive to add significant models as quickly as possible, but immediate addition on release day isn't always possible, especially for large models requiring extensive computational resources.

We balance various factors including community interest, resource availability, and technical complexity when prioritising model evaluations.

If there are specific models you believe should be prioritised, we encourage you to open discussions about them. Community feedback will help us to measure interest and allocate our resources effectively.

We appreciate your engagement with the Leaderboard and are committed to making it as useful and comprehensive as possible within our constraints.

ehartford

Aug 30

Ok, well, this is a business vision and prioritization issue, not a resources issue. You certainly have the resources to run evals on the 1-2 top open-weights models that come out each month (especially when they legit compete with OpenAI and Claude). If that's not important enough to warrant prioritized evaluation, then the community frankly needs a new leaderboard.

You don't need to write me back; I'm just venting at this point.

I will go ahead and create a new leaderboard that implements exactly yours but actually evals all the new hot models as they come out rather than months later. @clem