leaderboard should be more curated
Why is there no meta-llama/Meta-Llama-3.1-405B?
Why is there no mistralai/Mistral-Large-Instruct-2407?
These are incredibly important open-weights models that ought to be on the leaderboard on day 1.
Hi @ehartford ,
First, let me change the title of this discussion to one more relevant to your question.
We are aware that we don't currently have meta-llama/Meta-Llama-3.1-405B and mistralai/Mistral-Large-Instruct-2407, as well as other large models that require evaluation in multi-node mode.
We are working on adding Meta-Llama-3.1-405B to the Leaderboard. In the meantime, if you want us to evaluate any other large model, please open a discussion about it and wait for the community's reaction. If there's enough interest from the community, we're open to manually evaluating models that require more than one node! For example, we evaluated alpindale/WizardLM-2-8x22B thanks to this discussion.
I meant the title, not your interpretation.
The examples I gave happened to be large.
But the leaderboard should be curated.
405b should have been on the leaderboard on the day it was released.
The same goes for any model released by Mistral, Facebook, Google, etc., big or small.
"we hear your suggestion and disregard it" is a fine response
Thank you for your feedback. We understand your desire for a more curated approach focusing on models from key organisations like Meta, Mistral, and Google. We strive to add significant models as quickly as possible, but immediate addition on release day isn't always possible, especially for large models requiring extensive computational resources.
We balance various factors including community interest, resource availability, and technical complexity when prioritising model evaluations.
If there are specific models you believe should be prioritised, we encourage you to open discussions about them. Community feedback will help us to measure interest and allocate our resources effectively.
We appreciate your engagement with the Leaderboard and are committed to making it as useful and comprehensive as possible within our constraints.
OK, well, this is a business vision and prioritization issue, not a resources issue. You certainly have the resources to run evals on the 1-2 top open-weights models that come out each month (especially when they legitimately compete with OpenAI and Claude). If that's not important enough to warrant prioritized evaluation, then the community frankly needs a new leaderboard.
You don't need to write me back; I'm just venting at this point.
I will go ahead and create a new leaderboard that replicates yours exactly but actually evaluates all the hot new models as they come out rather than months later. @clem
@ehartford The Leaderboard is curated:
You're just not the curator,
We are,
Regards,
- Everybody voting on the Leaderboard, also known as: The Community
(including me)
So save yourself some compute for another leaderboard; the community can decide on its own.
Great comeback, that'll teach me. Cheers.