Open Sourcing Trillion-Parameter Models: Does It Really Matter?

#10
by kingabzpro

Open sourcing a trillion-parameter language model may seem like a major milestone, but for most users it changes little. The hardware and infrastructure needed to run, test, or serve such a massive model are out of reach for over 99.9% of people and organizations: you need huge clusters of high-end GPUs and specialized networking, and you face enormous ongoing costs.
Fine-tuning is even less accessible; it typically requires at least double the memory needed just to run the model, since you also have to hold gradients and optimizer states. In practice, open sourcing these giant models is symbolic, not practical, for nearly everyone.
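To put rough numbers on that, here is a back-of-envelope sketch: weights-only inference versus a common estimate for full fine-tuning with Adam. Exact figures vary with setup and parallelism, so treat these as order-of-magnitude only.

```python
# Back-of-envelope memory for a 1-trillion-parameter model.
# Very rough: ignores activations, KV cache, and parallelism overhead.

PARAMS = 1e12   # 1T parameters
H100_GB = 80    # memory of a single H100

# Inference: BF16 weights at 2 bytes/param.
inference_gb = PARAMS * 2 / 1e9

# Full fine-tuning with Adam in mixed precision is commonly estimated at
# ~16 bytes/param (weights + gradients + optimizer states).
finetune_gb = PARAMS * 16 / 1e9

print(f"inference (BF16 weights): ~{inference_gb:,.0f} GB (~{inference_gb / H100_GB:.0f} H100s)")
print(f"full fine-tune (Adam):    ~{finetune_gb:,.0f} GB (~{finetune_gb / H100_GB:.0f} H100s)")
```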

You can run this on a few thousand dollars' worth of consumer hardware.

And your point is what, exactly? Don't open source trillion-param models? Do you have anything actually useful to say?

At the very least, it allows more people to use high-performance models at a lower cost. Some inference service providers will deploy these open-source models, offering lower prices or giving away vouchers.
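Concretely, most of those providers expose an OpenAI-compatible endpoint, so calling a hosted K2 looks roughly like the sketch below. The base URL, API key, and model ID here are placeholders; substitute whatever your chosen provider documents.

```python
# Minimal sketch: calling a hosted open-weights model through an
# OpenAI-compatible endpoint. Base URL, key, and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical provider endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",  # model ID naming varies by provider
    messages=[{"role": "user", "content": "Hello, K2!"}],
)
print(response.choices[0].message.content)
```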

Moonshot AI org

I guess you'd prefer a smaller model? I hope you like Moonlight-16B-A3B; it's actually a prototype version of K2.
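(For anyone who wants to try it, here is a minimal sketch using transformers; the exact loading recipe and dtype recommendations are on the model card.)

```python
# Minimal sketch: running Moonlight-16B-A3B-Instruct with transformers.
# Assumes a GPU with roughly 32+ GB of memory for BF16 weights; see the
# model card for the recommended setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Moonlight-16B-A3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # custom MoE code ships with the repo
)

inputs = tokenizer("Hello, Moonlight!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```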

Yes. I'm looking for a model I can run on 4 H100s. Maybe you will launch a quantized version of the K2 model?
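(As a rough sanity check on whether a quantized K2 could fit on 4 H100s, here is a weights-only sketch assuming ~1T total parameters. Note that all MoE expert weights must be resident even though only ~32B are active per token.)

```python
# Rough fit check: can a quantized ~1T-parameter model fit on 4x H100 (80 GB)?
# Weights only; real serving also needs KV cache and activation memory.
# All MoE expert weights must be resident, even with only ~32B active/token.

TOTAL_PARAMS = 1e12
GPU_MEM_GB = 4 * 80  # 4x H100 = 320 GB

for name, bytes_per_param in [("8-bit", 1.0), ("4-bit", 0.5), ("2-bit", 0.25)]:
    weights_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    verdict = "fits" if weights_gb <= GPU_MEM_GB else "does not fit"
    print(f"{name}: ~{weights_gb:.0f} GB of weights -> {verdict} in {GPU_MEM_GB} GB")
```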

You don't have to be rude. I just gave my opinion. You can say that I am wrong, with reasoning.

Moonshot AI org

"quantized version of the K2 model"

We don't have expertise in quantization, but I found that @unsloth has done a great job:
https://huggingface.co/unsloth/Kimi-K2-Instruct-GGUF
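For anyone who wants to try those quants, here is a minimal sketch using huggingface_hub and llama-cpp-python. The quant pattern and filename below are assumptions; check the repo listing for the actual shard names and the recommended llama.cpp settings.

```python
# Sketch: download one of unsloth's Kimi-K2 GGUF quants and load it with
# llama-cpp-python. The quant pattern and filename are hypothetical; check
# the repo for the real shard names.
from huggingface_hub import snapshot_download
from llama_cpp import Llama

local_dir = snapshot_download(
    repo_id="unsloth/Kimi-K2-Instruct-GGUF",
    allow_patterns=["*Q2_K*"],  # assumed pattern for a small quant
)

# For multi-part GGUFs, point llama.cpp at the first shard; it finds the rest.
llm = Llama(
    model_path=f"{local_dir}/Kimi-K2-Instruct-Q2_K-00001-of-00003.gguf",  # hypothetical name
    n_gpu_layers=-1,  # offload as many layers as fit on the GPU
    n_ctx=4096,
)
print(llm("Hello, K2!", max_tokens=64)["choices"][0]["text"])
```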

beggars can't be choosers

Huge thanks to Moonshotai.

You're wrong. There are thousands of small LLMs you can use, but they are extremely limited and aren't truly general-purpose models; they can't fully replace GPT-4 or Claude. DeepSeek and K2 can. If these releases changed little for most users, why is everybody talking about K2 right now, as they were about DeepSeek before it? K2 is currently the #1 trending model, and DeepSeek trended for months.
Why did DeepSeek force OpenAI to discuss open-sourcing a model, while Llama 3.1 70B was just a meme to them?
Large models like K2 and DeepSeek also possess far more real-world knowledge than smaller models.

YES! Open sourcing big stuff (and small models too) is great.

@kingabzpro
Have you never heard of cloud GPU providers? Stop complaining because you're too incompetent to research stuff and code; it's obvious you wouldn't have done anything significant even if it ran on your phone. I'm shocked you even got an answer from the devs, ungrateful low-IQ baboon.
