Open Sourcing Trillion-Parameter Models: Does It Really Matter?

#10
by kingabzpro

Open sourcing a trillion-parameter language model may seem like a major milestone, but for most users it changes little. The hardware and infrastructure needed to run, test, or serve such a massive model are out of reach for over 99.9% of people and organizations: you need huge clusters of high-end GPUs and specialized networking, and you face enormous ongoing costs.
Fine-tuning is even less accessible; it typically requires at least double the memory needed just to run the model, since you also have to hold gradients and optimizer states. In practice, open sourcing these giant models is symbolic, not practical, for nearly everyone.
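To put rough numbers on that, here is a back-of-envelope sketch: weights-only inference versus a common estimate for full fine-tuning with Adam. Exact figures vary with setup and parallelism, so treat these as order-of-magnitude only.

```python
# Back-of-envelope memory for a 1-trillion-parameter model.
# Very rough: ignores activations, KV cache, and parallelism overhead.

PARAMS = 1e12   # 1T parameters
H100_GB = 80    # memory of a single H100

# Inference: BF16 weights at 2 bytes/param.
inference_gb = PARAMS * 2 / 1e9

# Full fine-tuning with Adam in mixed precision is commonly estimated at
# ~16 bytes/param (weights + gradients + optimizer states).
finetune_gb = PARAMS * 16 / 1e9

print(f"inference (BF16 weights): ~{inference_gb:,.0f} GB (~{inference_gb / H100_GB:.0f} H100s)")
print(f"full fine-tune (Adam):    ~{finetune_gb:,.0f} GB (~{finetune_gb / H100_GB:.0f} H100s)")
```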

You can run this on a few thousand dollars' worth of consumer hardware.

And your point is what, exactly? Don't open source trillion-param models? Do you have anything actually useful to say?

At the very least, it allows more people to use high-performance models at a lower cost. Some inference service providers will deploy these open-source models, offering lower prices or giving away vouchers.
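Concretely, most of those providers expose an OpenAI-compatible endpoint, so calling a hosted K2 looks roughly like the sketch below. The base URL, API key, and model ID here are placeholders; substitute whatever your chosen provider documents.

```python
# Minimal sketch: calling a hosted open-weights model through an
# OpenAI-compatible endpoint. Base URL, key, and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical provider endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",  # model ID naming varies by provider
    messages=[{"role": "user", "content": "Hello, K2!"}],
)
print(response.choices[0].message.content)
```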

Moonshot AI org

I guess you'd prefer a smaller model? I hope you like Moonlight-16B-A3B; it's actually a prototype version of K2.
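(For anyone who wants to try it, here is a minimal sketch using transformers; the exact loading recipe and dtype recommendations are on the model card.)

```python
# Minimal sketch: running Moonlight-16B-A3B-Instruct with transformers.
# Assumes a GPU with roughly 32+ GB of memory for BF16 weights; see the
# model card for the recommended setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Moonlight-16B-A3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # custom MoE code ships with the repo
)

inputs = tokenizer("Hello, Moonlight!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```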

Yes. I'm looking for a model I can run on 4 H100s. Maybe you will launch a quantized version of the K2 model?
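(As a rough sanity check on whether a quantized K2 could fit on 4 H100s, here is a weights-only sketch assuming ~1T total parameters. Note that all MoE expert weights must be resident even though only ~32B are active per token.)

```python
# Rough fit check: can a quantized ~1T-parameter model fit on 4x H100 (80 GB)?
# Weights only; real serving also needs KV cache and activation memory.
# All MoE expert weights must be resident, even with only ~32B active/token.

TOTAL_PARAMS = 1e12
GPU_MEM_GB = 4 * 80  # 4x H100 = 320 GB

for name, bytes_per_param in [("8-bit", 1.0), ("4-bit", 0.5), ("2-bit", 0.25)]:
    weights_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    verdict = "fits" if weights_gb <= GPU_MEM_GB else "does not fit"
    print(f"{name}: ~{weights_gb:.0f} GB of weights -> {verdict} in {GPU_MEM_GB} GB")
```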

You don't have to be rude. I just gave my opinion. You can say that I am wrong, with reasoning.

Moonshot AI org

"quantized version of the K2 model"

We don't have expertise in quantization, but I found that @unsloth has done a great job:
https://huggingface.co/unsloth/Kimi-K2-Instruct-GGUF
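For anyone who wants to try those quants, here is a minimal sketch using huggingface_hub and llama-cpp-python. The quant pattern and filename below are assumptions; check the repo listing for the actual shard names and the recommended llama.cpp settings.

```python
# Sketch: download one of unsloth's Kimi-K2 GGUF quants and load it with
# llama-cpp-python. The quant pattern and filename are hypothetical; check
# the repo for the real shard names.
from huggingface_hub import snapshot_download
from llama_cpp import Llama

local_dir = snapshot_download(
    repo_id="unsloth/Kimi-K2-Instruct-GGUF",
    allow_patterns=["*Q2_K*"],  # assumed pattern for a small quant
)

# For multi-part GGUFs, point llama.cpp at the first shard; it finds the rest.
llm = Llama(
    model_path=f"{local_dir}/Kimi-K2-Instruct-Q2_K-00001-of-00003.gguf",  # hypothetical name
    n_gpu_layers=-1,  # offload as many layers as fit on the GPU
    n_ctx=4096,
)
print(llm("Hello, K2!", max_tokens=64)["choices"][0]["text"])
```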

beggars can't be choosers

Huge thanks to Moonshotai.

You're wrong. There are thousands of small LLMs you can use, but they are extremely limited and aren't truly general-purpose models; they can't fully replace GPT-4 or Claude. DeepSeek and K2 can. If these releases changed little for most users, why is everybody talking about K2 right now, as they were about DeepSeek before it? K2 is currently the #1 trending model, and DeepSeek trended for months.
Why did DeepSeek force OpenAI to discuss open-sourcing a model, while Llama 3.1 70B was just a meme to them?
Large models like K2 and DeepSeek also possess far more real-world knowledge than smaller models.

YES! Open sourcing big stuff (and small models too) is great.

@kingabzpro
Have you never heard of cloud GPU providers? Stop complaining because you're too incompetent to research stuff and code; it's obvious you wouldn't have done anything significant even if it ran on your phone. I'm shocked you even got an answer from the devs, ungrateful low-IQ baboon.
