Why is the model 1B version 29 GB?

#14
by samedii - opened

Why is the model 1B version 29 GB when the 7B version is 32 GB?

MolmoE-1B is a multimodal Mixture-of-Experts LLM with 1.5B active and 7.2B total parameters based on OLMoE-1B-7B-0924.

Because it has 7B total parameters (7.2B, per the description above). The "1B" refers to how many parameters are active per forward pass, which is a proxy for its speed, not for its size on disk.
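To make the arithmetic concrete, here is a minimal sketch. It assumes the checkpoint stores every parameter in float32 (4 bytes each). The thread does not confirm this, but 7.2B × 4 bytes ≈ 28.8 GB does line up with the 29 GB shown. The helper names and the toy MoE configuration (1B shared parameters, 64 experts of 100M each, top-8 routing) are hypothetical and only illustrate why total and active counts differ. They are not OLMoE's actual layout.

```python
# Rough checkpoint-size estimate. File size scales with TOTAL parameters,
# because every expert is saved to disk even though only a few run per token.
# Assumption (not confirmed in the thread): weights are stored in float32.

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def checkpoint_size_gb(num_params: float, dtype: str = "float32") -> float:
    """Approximate on-disk size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def moe_param_counts(shared: int, per_expert: int, num_experts: int, top_k: int):
    """Return (total, active) parameter counts for a hypothetical toy MoE model.

    total: shared weights plus all experts (what ends up in the checkpoint).
    active: shared weights plus only the top_k experts used per token.
    """
    total = shared + per_expert * num_experts
    active = shared + per_expert * top_k
    return total, active


if __name__ == "__main__":
    # 7.2B total parameters, taken from the MolmoE-1B description above.
    print(f"float32:  {checkpoint_size_gb(7.2e9):.1f} GB")
    print(f"bfloat16: {checkpoint_size_gb(7.2e9, 'bfloat16'):.1f} GB")

    # Hypothetical toy configuration, for illustration only.
    total, active = moe_param_counts(1_000_000_000, 100_000_000, 64, 8)
    print(f"toy MoE: {total:,} total, {active:,} active")
```

Storing the same weights in bfloat16 would roughly halve the file. The active-parameter count affects compute per token, not the size of the download.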

Thank you!

samedii changed discussion status to closed
