What do we know about the architecture so far?
#6 opened by amgadhasan
Hi,
Has anyone got any info about the architecture?
I suppose it's an MoE? What are the total and active parameter counts?
Does it support audio or vision input?
Also, this is the chat/instruct version, right?
~250B total params, since it's bf16 (2 bytes per parameter) and ~500 GB of total space.
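To spell out that arithmetic (the 500 GB figure is the estimate above, and GB vs GiB shifts the answer slightly):

```python
# bf16 stores each parameter in 2 bytes, so params ≈ bytes / 2.
checkpoint_bytes = 500e9                     # ~500 GB of shards (estimate above)
total_params = checkpoint_bytes / 2
print(f"~{total_params / 1e9:.0f}B params")  # -> ~250B
```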
It seems to have shared/common layers like DeepSeek and Llama 4.
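For anyone unfamiliar with that design, here's a minimal sketch of a DeepSeek-style shared-expert MoE FFN; all sizes, names, and the routing scheme are illustrative, not taken from this model's weights:

```python
import torch
import torch.nn as nn

class SharedExpertMoE(nn.Module):
    """Sketch of a shared-expert MoE FFN: one shared expert every token
    passes through, plus top-k routed experts. Sizes are illustrative."""

    def __init__(self, d_model=1024, d_ff=4096, n_experts=8, top_k=2):
        super().__init__()
        self.shared = nn.Sequential(               # always-on shared FFN
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, d_model)
        out = self.shared(x)                       # shared path, every token
        probs = self.router(x).softmax(dim=-1)     # routing weights
        topw, topi = probs.topk(self.top_k, dim=-1)
        for k in range(self.top_k):                # routed path, top-k experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, k] == e
                if mask.any():
                    out[mask] += topw[mask, k, None] * expert(x[mask])
        return out

moe = SharedExpertMoE()
y = moe(torch.randn(16, 1024))  # 16 tokens in, 16 tokens out
```

The point of the shared expert is that only the routed experts are sparse; the shared FFN contributes to the active parameter count for every token, which is why large shared layers push the active count up.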
MoE, no vision support. From my rough calculations, it's something like ~260B total with ~30B active (260B-A30B)?
About 270B params total (MoE), ~115B active (2 experts out of 8). The shared FFN layers are very large. The tensors appear to be pre-sharded for 8-way tensor parallelism.
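If anyone wants to check these totals directly, here's a sketch that sums tensor sizes from the shard headers using the safetensors library; the glob pattern is an assumption about the local file layout, and since tensor-parallel pre-sharding splits tensors rather than duplicating them, the sum should still come out close:

```python
import glob
from math import prod
from safetensors import safe_open

# Sum element counts across all shards; reads only headers, not weights.
total = 0
for shard in sorted(glob.glob("*.safetensors")):  # assumed local shard layout
    with safe_open(shard, framework="pt") as f:
        total += sum(prod(f.get_slice(name).get_shape()) for name in f.keys())
print(f"total params: ~{total / 1e9:.0f}B")
```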