GPU requirement

#20 opened by meetzuber

How much VRAM is needed to run the model in fp16/bf16?

402B params means 402 billion numbers. At 16 bits (two bytes) per number, that's 402 × 2 = 804 GB of weights, so you need about 804 GB of VRAM. That's roughly 6× H200 (141 GB each).

Again, that's if you want to load the entire model in VRAM (which gives the fastest inference). If you don't need that, then since only ~17B parameters are active per token, you can offload most of the weights to CPU RAM and get by with roughly 17B × 2 = 34 GB of VRAM, at the cost of slower inference.
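
For anyone adapting this to other models or dtypes, here's a rough back-of-envelope calculator in Python. It only counts weights; KV cache, activations, and framework overhead add more on top, and the 402B/17B split is taken from the numbers above:

```python
# Back-of-envelope VRAM estimate (weights only; KV cache,
# activations, and runtime overhead are extra).
def weights_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate GB of VRAM needed just for the model weights."""
    return params_billion * bytes_per_param  # 1B params * 2 bytes ~= 2 GB

total = weights_vram_gb(402)   # full model resident in VRAM
active = weights_vram_gb(17)   # only the ~17B active params on GPU

print(f"Full model in bf16: ~{total:.0f} GB")   # ~804 GB, ~6x H200 (141 GB)
print(f"Active params only: ~{active:.0f} GB")  # ~34 GB, rest offloaded to CPU
```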
