How many layers should I use on my GPU?

#2
by koolara - opened

Hi, I have an RTX 2050 and a Ryzen 3600 with 32 GB of DDR4 RAM.

Buy at least 1 TiB of RAM or don't even try. Forget about GPU offloading: unless you have servers full of H200 GPUs, your GPU memory is a drop of water in a lake of lava. Just accept that you will never be able to run this behemoth, at least not in the next 20 years. I think I'm the only one so far who has ever run this model for real, and I did so by combining 3 servers totaling 896 GiB of RAM and 4 GPUs.
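
For a sense of scale, here is a quick back-of-the-envelope sketch; every number in it is an assumption (the effective bits per weight of a 4-bit quant, a made-up layer count), not a measurement. It also answers the original question: for a 1.7T model on a 4 GB card, the number of layers you can offload is effectively zero.

```python
# Back-of-the-envelope memory math for a 1.7T-parameter dense model.
# All numbers are assumptions for illustration, not measurements.

PARAMS = 1.7e12          # parameter count
BITS_PER_WEIGHT = 4.5    # rough effective size of a Q4_K_M-style quant
N_LAYERS = 126           # hypothetical layer count, for illustration only
VRAM_GIB = 4             # an RTX 2050 has 4 GB of VRAM

model_gib = PARAMS * BITS_PER_WEIGHT / 8 / 2**30
per_layer_gib = model_gib / N_LAYERS
layers_that_fit = int(VRAM_GIB // per_layer_gib)

print(f"quantized model: ~{model_gib:,.0f} GiB")     # ~891 GiB
print(f"per layer:       ~{per_layer_gib:.1f} GiB")  # ~7.1 GiB
print(f"layers that fit in {VRAM_GIB} GiB of VRAM: {layers_that_fit}")  # 0
```

Note how the ~891 GiB estimate lines up with the 896 GiB of RAM I needed across 3 servers: even a single layer is bigger than the whole card.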

So, is it okay for RP? XD

In its current state it's unfortunately not optimized for roleplay, as it's a merge of the official 405B model. If you had 1 TiB of GPU memory, say 13x A100 80 GB GPUs, and 4 TiB of RAM, you could QLoRA finetune it in 4-bit to heal the damage from merging and tune it for roleplay. RunPod now supports GPU clusters, so it might be possible if you have a few grand to throw at it. But honestly, I recommend waiting for the 2T-parameter Llama 4 Behemoth model, which will soon take the crown of the largest model away from FatLlama 1.7T and, being a MoE, should be way faster to run on CPU. FatLlama 1.7T will probably remain the largest dense LLM on Hugging Face for a really long time.
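
For what it's worth, the QLoRA setup itself is trivial compared to getting the hardware. A minimal sketch with transformers, bitsandbytes and peft, assuming you actually had the aggregate VRAM; the repo id and LoRA hyperparameters below are placeholders, not a tested recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Standard QLoRA recipe: NF4 double-quantized base weights, bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "someuser/FatLlama-1.7T",   # placeholder repo id
    quantization_config=bnb_config,
    device_map="auto",          # shard across all visible GPUs
)

# Train only small LoRA adapters on the attention projections;
# the 4-bit base weights stay frozen.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```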
