How many layers should I use on my GPU?

#2
by koolara - opened

Hi, I have an RTX 2050 and a Ryzen 3600 with 32 GB of DDR4 RAM.

Buy at least 1 TiB of RAM or don't even try. Forget about GPU offloading: unless you have servers full of H200 GPUs, your GPU memory is a drop of water in a lake of lava. Just accept that you will never be able to run this behemoth, at least not in the next 20 years. I think I'm the only one so far who has ever run this model for real, and I did so by combining 3 servers totaling 896 GiB of RAM and 4 GPUs.
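
For a sense of scale, here is a quick back-of-the-envelope sketch; every number in it is an assumption (the effective bits per weight of a 4-bit quant, a made-up layer count), not a measurement. It also answers the original question: for a 1.7T model on a 4 GB card, the number of layers you can offload is effectively zero.

```python
# Back-of-the-envelope memory math for a 1.7T-parameter dense model.
# All numbers are assumptions for illustration, not measurements.

PARAMS = 1.7e12          # parameter count
BITS_PER_WEIGHT = 4.5    # rough effective size of a Q4_K_M-style quant
N_LAYERS = 126           # hypothetical layer count, for illustration only
VRAM_GIB = 4             # an RTX 2050 has 4 GB of VRAM

model_gib = PARAMS * BITS_PER_WEIGHT / 8 / 2**30
per_layer_gib = model_gib / N_LAYERS
layers_that_fit = int(VRAM_GIB // per_layer_gib)

print(f"quantized model: ~{model_gib:,.0f} GiB")     # ~891 GiB
print(f"per layer:       ~{per_layer_gib:.1f} GiB")  # ~7.1 GiB
print(f"layers that fit in {VRAM_GIB} GiB of VRAM: {layers_that_fit}")  # 0
```

Note how the ~891 GiB estimate lines up with the 896 GiB of RAM I needed across 3 servers: even a single layer is bigger than the whole card.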

So, is it okay for RP? XD

In its current state it's unfortunately not optimized for roleplay, as it's a merge of the official 405B model. If you had 1 TiB of GPU memory, say 13x A100 80 GB GPUs, and 4 TiB of RAM, you could QLoRA finetune it in 4-bit to heal the damage from merging and tune it for roleplay. RunPod now supports GPU clusters, so it might be possible if you have a few grand to throw at it. But honestly, I recommend waiting for the 2T-parameter Llama 4 Behemoth model, which will soon take the crown of the largest model away from FatLlama 1.7T and, being a MoE, should be way faster to run on CPU. FatLlama 1.7T will probably remain the largest dense LLM on Hugging Face for a really long time.
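
For what it's worth, the QLoRA setup itself is trivial compared to getting the hardware. A minimal sketch with transformers, bitsandbytes and peft, assuming you actually had the aggregate VRAM; the repo id and LoRA hyperparameters below are placeholders, not a tested recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Standard QLoRA recipe: NF4 double-quantized base weights, bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "someuser/FatLlama-1.7T",   # placeholder repo id
    quantization_config=bnb_config,
    device_map="auto",          # shard across all visible GPUs
)

# Train only small LoRA adapters on the attention projections;
# the 4-bit base weights stay frozen.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```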
