Has anyone tested how much VRAM this takes when running locally?
#3 opened by weisiren
In fp16:
- 4 billion weights -> 8 GB VRAM (allocated up front)
- activations, at the max context window of 8192 tokens with a single sequence: 2.5–4 GB
So 10–12 GB of VRAM should be enough to run inference comfortably in fp16.
If you want to reduce memory consumption, you can use our 8-bit gguf checkpoints: https://huggingface.co/JetBrains/Mellum-4b-base-gguf
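The arithmetic above can be sketched as a back-of-the-envelope estimate: weight memory is parameter count times bytes per weight, plus an allowance for activations. The function below is illustrative only (its name and the flat activation allowance are not from any library); the figures match the post above.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int,
                     activation_gb: float = 0.0) -> float:
    """Rough VRAM estimate: weight storage plus a flat activation allowance.

    This is a sketch, not a measurement: real usage also includes the KV
    cache, framework overhead, and fragmentation.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9 + activation_gb


# fp16 (16 bits per weight): 4B params -> 8 GB for the weights alone
fp16_weights = estimate_vram_gb(4, 16)

# Worst-case activations at the 8192-token context: 8 + 4 = 12 GB total
fp16_peak = estimate_vram_gb(4, 16, activation_gb=4.0)

# 8-bit quantization (e.g. the gguf checkpoints) halves the weight memory
q8_weights = estimate_vram_gb(4, 8)
```

With the 8-bit checkpoints the weights alone drop to about 4 GB, which is why they fit comfortably on smaller consumer GPUs.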