Has anyone tested how much VRAM this takes to run locally?

#3 by weisiren - opened


In fp16:

  • 4 billion parameters -> 8 GB of VRAM for the weights alone (2 bytes per parameter)
  • activations and KV cache at the max context window of 8192 tokens, single sequence -> roughly 2.5–4 GB

So 10–12 GB of VRAM should be enough to run inference comfortably in fp16 (a rough sketch of the arithmetic is below).
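As a sanity check, here is a back-of-envelope estimate in Python. The layer count, KV-head count, and head dimension are placeholder values, not Mellum's published config, so plug in the numbers from the model card if you want an exact figure:

```python
# Back-of-envelope fp16 memory estimate. The layer count, KV-head count and
# head dimension below are placeholders, not Mellum's published config.
BYTES_FP16 = 2

def weights_gb(n_params: float) -> float:
    # fp16 stores each parameter in 2 bytes
    return n_params * BYTES_FP16 / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int, seq_len: int) -> float:
    # 2x for keys and values, one sequence, fp16
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * BYTES_FP16 / 1e9

w = weights_gb(4e9)                                                       # ~8.0 GB
kv = kv_cache_gb(n_layers=24, n_kv_heads=32, head_dim=128, seq_len=8192)  # ~3.2 GB
print(f"weights: {w:.1f} GB + kv cache: {kv:.1f} GB = {w + kv:.1f} GB")
```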

If you want to reduce memory consumption, you can use our 8-bit GGUF checkpoints: https://huggingface.co/JetBrains/Mellum-4b-base-gguf
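For example, a minimal sketch with llama-cpp-python; the `.gguf` filename below is a guess, so check the repo's file list for the actual quant name:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="mellum-4b-base.Q8_0.gguf",  # assumed filename, verify in the repo
    n_ctx=8192,        # max context window
    n_gpu_layers=-1,   # offload all layers to the GPU if they fit
)

out = llm("def fibonacci(n):", max_tokens=64)
print(out["choices"][0]["text"])
```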
