Do these quants support MLA?
Hi there, thanks for your work!
I was wondering, do your DeepSeek V3 0324 quants support MLA, as added in this PR: https://github.com/ggml-org/llama.cpp/pull/12801
MLA greatly reduces VRAM usage; for example, at 16K ctx the KV cache drops from about 80GB to 2GB.
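For context, here is a rough back-of-envelope sketch of where numbers like that come from, assuming DeepSeek-V3's published config values (61 layers, 128 heads, kv_lora_rank 512, rope dim 64) and an fp16 cache; the exact savings depend on cache dtype and implementation:

```python
# Rough KV-cache sizing sketch, assuming DeepSeek-V3 config values.
# Illustrative numbers only, not exact measurements from llama.cpp.

N_LAYERS = 61
N_HEADS = 128
K_HEAD_DIM = 192       # qk_nope_head_dim (128) + qk_rope_head_dim (64)
V_HEAD_DIM = 128
KV_LORA_RANK = 512     # compressed latent dimension used by MLA
ROPE_DIM = 64
BYTES = 2              # fp16 cache entries
CTX = 16 * 1024        # 16K context

def naive_kv_bytes(ctx: int) -> int:
    """Cache full per-head K and V vectors for every layer and token."""
    per_token = N_LAYERS * N_HEADS * (K_HEAD_DIM + V_HEAD_DIM) * BYTES
    return per_token * ctx

def mla_kv_bytes(ctx: int) -> int:
    """MLA caches only the shared compressed latent plus the rope part."""
    per_token = N_LAYERS * (KV_LORA_RANK + ROPE_DIM) * BYTES
    return per_token * ctx

print(f"naive: {naive_kv_bytes(CTX) / 1e9:.1f} GB")  # ~82 GB
print(f"MLA:   {mla_kv_bytes(CTX) / 1e9:.1f} GB")    # ~1.2 GB
```

So the order of magnitude matches: tens of GB without MLA versus around a gigabyte with it at 16K context.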
Thanks!
No, I haven't remade them yet because of the issues mentioned towards the bottom of that PR.
I see! It seems it was fixed a few days ago:
https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD/discussions/2#68192917c3d212ad5b33964d
oooo good catch, okay I'll probably look at remaking them then.
Just bumping it. :)
Thanks, but I still have concerns about this issue mentioned by ikawrakow; I'm not sure how to adequately address it:
https://github.com/ikawrakow/ik_llama.cpp/pull/411
I was hoping Johannes would figure it out in the meantime, but I don't think he has.