Do these quants support MLA?

#6 — opened by Panchovix

Hi there, thanks for your work!

I was wondering, do your DeepSeek V3 0324 quants support MLA, from this PR: https://github.com/ggml-org/llama.cpp/pull/12801

This reduces VRAM usage; for example, at 16K context, it goes from 80 GB VRAM to 2 GB VRAM.

Thanks!
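For context on why the savings are that large: the ~80 GB → ~2 GB figure can be roughly sanity-checked with a back-of-the-envelope estimate. The model dimensions below (61 layers, 128 heads, nope/rope/value head dims of 128/64/128, KV latent rank 512) are taken from the DeepSeek V3 config and are assumptions for this sketch, not measurements from llama.cpp:

```python
# Back-of-the-envelope KV-cache size for DeepSeek V3 at 16K context, fp16 cache.
# All model dimensions are assumptions taken from the DeepSeek V3 config.
n_layers = 61
ctx = 16 * 1024          # 16K context
bytes_per_elem = 2       # fp16 cache entries

# Without MLA: full per-head K (nope 128 + rope 64 dims) and V (128 dims)
# are cached for every one of the 128 heads, per token, per layer.
n_heads = 128
k_dim, v_dim = 128 + 64, 128
mha_bytes = n_layers * ctx * n_heads * (k_dim + v_dim) * bytes_per_elem

# With MLA: only the compressed KV latent (rank 512) plus the shared
# rope part (64 dims) is cached per token, per layer.
kv_lora_rank, rope_dim = 512, 64
mla_bytes = n_layers * ctx * (kv_lora_rank + rope_dim) * bytes_per_elem

print(f"MHA-style cache: {mha_bytes / 1e9:.1f} GB")  # ~81.9 GB
print(f"MLA cache:       {mla_bytes / 1e9:.2f} GB")  # ~1.15 GB
```

The ~70x ratio is in the same ballpark as the 80 GB → 2 GB figure quoted above (actual llama.cpp numbers will differ somewhat depending on cache type and padding).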

No, I have not remade them yet because of the issues mentioned towards the bottom of that PR.

Oooo, good catch. Okay, I'll probably look at remaking them then.

Just bumping it. :)

Thanks, but I still have concerns about this issue mentioned by ikawrakow; I'm not sure how to adequately address it:

https://github.com/ikawrakow/ik_llama.cpp/pull/411

I was hoping Johannes would figure it out in the meantime, but I don't think he has.
