Comparing to new Dynamic v2.0 Unsloth quants?
#2 by BernardH - opened
Thank you so much for your quants and the info you provide on how to use ik_llama.cpp!
It seems Unsloth just released new quants:
https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD
Would you mind comparing your quants with the new (Dynamic v2.0) Unsloth quants?
Oh hey, interesting I didn't know they had released those. Here is what I can say without downloading and testing:
- You can check the model card sidebar for the GGUF dump info to compare similar-bpw models, e.g. the UD-Q4_K_XL. My quants use full `q8_0` for all attention tensors, while theirs use more heavily quantized attention tensors, which likely gives degraded quality but possibly slightly faster speed depending on how you're running it. Mine also support the repacked quants exclusive to `ik_llama.cpp`, so they will run faster when offloading onto CPU.
- There seem to be some possible issues with mainline llama.cpp MLA with this specific quant, perhaps mentioned in this discussion, but I haven't tested it myself.
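To get a feel for the size trade-off between full `q8_0` attention tensors and a lighter quant, here is a rough back-of-envelope sketch in Python. The tensor shapes are illustrative assumptions, not dumped from either model; the bpw figures match the nominal sizes llama.cpp reports for these quant types.

```python
# Approximate effective bits-per-weight (bpw) for a few llama.cpp quant types.
BPW = {"q8_0": 8.5, "q5_K": 5.5, "q4_K": 4.5}

# Hypothetical attention tensor element counts for one layer (assumed shapes).
attn_tensors = {
    "attn_q": 7168 * 7168,
    "attn_k": 7168 * 1024,
    "attn_v": 7168 * 1024,
}

def attn_bytes(quant: str) -> float:
    """Total bytes for the attention tensors at a given quant type."""
    return sum(n * BPW[quant] / 8 for n in attn_tensors.values())

full = attn_bytes("q8_0")
light = attn_bytes("q4_K")
print(f"q8_0: {full / 2**20:.1f} MiB, q4_K: {light / 2**20:.1f} MiB, "
      f"ratio {full / light:.2f}x")
```

Attention tensors are a small fraction of total model size in a big MoE model, which is why keeping them at `q8_0` costs little overall while protecting a quality-sensitive part of the network.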
I'd be curious if anyone did end up doing speed benchmarks with `llama-sweep-bench` on the `ik_llama.cpp` fork, for example. There are various examples floating around in various issues/PRs/discussions.
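For anyone who wants to try such a benchmark, a minimal sketch of the invocation follows. The model path is hypothetical and the flag names are assumptions carried over from mainline llama.cpp conventions; check `./llama-sweep-bench --help` on your build. The script only prints the command (dry run) rather than executing it.

```shell
# Sketch of benchmarking with llama-sweep-bench on the ik_llama.cpp fork.
# MODEL is a hypothetical repacked-quant path; adjust to your own file.
MODEL="./DeepSeek-V3-0324-IQ4_K_R4.gguf"

# -m model file, -c context size to sweep up to, -t CPU threads
# (flag names assumed from llama.cpp conventions; verify with --help).
CMD="./llama-sweep-bench -m $MODEL -c 8192 -t 16"
echo "$CMD"   # dry run: print the command instead of executing it
```

Running the real command reports prompt-processing and token-generation speeds at increasing context depths, which is what makes it useful for comparing quants head to head.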