Comparing to new Dynamic v2.0 Unsloth quants?
#2 by BernardH - opened
Thank you so much for your quants and the info you provide on how to use ik_llama.cpp!
It seems Unsloth just released new quants:
https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD
Would you mind comparing your quants with the new (Dynamic v2.0) Unsloth quants?
Oh hey, interesting I didn't know they had released those. Here is what I can say without downloading and testing:
- You can check the model card sidebar for the GGUF dump info to compare similar-bpw models, e.g. the UD-Q4_K_XL. My quants use full `q8_0` for all attention tensors, while theirs use more heavily quantized attention tensors, which likely gives degraded quality but possibly slightly faster speed depending on how you're running it. Mine also support the repacked quants exclusive to `ik_llama.cpp`, so they will run faster when offloading onto CPU.
- There seem to be some possible issues with mainline llama.cpp MLA with this specific quant, perhaps mentioned in this discussion, but I haven't tested it myself.
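To get a feel for the size trade-off between full `q8_0` attention tensors and a lighter quant, here is a rough back-of-envelope sketch in Python. The tensor shapes are illustrative assumptions, not dumped from either model; the bpw figures match the nominal sizes llama.cpp reports for these quant types.

```python
# Approximate effective bits-per-weight (bpw) for a few llama.cpp quant types.
BPW = {"q8_0": 8.5, "q5_K": 5.5, "q4_K": 4.5}

# Hypothetical attention tensor element counts for one layer (assumed shapes).
attn_tensors = {
    "attn_q": 7168 * 7168,
    "attn_k": 7168 * 1024,
    "attn_v": 7168 * 1024,
}

def attn_bytes(quant: str) -> float:
    """Total bytes for the attention tensors at a given quant type."""
    return sum(n * BPW[quant] / 8 for n in attn_tensors.values())

full = attn_bytes("q8_0")
light = attn_bytes("q4_K")
print(f"q8_0: {full / 2**20:.1f} MiB, q4_K: {light / 2**20:.1f} MiB, "
      f"ratio {full / light:.2f}x")
```

Attention tensors are a small fraction of total model size in a big MoE model, which is why keeping them at `q8_0` costs little overall while protecting a quality-sensitive part of the network.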
I'd be curious if anyone did end up doing speed benchmarks with `llama-sweep-bench` on the `ik_llama.cpp` fork, for example. There are various examples floating around in various issues/PRs/discussions.
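For anyone who wants to try such a benchmark, a minimal sketch of the invocation follows. The model path is hypothetical and the flag names are assumptions carried over from mainline llama.cpp conventions; check `./llama-sweep-bench --help` on your build. The script only prints the command (dry run) rather than executing it.

```shell
# Sketch of benchmarking with llama-sweep-bench on the ik_llama.cpp fork.
# MODEL is a hypothetical repacked-quant path; adjust to your own file.
MODEL="./DeepSeek-V3-0324-IQ4_K_R4.gguf"

# -m model file, -c context size to sweep up to, -t CPU threads
# (flag names assumed from llama.cpp conventions; verify with --help).
CMD="./llama-sweep-bench -m $MODEL -c 8192 -t 16"
echo "$CMD"   # dry run: print the command instead of executing it
```

Running the real command reports prompt-processing and token-generation speeds at increasing context depths, which is what makes it useful for comparing quants head to head.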