the notes on each quant

#1
by yarnsp - opened

hey, sorry, i couldn't find this explained anywhere. for example, you have these notes:
Q6_K 6.7 very good quality
Q8_0 8.6 fast, best quality
does that mean the Q8_0 is faster than Q6_K?
i'm asking because Q6 is already pushing my system, but i might be able to run Q8 if it's faster.

Q8_0 is easier for your CPU to decode and can be faster, but other factors, such as your memory bandwidth, also influence it. You'll have to try.

Unless your CPU is low-end (laptop or phone), you are likely bottlenecked by memory bandwidth. So usually, the smaller the quant, the faster it runs. I did a lot of performance measurements of all our quants on different CPUs, which you can download from http://www.nicobosshard.ch/perfData.zip
I would be very surprised if on your system Q8 is faster than Q6_K. I personally always use Q5_K_M for the best performance/quality/memory trade-off.
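
To make the memory bandwidth point concrete, here is a rough back-of-the-envelope sketch. It assumes the 6.7 and 8.6 in the quant notes are file sizes in GB, that generating one token requires reading all weights from memory once, and that the system has 50 GB/s of memory bandwidth (a made-up figure, replace it with your own). The function and constant names are hypothetical, and the result is only an upper bound, not a measurement.

```python
# Rough upper bound on generation speed when memory bandwidth is the bottleneck.
# Assumption: each generated token requires streaming the whole model from RAM once.

QUANT_SIZES_GB = {"Q6_K": 6.7, "Q8_0": 8.6}  # assumed to be sizes in GB
ASSUMED_BANDWIDTH_GB_S = 50.0  # hypothetical value, not a measurement


def max_tokens_per_second(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Return bandwidth / size, the ceiling on tokens per second."""
    if model_size_gb <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("size and bandwidth must be positive")
    return bandwidth_gb_s / model_size_gb


def estimate_all(sizes_gb: dict, bandwidth_gb_s: float) -> dict:
    """Compute the ceiling for every quant in the given size table."""
    return {
        name: max_tokens_per_second(size, bandwidth_gb_s)
        for name, size in sizes_gb.items()
    }


if __name__ == "__main__":
    for name, tps in estimate_all(QUANT_SIZES_GB, ASSUMED_BANDWIDTH_GB_S).items():
        print(f"{name}: at most ~{tps:.1f} tokens/s")
```

Under these assumptions the smaller Q6_K always gets the higher ceiling, whatever bandwidth you plug in. Real numbers also depend on decode cost and other details, so measuring on your own system is still the only reliable answer.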

Is the performance data measurement finished (i.e. ready for the model page, not that I would have time for that right now)?

Yes, I completed it one month ago: https://huggingface.co/mradermacher/BabyHercules-4x150M-GGUF/discussions/4#67f5cccc7640911cd446d624
I think I never uploaded the final data, as the data at the link above is slightly outdated, but I have it all locally and will upload it later today if I find time to do so.

sure, no hurry, let's just make sure we eventually make good use of it. i plan to have a more complex selection above the quant table, where you can replace the score column with both speed and quality metrics. or maybe have two columns, not sure. yes, the readme needs to be patched still.
