Measuring quantization error
Hi @bartowski ! 🤗
I downloaded all the GGUF quants in this repo (except the Q4_0_X_X variants) and measured how much each model's output deviated from the F16 model output.
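For reference, the deviation metric here is just the mean squared difference over the models' output logits. A minimal sketch of that computation (assuming each model's per-prompt logits were already dumped to NumPy arrays beforehand; the loading/eval step with llama.cpp is not shown):

```python
import numpy as np

def mean_squared_deviation(baseline_logits, quant_logits):
    """MSD between two logit tensors of shape (n_tokens, n_vocab)."""
    diff = baseline_logits.astype(np.float64) - quant_logits.astype(np.float64)
    return float(np.mean(diff ** 2))

def average_msd(baseline_per_prompt, quant_per_prompt):
    """Per-prompt MSDs plus their average, as reported per model below."""
    per_prompt = [
        mean_squared_deviation(b, q)
        for b, q in zip(baseline_per_prompt, quant_per_prompt)
    ]
    return per_prompt, sum(per_prompt) / len(per_prompt)
```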
Here are the raw results:
Number of input texts: 10
Shortest input length in tokens: 60
Longest input length in tokens: 4801
Average input length in tokens: 1589.3
Evaluating baseline model Qwen2.5-14B-Instruct-f16.gguf...
Load model...
Evaluate prompts...
Unload model...
Now processing: Qwen2.5-14B-Instruct-Q8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q8_0.gguf:
-- Prompt 0: 0.006173289846628904
-- Prompt 1: 0.006270984187722206
-- Prompt 2: 0.02688116580247879
-- Prompt 3: 0.00597307039424777
-- Prompt 4: 0.005282975267618895
-- Prompt 5: 0.005566565785557032
-- Prompt 6: 0.008658171631395817
-- Prompt 7: 0.00544555950909853
-- Prompt 8: 0.006982115563005209
-- Prompt 9: 0.005512741860002279
Average MSD: 0.008274664171040058
Now processing: Qwen2.5-14B-Instruct-Q6_K_L.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q6_K_L.gguf:
-- Prompt 0: 0.011051331646740437
-- Prompt 1: 0.01911650039255619
-- Prompt 2: 0.03450910001993179
-- Prompt 3: 0.013190604746341705
-- Prompt 4: 0.011938238516449928
-- Prompt 5: 0.01623309589922428
-- Prompt 6: 0.010791392996907234
-- Prompt 7: 0.011643290519714355
-- Prompt 8: 0.011546456255018711
-- Prompt 9: 0.010908189229667187
Average MSD: 0.015092819929122925
Now processing: Qwen2.5-14B-Instruct-Q6_K.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q6_K.gguf:
-- Prompt 0: 0.015314578078687191
-- Prompt 1: 0.020942362025380135
-- Prompt 2: 0.04126137122511864
-- Prompt 3: 0.01671477220952511
-- Prompt 4: 0.014899395406246185
-- Prompt 5: 0.018019668757915497
-- Prompt 6: 0.013208980672061443
-- Prompt 7: 0.014952722936868668
-- Prompt 8: 0.007623671088367701
-- Prompt 9: 0.015152357518672943
Average MSD: 0.017808988690376282
Now processing: Qwen2.5-14B-Instruct-Q5_K_L.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q5_K_L.gguf:
-- Prompt 0: 0.02413204498589039
-- Prompt 1: 0.05958857759833336
-- Prompt 2: 0.0841807946562767
-- Prompt 3: 0.028570910915732384
-- Prompt 4: 0.04034490883350372
-- Prompt 5: 0.041260700672864914
-- Prompt 6: 0.018558379262685776
-- Prompt 7: 0.02682041935622692
-- Prompt 8: 0.02167557366192341
-- Prompt 9: 0.023277537897229195
Average MSD: 0.03684099018573761
Now processing: Qwen2.5-14B-Instruct-Q5_K_M.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q5_K_M.gguf:
-- Prompt 0: 0.02938358299434185
-- Prompt 1: 0.061786502599716187
-- Prompt 2: 0.08980036526918411
-- Prompt 3: 0.03013867512345314
-- Prompt 4: 0.04332244396209717
-- Prompt 5: 0.044466763734817505
-- Prompt 6: 0.022224143147468567
-- Prompt 7: 0.03638884797692299
-- Prompt 8: 0.03322198987007141
-- Prompt 9: 0.027188239619135857
Average MSD: 0.04179215803742409
Now processing: Qwen2.5-14B-Instruct-Q5_K_S.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q5_K_S.gguf:
-- Prompt 0: 0.03523410111665726
-- Prompt 1: 0.07193170487880707
-- Prompt 2: 0.09901951998472214
-- Prompt 3: 0.037072475999593735
-- Prompt 4: 0.049490101635456085
-- Prompt 5: 0.05357277765870094
-- Prompt 6: 0.026367494836449623
-- Prompt 7: 0.04272600635886192
-- Prompt 8: 0.023336559534072876
-- Prompt 9: 0.03316415846347809
Average MSD: 0.04719148948788643
Now processing: Qwen2.5-14B-Instruct-Q4_K_L.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q4_K_L.gguf:
-- Prompt 0: 0.061645302921533585
-- Prompt 1: 0.18674442172050476
-- Prompt 2: 0.17220667004585266
-- Prompt 3: 0.07102318853139877
-- Prompt 4: 0.08699493855237961
-- Prompt 5: 0.13079944252967834
-- Prompt 6: 0.04793873056769371
-- Prompt 7: 0.056853894144296646
-- Prompt 8: 0.04742700606584549
-- Prompt 9: 0.05837811529636383
Average MSD: 0.09200116991996765
Now processing: Qwen2.5-14B-Instruct-Q4_K_M.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q4_K_M.gguf:
-- Prompt 0: 0.0667383223772049
-- Prompt 1: 0.19345436990261078
-- Prompt 2: 0.1678430140018463
-- Prompt 3: 0.07411719113588333
-- Prompt 4: 0.09354892373085022
-- Prompt 5: 0.12952980399131775
-- Prompt 6: 0.05165378749370575
-- Prompt 7: 0.06237340345978737
-- Prompt 8: 0.0452737957239151
-- Prompt 9: 0.060949377715587616
Average MSD: 0.09454820305109024
Now processing: Qwen2.5-14B-Instruct-Q3_K_XL.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q3_K_XL.gguf:
-- Prompt 0: 0.1552238017320633
-- Prompt 1: 0.47992345690727234
-- Prompt 2: 0.36544859409332275
-- Prompt 3: 0.24844709038734436
-- Prompt 4: 0.22208459675312042
-- Prompt 5: 0.38295114040374756
-- Prompt 6: 0.12928639352321625
-- Prompt 7: 0.26791346073150635
-- Prompt 8: 0.08874400705099106
-- Prompt 9: 0.16961412131786346
Average MSD: 0.25096365809440613
Now processing: Qwen2.5-14B-Instruct-Q4_K_S.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q4_K_S.gguf:
-- Prompt 0: 0.07959611713886261
-- Prompt 1: 0.22201518714427948
-- Prompt 2: 0.21687304973602295
-- Prompt 3: 0.09267330914735794
-- Prompt 4: 0.11938505619764328
-- Prompt 5: 0.18173925578594208
-- Prompt 6: 0.06605318188667297
-- Prompt 7: 0.08385279774665833
-- Prompt 8: 0.051718614995479584
-- Prompt 9: 0.07507876306772232
Average MSD: 0.11889852583408356
Now processing: Qwen2.5-14B-Instruct-Q4_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q4_0.gguf:
-- Prompt 0: 0.14795683324337006
-- Prompt 1: 0.3184524178504944
-- Prompt 2: 0.3015100955963135
-- Prompt 3: 0.15729208290576935
-- Prompt 4: 0.14496251940727234
-- Prompt 5: 0.230322927236557
-- Prompt 6: 0.09759268909692764
-- Prompt 7: 0.16777078807353973
-- Prompt 8: 0.10262352228164673
-- Prompt 9: 0.12382039427757263
Average MSD: 0.1792304366827011
Now processing: Qwen2.5-14B-Instruct-IQ4_XS.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-IQ4_XS.gguf:
-- Prompt 0: 0.09684300422668457
-- Prompt 1: 0.22154095768928528
-- Prompt 2: 0.2124590426683426
-- Prompt 3: 0.10605628788471222
-- Prompt 4: 0.11930330097675323
-- Prompt 5: 0.16718009114265442
-- Prompt 6: 0.06866174936294556
-- Prompt 7: 0.08735846728086472
-- Prompt 8: 0.052949391305446625
-- Prompt 9: 0.09563449770212173
Average MSD: 0.12279868125915527
Now processing: Qwen2.5-14B-Instruct-Q3_K_L.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q3_K_L.gguf:
-- Prompt 0: 0.158499613404274
-- Prompt 1: 0.4903336763381958
-- Prompt 2: 0.38772934675216675
-- Prompt 3: 0.2480023056268692
-- Prompt 4: 0.2341604232788086
-- Prompt 5: 0.39229917526245117
-- Prompt 6: 0.13427993655204773
-- Prompt 7: 0.2600550651550293
-- Prompt 8: 0.10644746571779251
-- Prompt 9: 0.17418718338012695
Average MSD: 0.2585994303226471
Now processing: Qwen2.5-14B-Instruct-Q3_K_M.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q3_K_M.gguf:
-- Prompt 0: 0.18519753217697144
-- Prompt 1: 0.5482602119445801
-- Prompt 2: 0.4248938262462616
-- Prompt 3: 0.2793578803539276
-- Prompt 4: 0.25249308347702026
-- Prompt 5: 0.4298366606235504
-- Prompt 6: 0.1564488261938095
-- Prompt 7: 0.25133588910102844
-- Prompt 8: 0.12622354924678802
-- Prompt 9: 0.19722333550453186
Average MSD: 0.28512710332870483
Now processing: Qwen2.5-14B-Instruct-IQ3_M.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-IQ3_M.gguf:
-- Prompt 0: 0.2674347758293152
-- Prompt 1: 0.6817806959152222
-- Prompt 2: 0.5234313011169434
-- Prompt 3: 0.3579535186290741
-- Prompt 4: 0.3142457604408264
-- Prompt 5: 0.5666929483413696
-- Prompt 6: 0.20291712880134583
-- Prompt 7: 0.30649566650390625
-- Prompt 8: 0.12531302869319916
-- Prompt 9: 0.27468422055244446
Average MSD: 0.362094908952713
Now processing: Qwen2.5-14B-Instruct-Q3_K_S.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q3_K_S.gguf:
-- Prompt 0: 0.37908855080604553
-- Prompt 1: 0.8373937010765076
-- Prompt 2: 0.6462434530258179
-- Prompt 3: 0.48871299624443054
-- Prompt 4: 0.4499315321445465
-- Prompt 5: 0.8081690669059753
-- Prompt 6: 0.29139599204063416
-- Prompt 7: 0.6071351170539856
-- Prompt 8: 0.18237605690956116
-- Prompt 9: 0.3630615770816803
Average MSD: 0.5053507685661316
Now processing: Qwen2.5-14B-Instruct-Q2_K_L.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q2_K_L.gguf:
-- Prompt 0: 0.6395062804222107
-- Prompt 1: 1.394104242324829
-- Prompt 2: 0.8696668148040771
-- Prompt 3: 0.7236178517341614
-- Prompt 4: 0.7015765309333801
-- Prompt 5: 1.608881950378418
-- Prompt 6: 0.49967923760414124
-- Prompt 7: 0.6466574668884277
-- Prompt 8: 0.23411156237125397
-- Prompt 9: 0.6008867025375366
Average MSD: 0.7918689250946045
Now processing: Qwen2.5-14B-Instruct-IQ3_XS.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-IQ3_XS.gguf:
-- Prompt 0: 0.3111434280872345
-- Prompt 1: 0.8125181794166565
-- Prompt 2: 0.5896871089935303
-- Prompt 3: 0.4471130967140198
-- Prompt 4: 0.392292320728302
-- Prompt 5: 0.6833968162536621
-- Prompt 6: 0.23590423166751862
-- Prompt 7: 0.33583903312683105
-- Prompt 8: 0.15420512855052948
-- Prompt 9: 0.3086031675338745
Average MSD: 0.4270702302455902
Now processing: Qwen2.5-14B-Instruct-Q2_K.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-Q2_K.gguf:
-- Prompt 0: 0.6447346806526184
-- Prompt 1: 1.4237221479415894
-- Prompt 2: 1.307303547859192
-- Prompt 3: 0.7098925113677979
-- Prompt 4: 0.696006715297699
-- Prompt 5: 1.8814460039138794
-- Prompt 6: 0.5259395241737366
-- Prompt 7: 0.6215610504150391
-- Prompt 8: 0.29020702838897705
-- Prompt 9: 0.6094216108322144
Average MSD: 0.8710235357284546
Now processing: Qwen2.5-14B-Instruct-IQ2_M.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-Instruct-f16.gguf vs. Qwen2.5-14B-Instruct-IQ2_M.gguf:
-- Prompt 0: 0.8660668134689331
-- Prompt 1: 1.5735044479370117
-- Prompt 2: 1.1471364498138428
-- Prompt 3: 1.0702450275421143
-- Prompt 4: 0.9072433710098267
-- Prompt 5: 2.0247721672058105
-- Prompt 6: 0.556640625
-- Prompt 7: 0.8271251916885376
-- Prompt 8: 0.2730807065963745
-- Prompt 9: 0.8531641960144043
Average MSD: 1.0098979473114014
Average Mean-Squared Deviation compared to Qwen2.5-14B-Instruct-f16.gguf (sorted from lowest to highest):
Qwen2.5-14B-Instruct-Q8_0.gguf : 0.008274664171040058
Qwen2.5-14B-Instruct-Q6_K_L.gguf : 0.015092819929122925
Qwen2.5-14B-Instruct-Q6_K.gguf : 0.017808988690376282
Qwen2.5-14B-Instruct-Q5_K_L.gguf : 0.03684099018573761
Qwen2.5-14B-Instruct-Q5_K_M.gguf : 0.04179215803742409
Qwen2.5-14B-Instruct-Q5_K_S.gguf : 0.04719148948788643
Qwen2.5-14B-Instruct-Q4_K_L.gguf : 0.09200116991996765
Qwen2.5-14B-Instruct-Q4_K_M.gguf : 0.09454820305109024
Qwen2.5-14B-Instruct-Q4_K_S.gguf : 0.11889852583408356
Qwen2.5-14B-Instruct-IQ4_XS.gguf : 0.12279868125915527
Qwen2.5-14B-Instruct-Q4_0.gguf : 0.1792304366827011
Qwen2.5-14B-Instruct-Q3_K_XL.gguf : 0.25096365809440613
Qwen2.5-14B-Instruct-Q3_K_L.gguf : 0.2585994303226471
Qwen2.5-14B-Instruct-Q3_K_M.gguf : 0.28512710332870483
Qwen2.5-14B-Instruct-IQ3_M.gguf : 0.362094908952713
Qwen2.5-14B-Instruct-IQ3_XS.gguf : 0.4270702302455902
Qwen2.5-14B-Instruct-Q3_K_S.gguf : 0.5053507685661316
Qwen2.5-14B-Instruct-Q2_K_L.gguf : 0.7918689250946045
Qwen2.5-14B-Instruct-Q2_K.gguf : 0.8710235357284546
Qwen2.5-14B-Instruct-IQ2_M.gguf : 1.0098979473114014
And here's a graph of these values that I made in Google Sheets:
Now, these are only the results for this repo; to draw any real conclusions it would be best to repeat this test with a few other repos. If you have any specific repos you'd like me to test (~30B or smaller), feel free to let me know.
With that said, my main takeaway from these results is that the models using Q8_0 output + embedding tensors are not meaningfully better than their standard counterparts. For example, the difference between `Q4_K_M` and `Q4_K_L` is 0.0025. This is 17x smaller than the difference between `Q4_K_L` and `Q5_K_S`, despite the much larger file size.
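The comparison can be checked directly from the average MSDs in the table above:

```python
# Average MSDs vs. F16 for the three quants being compared.
q4_k_m = 0.09454820305109024
q4_k_l = 0.09200116991996765
q5_k_s = 0.04719148948788643

gap_l_vs_m = q4_k_m - q4_k_l    # gain from Q8_0 output/embedding, ~0.0025
gap_l_vs_q5 = q4_k_l - q5_k_s   # gain from going up a quant level, ~0.0448

print(gap_l_vs_q5 / gap_l_vs_m)  # ~17.6
```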
Based on these results I would suggest that the experimental quants proposed by ZeroWw are pointless. The end-user trades a significantly larger (slower) GGUF for a completely imperceptible improvement in quality.
Thoughts?
Honestly this doesn't surprise me much. I agree that they're negligible at best and placebo at worst, but enough people swear by them that I have to carry on hahaha
There's actually even some push to use F32 for the embedding/output tensors.. I doubt it would be meaningful
All that said, your Q2_K vs Q2_K_L does show a bigger difference than expected
It's also not just about the file size, but where the file size is spent. I can understand the embedding/output having a bigger importance than some other layers, but I think there are diminishing returns: Q4_K_M already uses Q4_K there, so the difference is smaller than for Q2_K, which uses.. well, Q2_K