Commit be45168 (verified) by eaddario
Parent(s): 0ba2d3d

Generate Perplexity, KLD, ARC, HellaSwag, MMLU, Truthful QA and WinoGrande scores

This view is limited to 50 files because the commit contains too many changes. See the raw diff for the full set.

Files changed (50)
  1. scores/Qwen3-30B-A3B-pruned-F16.arc +15 -0
  2. scores/Qwen3-30B-A3B-pruned-F16.hsw +14 -0
  3. scores/Qwen3-30B-A3B-pruned-F16.mmlu +15 -0
  4. scores/Qwen3-30B-A3B-pruned-F16.tqa +15 -0
  5. scores/Qwen3-30B-A3B-pruned-F16.wng +13 -0
  6. scores/Qwen3-30B-A3B-pruned-iq3_m.arc +13 -0
  7. scores/Qwen3-30B-A3B-pruned-iq3_m.hsw +12 -0
  8. scores/Qwen3-30B-A3B-pruned-iq3_m.mmlu +13 -0
  9. scores/Qwen3-30B-A3B-pruned-iq3_m.ppx +37 -0
  10. scores/Qwen3-30B-A3B-pruned-iq3_m.tqa +13 -0
  11. scores/Qwen3-30B-A3B-pruned-iq3_m.wng +11 -0
  12. scores/Qwen3-30B-A3B-pruned-iq3_s.arc +13 -0
  13. scores/Qwen3-30B-A3B-pruned-iq3_s.hsw +12 -0
  14. scores/Qwen3-30B-A3B-pruned-iq3_s.mmlu +13 -0
  15. scores/Qwen3-30B-A3B-pruned-iq3_s.ppx +37 -0
  16. scores/Qwen3-30B-A3B-pruned-iq3_s.tqa +13 -0
  17. scores/Qwen3-30B-A3B-pruned-iq3_s.wng +11 -0
  18. scores/Qwen3-30B-A3B-pruned-iq4_nl.arc +13 -0
  19. scores/Qwen3-30B-A3B-pruned-iq4_nl.hsw +12 -0
  20. scores/Qwen3-30B-A3B-pruned-iq4_nl.mmlu +13 -0
  21. scores/Qwen3-30B-A3B-pruned-iq4_nl.ppx +37 -0
  22. scores/Qwen3-30B-A3B-pruned-iq4_nl.tqa +13 -0
  23. scores/Qwen3-30B-A3B-pruned-iq4_nl.wng +11 -0
  24. scores/Qwen3-30B-A3B-pruned-q3_k_l.arc +13 -0
  25. scores/Qwen3-30B-A3B-pruned-q3_k_l.hsw +12 -0
  26. scores/Qwen3-30B-A3B-pruned-q3_k_l.mmlu +13 -0
  27. scores/Qwen3-30B-A3B-pruned-q3_k_l.ppx +37 -0
  28. scores/Qwen3-30B-A3B-pruned-q3_k_l.tqa +13 -0
  29. scores/Qwen3-30B-A3B-pruned-q3_k_l.wng +11 -0
  30. scores/Qwen3-30B-A3B-pruned-q3_k_m.arc +13 -0
  31. scores/Qwen3-30B-A3B-pruned-q3_k_m.hsw +12 -0
  32. scores/Qwen3-30B-A3B-pruned-q3_k_m.mmlu +13 -0
  33. scores/Qwen3-30B-A3B-pruned-q3_k_m.ppx +37 -0
  34. scores/Qwen3-30B-A3B-pruned-q3_k_m.tqa +13 -0
  35. scores/Qwen3-30B-A3B-pruned-q3_k_m.wng +11 -0
  36. scores/Qwen3-30B-A3B-pruned-q3_k_s.arc +13 -0
  37. scores/Qwen3-30B-A3B-pruned-q3_k_s.hsw +12 -0
  38. scores/Qwen3-30B-A3B-pruned-q3_k_s.mmlu +13 -0
  39. scores/Qwen3-30B-A3B-pruned-q3_k_s.ppx +37 -0
  40. scores/Qwen3-30B-A3B-pruned-q3_k_s.tqa +13 -0
  41. scores/Qwen3-30B-A3B-pruned-q3_k_s.wng +11 -0
  42. scores/Qwen3-30B-A3B-pruned-q4_k_m.arc +13 -0
  43. scores/Qwen3-30B-A3B-pruned-q4_k_m.hsw +12 -0
  44. scores/Qwen3-30B-A3B-pruned-q4_k_m.mmlu +13 -0
  45. scores/Qwen3-30B-A3B-pruned-q4_k_m.ppx +37 -0
  46. scores/Qwen3-30B-A3B-pruned-q4_k_m.tqa +13 -0
  47. scores/Qwen3-30B-A3B-pruned-q4_k_m.wng +11 -0
  48. scores/Qwen3-30B-A3B-pruned-q4_k_s.arc +13 -0
  49. scores/Qwen3-30B-A3B-pruned-q4_k_s.hsw +12 -0
  50. scores/Qwen3-30B-A3B-pruned-q4_k_s.mmlu +13 -0
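The score files below are raw llama.cpp benchmark logs. For orientation only, here is a minimal sketch of the kind of llama-perplexity invocations that produce this output; the dataset file names and the exact command sequence are assumptions for illustration and are not taken from this commit.

```bash
#!/usr/bin/env bash
# Sketch only: illustrative llama.cpp benchmark runs for one quantisation.
# Dataset file names are placeholders (assumptions), not part of this commit.
BASE=./Qwen3-30B-A3B-F16.gguf       # unquantised reference, as in the logs
MODEL=./Qwen3-30B-A3B-IQ4_NL.gguf   # quantised model under test, as in the logs

# Perplexity + KL divergence (.ppx): save base-model logits once, then compare
# the quantised model against them.
./llama-perplexity -m "$BASE"  -f wiki.test.raw --kl-divergence-base logits-f16.bin
./llama-perplexity -m "$MODEL" --kl-divergence-base logits-f16.bin --kl-divergence > scores.ppx

# ARC, MMLU and TruthfulQA (.arc / .mmlu / .tqa) use the multiple-choice runner.
./llama-perplexity -m "$MODEL" --multiple-choice --multiple-choice-tasks 750 -f arc-validation.bin > scores.arc

# HellaSwag (.hsw) and WinoGrande (.wng) have dedicated modes.
./llama-perplexity -m "$MODEL" --hellaswag  --hellaswag-tasks 750  -f hellaswag_val.txt       > scores.hsw
./llama-perplexity -m "$MODEL" --winogrande --winogrande-tasks 750 -f winogrande-debiased.csv > scores.wng
```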
scores/Qwen3-30B-A3B-pruned-F16.arc ADDED
@@ -0,0 +1,15 @@
+ build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+ llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+ llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+ Final result: 66.6667 +/- 1.7225
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 476545.17 ms
+ llama_perf_context_print: prompt eval time = 317100.07 ms / 35972 tokens ( 8.82 ms per token, 113.44 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 320554.96 ms / 35973 tokens
scores/Qwen3-30B-A3B-pruned-F16.hsw ADDED
@@ -0,0 +1,14 @@
+ build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+ llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+ llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+ 750 72.66666667% [69.3676%, 75.7347%]
+
+
+ llama_perf_context_print: load time = 14042.88 ms
+ llama_perf_context_print: prompt eval time = 953982.56 ms / 123581 tokens ( 7.72 ms per token, 129.54 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 971012.88 ms / 123582 tokens
scores/Qwen3-30B-A3B-pruned-F16.mmlu ADDED
@@ -0,0 +1,15 @@
+ build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+ llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+ llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+ Final result: 42.1333 +/- 1.8042
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 13886.47 ms
+ llama_perf_context_print: prompt eval time = 494837.42 ms / 67719 tokens ( 7.31 ms per token, 136.85 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 500285.67 ms / 67720 tokens
scores/Qwen3-30B-A3B-pruned-F16.tqa ADDED
@@ -0,0 +1,15 @@
+ build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+ llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+ llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+ Final result: 31.2000 +/- 1.6929
+ Random chance: 19.8992 +/- 1.4588
+
+
+ llama_perf_context_print: load time = 13704.38 ms
+ llama_perf_context_print: prompt eval time = 426482.94 ms / 49696 tokens ( 8.58 ms per token, 116.53 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 433376.19 ms / 49697 tokens
scores/Qwen3-30B-A3B-pruned-F16.wng ADDED
@@ -0,0 +1,13 @@
+ build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+ llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+ llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+ llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+ Final Winogrande score(750 tasks): 75.8667 +/- 1.5635
+
+ llama_perf_context_print: load time = 13885.91 ms
+ llama_perf_context_print: prompt eval time = 165214.42 ms / 21448 tokens ( 7.70 ms per token, 129.82 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 168672.50 ms / 21449 tokens
scores/Qwen3-30B-A3B-pruned-iq3_m.arc ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+ Final result: 56.8000 +/- 1.8100
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 5963.39 ms
+ llama_perf_context_print: prompt eval time = 37054.73 ms / 35972 tokens ( 1.03 ms per token, 970.78 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 37976.03 ms / 35973 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_m.hsw ADDED
@@ -0,0 +1,12 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+ 750 70.26666667% [66.8989%, 73.4279%]
+
+
+ llama_perf_context_print: load time = 973.57 ms
+ llama_perf_context_print: prompt eval time = 124967.37 ms / 126038 tokens ( 0.99 ms per token, 1008.57 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 128697.47 ms / 126039 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_m.mmlu ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+ Final result: 39.0667 +/- 1.7827
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 991.14 ms
+ llama_perf_context_print: prompt eval time = 66988.49 ms / 67719 tokens ( 0.99 ms per token, 1010.91 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 68293.40 ms / 67720 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_m.ppx ADDED
@@ -0,0 +1,37 @@
+ ====== Perplexity statistics ======
+ Mean PPL(Q) : 77.090453 ± 1.044822
+ Mean PPL(base) : 8.445938 ± 0.065177
+ Cor(ln(PPL(Q)), ln(PPL(base))): 73.55%
+ Mean ln(PPL(Q)/PPL(base)) : 2.211294 ± 0.009454
+ Mean PPL(Q)/PPL(base) : 9.127518 ± 0.086296
+ Mean PPL(Q)-PPL(base) : 68.644515 ± 0.997862
+
+ ====== KL divergence statistics ======
+ Mean KLD: 2.063818 ± 0.006856
+ Maximum KLD: 39.386982
+ 99.9% KLD: 19.179407
+ 99.0% KLD: 12.731147
+ 99.0% KLD: 12.731147
+ Median KLD: 1.246560
+ 10.0% KLD: 0.011396
+ 5.0% KLD: 0.001648
+ 1.0% KLD: 0.000063
+ Minimum KLD: -0.000003
+
+ ====== Token probability statistics ======
+ Mean Δp: -9.665 ± 0.088 %
+ Maximum Δp: 99.654%
+ 99.9% Δp: 93.397%
+ 99.0% Δp: 73.865%
+ 95.0% Δp: 40.361%
+ 90.0% Δp: 19.833%
+ 75.0% Δp: 0.586%
+ Median Δp: -0.448%
+ 25.0% Δp: -15.350%
+ 10.0% Δp: -62.944%
+ 5.0% Δp: -90.459%
+ 1.0% Δp: -99.970%
+ 0.1% Δp: -100.000%
+ Minimum Δp: -100.000%
+ RMS Δp : 35.199 ± 0.092 %
+ Same top p: 57.360 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-iq3_m.tqa ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+ Final result: 30.6667 +/- 1.6849
+ Random chance: 19.8992 +/- 1.4588
+
+
+ llama_perf_context_print: load time = 1061.94 ms
+ llama_perf_context_print: prompt eval time = 52253.67 ms / 49696 tokens ( 1.05 ms per token, 951.05 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 54015.85 ms / 49697 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_m.wng ADDED
@@ -0,0 +1,11 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+ Final Winogrande score(750 tasks): 62.5333 +/- 1.7686
+
+ llama_perf_context_print: load time = 1084.28 ms
+ llama_perf_context_print: prompt eval time = 21513.44 ms / 21448 tokens ( 1.00 ms per token, 996.96 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 22046.16 ms / 21449 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.arc ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+ Final result: 48.5333 +/- 1.8262
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 5752.48 ms
+ llama_perf_context_print: prompt eval time = 37048.76 ms / 35972 tokens ( 1.03 ms per token, 970.94 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 37992.87 ms / 35973 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.hsw ADDED
@@ -0,0 +1,12 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+ 750 68.66666667% [65.2590%, 71.8841%]
+
+
+ llama_perf_context_print: load time = 1004.53 ms
+ llama_perf_context_print: prompt eval time = 127701.50 ms / 126038 tokens ( 1.01 ms per token, 986.97 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 131559.01 ms / 126039 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.mmlu ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+ Final result: 37.0667 +/- 1.7648
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 1051.40 ms
+ llama_perf_context_print: prompt eval time = 67708.86 ms / 67719 tokens ( 1.00 ms per token, 1000.15 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 69083.94 ms / 67720 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.ppx ADDED
@@ -0,0 +1,37 @@
+ ====== Perplexity statistics ======
+ Mean PPL(Q) : 69.935907 ± 0.918185
+ Mean PPL(base) : 8.445938 ± 0.065177
+ Cor(ln(PPL(Q)), ln(PPL(base))): 72.89%
+ Mean ln(PPL(Q)/PPL(base)) : 2.113894 ± 0.009177
+ Mean PPL(Q)/PPL(base) : 8.280419 ± 0.075991
+ Mean PPL(Q)-PPL(base) : 61.489969 ± 0.871820
+
+ ====== KL divergence statistics ======
+ Mean KLD: 1.997500 ± 0.006825
+ Maximum KLD: 36.616871
+ 99.9% KLD: 18.998100
+ 99.0% KLD: 12.936396
+ 99.0% KLD: 12.936396
+ Median KLD: 1.190034
+ 10.0% KLD: 0.013888
+ 5.0% KLD: 0.002134
+ 1.0% KLD: 0.000090
+ Minimum KLD: -0.000004
+
+ ====== Token probability statistics ======
+ Mean Δp: -10.199 ± 0.088 %
+ Maximum Δp: 99.504%
+ 99.9% Δp: 93.029%
+ 99.0% Δp: 72.891%
+ 95.0% Δp: 39.848%
+ 90.0% Δp: 19.063%
+ 75.0% Δp: 0.528%
+ Median Δp: -0.540%
+ 25.0% Δp: -16.787%
+ 10.0% Δp: -63.592%
+ 5.0% Δp: -91.195%
+ 1.0% Δp: -99.977%
+ 0.1% Δp: -100.000%
+ Minimum Δp: -100.000%
+ RMS Δp : 35.474 ± 0.092 %
+ Same top p: 56.834 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-iq3_s.tqa ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+ Final result: 32.0000 +/- 1.7045
+ Random chance: 19.8992 +/- 1.4588
+
+
+ llama_perf_context_print: load time = 1017.78 ms
+ llama_perf_context_print: prompt eval time = 51819.94 ms / 49696 tokens ( 1.04 ms per token, 959.01 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 53589.19 ms / 49697 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.wng ADDED
@@ -0,0 +1,11 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+ Final Winogrande score(750 tasks): 63.8667 +/- 1.7553
+
+ llama_perf_context_print: load time = 1065.10 ms
+ llama_perf_context_print: prompt eval time = 21437.40 ms / 21448 tokens ( 1.00 ms per token, 1000.49 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 21986.82 ms / 21449 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.arc ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+ Final result: 61.7333 +/- 1.7759
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 7153.23 ms
+ llama_perf_context_print: prompt eval time = 37237.77 ms / 35972 tokens ( 1.04 ms per token, 966.01 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 38188.71 ms / 35973 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.hsw ADDED
@@ -0,0 +1,12 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+ 750 71.20000000% [67.8576%, 74.3263%]
+
+
+ llama_perf_context_print: load time = 1206.30 ms
+ llama_perf_context_print: prompt eval time = 127295.75 ms / 126038 tokens ( 1.01 ms per token, 990.12 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 131179.47 ms / 126039 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.mmlu ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+ Final result: 40.9333 +/- 1.7967
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 1184.18 ms
+ llama_perf_context_print: prompt eval time = 68103.64 ms / 67719 tokens ( 1.01 ms per token, 994.35 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 69454.89 ms / 67720 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.ppx ADDED
@@ -0,0 +1,37 @@
+ ====== Perplexity statistics ======
+ Mean PPL(Q) : 58.059268 ± 0.724129
+ Mean PPL(base) : 8.445938 ± 0.065177
+ Cor(ln(PPL(Q)), ln(PPL(base))): 73.87%
+ Mean ln(PPL(Q)/PPL(base)) : 1.927779 ± 0.008539
+ Mean PPL(Q)/PPL(base) : 6.874224 ± 0.058701
+ Mean PPL(Q)-PPL(base) : 49.613331 ± 0.677412
+
+ ====== KL divergence statistics ======
+ Mean KLD: 1.827625 ± 0.006356
+ Maximum KLD: 37.203815
+ 99.9% KLD: 17.289213
+ 99.0% KLD: 12.241351
+ 99.0% KLD: 12.241351
+ Median KLD: 1.062228
+ 10.0% KLD: 0.013747
+ 5.0% KLD: 0.002407
+ 1.0% KLD: 0.000120
+ Minimum KLD: -0.000003
+
+ ====== Token probability statistics ======
+ Mean Δp: -10.074 ± 0.087 %
+ Maximum Δp: 99.662%
+ 99.9% Δp: 90.888%
+ 99.0% Δp: 70.751%
+ 95.0% Δp: 38.384%
+ 90.0% Δp: 18.656%
+ 75.0% Δp: 0.543%
+ Median Δp: -0.549%
+ 25.0% Δp: -15.860%
+ 10.0% Δp: -62.678%
+ 5.0% Δp: -91.248%
+ 1.0% Δp: -99.979%
+ 0.1% Δp: -100.000%
+ Minimum Δp: -100.000%
+ RMS Δp : 35.013 ± 0.093 %
+ Same top p: 58.204 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-iq4_nl.tqa ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+ Final result: 30.9333 +/- 1.6889
+ Random chance: 19.8992 +/- 1.4588
+
+
+ llama_perf_context_print: load time = 1254.92 ms
+ llama_perf_context_print: prompt eval time = 52206.26 ms / 49696 tokens ( 1.05 ms per token, 951.92 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 53979.12 ms / 49697 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.wng ADDED
@@ -0,0 +1,11 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+ Final Winogrande score(750 tasks): 65.8667 +/- 1.7325
+
+ llama_perf_context_print: load time = 1229.62 ms
+ llama_perf_context_print: prompt eval time = 21328.35 ms / 21448 tokens ( 0.99 ms per token, 1005.61 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 21840.27 ms / 21449 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.arc ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+ Final result: 57.8667 +/- 1.8042
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 5836.97 ms
+ llama_perf_context_print: prompt eval time = 38153.96 ms / 35972 tokens ( 1.06 ms per token, 942.81 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 39110.03 ms / 35973 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.hsw ADDED
@@ -0,0 +1,12 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+ 750 71.73333333% [68.4062%, 74.8389%]
+
+
+ llama_perf_context_print: load time = 1007.76 ms
+ llama_perf_context_print: prompt eval time = 130309.00 ms / 126038 tokens ( 1.03 ms per token, 967.22 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 134163.99 ms / 126039 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.mmlu ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+ Final result: 38.6667 +/- 1.7794
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 1034.53 ms
+ llama_perf_context_print: prompt eval time = 69376.50 ms / 67719 tokens ( 1.02 ms per token, 976.11 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 70750.75 ms / 67720 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.ppx ADDED
@@ -0,0 +1,37 @@
+ ====== Perplexity statistics ======
+ Mean PPL(Q) : 60.855606 ± 0.768774
+ Mean PPL(base) : 8.445938 ± 0.065177
+ Cor(ln(PPL(Q)), ln(PPL(base))): 73.47%
+ Mean ln(PPL(Q)/PPL(base)) : 1.974818 ± 0.008712
+ Mean PPL(Q)/PPL(base) : 7.205311 ± 0.062773
+ Mean PPL(Q)-PPL(base) : 52.409668 ± 0.722246
+
+ ====== KL divergence statistics ======
+ Mean KLD: 1.886749 ± 0.006413
+ Maximum KLD: 32.243908
+ 99.9% KLD: 18.146301
+ 99.0% KLD: 12.053426
+ 99.0% KLD: 12.053426
+ Median KLD: 1.116526
+ 10.0% KLD: 0.014264
+ 5.0% KLD: 0.002443
+ 1.0% KLD: 0.000117
+ Minimum KLD: -0.000003
+
+ ====== Token probability statistics ======
+ Mean Δp: -10.112 ± 0.088 %
+ Maximum Δp: 99.677%
+ 99.9% Δp: 92.123%
+ 99.0% Δp: 72.105%
+ 95.0% Δp: 39.096%
+ 90.0% Δp: 19.004%
+ 75.0% Δp: 0.549%
+ Median Δp: -0.551%
+ 25.0% Δp: -16.110%
+ 10.0% Δp: -63.590%
+ 5.0% Δp: -91.581%
+ 1.0% Δp: -99.973%
+ 0.1% Δp: -100.000%
+ Minimum Δp: -100.000%
+ RMS Δp : 35.285 ± 0.093 %
+ Same top p: 57.653 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-q3_k_l.tqa ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+ Final result: 32.2667 +/- 1.7082
+ Random chance: 19.8992 +/- 1.4588
+
+
+ llama_perf_context_print: load time = 1045.56 ms
+ llama_perf_context_print: prompt eval time = 53181.04 ms / 49696 tokens ( 1.07 ms per token, 934.47 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 54884.20 ms / 49697 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.wng ADDED
@@ -0,0 +1,11 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+ Final Winogrande score(750 tasks): 65.2000 +/- 1.7405
+
+ llama_perf_context_print: load time = 964.35 ms
+ llama_perf_context_print: prompt eval time = 21817.15 ms / 21448 tokens ( 1.02 ms per token, 983.08 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 22321.50 ms / 21449 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_m.arc ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+ Final result: 56.4000 +/- 1.8119
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 5706.95 ms
+ llama_perf_context_print: prompt eval time = 37640.79 ms / 35972 tokens ( 1.05 ms per token, 955.67 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 38556.26 ms / 35973 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_m.hsw ADDED
@@ -0,0 +1,12 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+ 750 70.80000000% [67.4465%, 73.9415%]
+
+
+ llama_perf_context_print: load time = 924.40 ms
+ llama_perf_context_print: prompt eval time = 127735.77 ms / 126038 tokens ( 1.01 ms per token, 986.71 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 131427.60 ms / 126039 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_m.mmlu ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+ Final result: 39.3333 +/- 1.7849
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 943.54 ms
+ llama_perf_context_print: prompt eval time = 67380.31 ms / 67719 tokens ( 0.99 ms per token, 1005.03 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 68692.56 ms / 67720 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_m.ppx ADDED
@@ -0,0 +1,37 @@
+ ====== Perplexity statistics ======
+ Mean PPL(Q) : 59.072808 ± 0.741897
+ Mean PPL(base) : 8.445938 ± 0.065177
+ Cor(ln(PPL(Q)), ln(PPL(base))): 73.82%
+ Mean ln(PPL(Q)/PPL(base)) : 1.945085 ± 0.008614
+ Mean PPL(Q)/PPL(base) : 6.994227 ± 0.060246
+ Mean PPL(Q)-PPL(base) : 50.626870 ± 0.695177
+
+ ====== KL divergence statistics ======
+ Mean KLD: 1.857932 ± 0.006326
+ Maximum KLD: 31.393671
+ 99.9% KLD: 17.826597
+ 99.0% KLD: 11.908570
+ 99.0% KLD: 11.908570
+ Median KLD: 1.097884
+ 10.0% KLD: 0.013517
+ 5.0% KLD: 0.002297
+ 1.0% KLD: 0.000108
+ Minimum KLD: -0.000003
+
+ ====== Token probability statistics ======
+ Mean Δp: -9.998 ± 0.087 %
+ Maximum Δp: 99.678%
+ 99.9% Δp: 91.515%
+ 99.0% Δp: 72.068%
+ 95.0% Δp: 39.410%
+ 90.0% Δp: 18.817%
+ 75.0% Δp: 0.535%
+ Median Δp: -0.541%
+ 25.0% Δp: -15.834%
+ 10.0% Δp: -62.929%
+ 5.0% Δp: -91.189%
+ 1.0% Δp: -99.971%
+ 0.1% Δp: -99.999%
+ Minimum Δp: -100.000%
+ RMS Δp : 35.157 ± 0.092 %
+ Same top p: 57.800 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-q3_k_m.tqa ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+ Final result: 31.6000 +/- 1.6988
+ Random chance: 19.8992 +/- 1.4588
+
+
+ llama_perf_context_print: load time = 973.04 ms
+ llama_perf_context_print: prompt eval time = 52005.69 ms / 49696 tokens ( 1.05 ms per token, 955.59 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 53651.35 ms / 49697 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_m.wng ADDED
@@ -0,0 +1,11 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+ Final Winogrande score(750 tasks): 64.8000 +/- 1.7451
+
+ llama_perf_context_print: load time = 1010.06 ms
+ llama_perf_context_print: prompt eval time = 21536.50 ms / 21448 tokens ( 1.00 ms per token, 995.89 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 22054.60 ms / 21449 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_s.arc ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
+
+ Final result: 58.1333 +/- 1.8026
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 5783.27 ms
+ llama_perf_context_print: prompt eval time = 37846.85 ms / 35972 tokens ( 1.05 ms per token, 950.46 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 38757.96 ms / 35973 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_s.hsw ADDED
@@ -0,0 +1,12 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
+
+ 750 71.46666667% [68.1319%, 74.5827%]
+
+
+ llama_perf_context_print: load time = 885.76 ms
+ llama_perf_context_print: prompt eval time = 127239.40 ms / 126038 tokens ( 1.01 ms per token, 990.56 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 130949.53 ms / 126039 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_s.mmlu ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
+
+ Final result: 38.9333 +/- 1.7816
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 972.00 ms
+ llama_perf_context_print: prompt eval time = 67683.27 ms / 67719 tokens ( 1.00 ms per token, 1000.53 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 68987.39 ms / 67720 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_s.ppx ADDED
@@ -0,0 +1,37 @@
+ ====== Perplexity statistics ======
+ Mean PPL(Q) : 61.676169 ± 0.780539
+ Mean PPL(base) : 8.445938 ± 0.065177
+ Cor(ln(PPL(Q)), ln(PPL(base))): 73.64%
+ Mean ln(PPL(Q)/PPL(base)) : 1.988212 ± 0.008711
+ Mean PPL(Q)/PPL(base) : 7.302465 ± 0.063613
+ Mean PPL(Q)-PPL(base) : 53.230232 ± 0.733873
+
+ ====== KL divergence statistics ======
+ Mean KLD: 1.888847 ± 0.006380
+ Maximum KLD: 33.008038
+ 99.9% KLD: 17.721254
+ 99.0% KLD: 12.006232
+ 99.0% KLD: 12.006232
+ Median KLD: 1.128817
+ 10.0% KLD: 0.013821
+ 5.0% KLD: 0.002327
+ 1.0% KLD: 0.000107
+ Minimum KLD: -0.000003
+
+ ====== Token probability statistics ======
+ Mean Δp: -10.182 ± 0.088 %
+ Maximum Δp: 99.676%
+ 99.9% Δp: 91.955%
+ 99.0% Δp: 72.330%
+ 95.0% Δp: 39.211%
+ 90.0% Δp: 18.768%
+ 75.0% Δp: 0.478%
+ Median Δp: -0.585%
+ 25.0% Δp: -16.265%
+ 10.0% Δp: -63.264%
+ 5.0% Δp: -91.364%
+ 1.0% Δp: -99.971%
+ 0.1% Δp: -99.999%
+ Minimum Δp: -100.000%
+ RMS Δp : 35.283 ± 0.093 %
+ Same top p: 57.426 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-q3_k_s.tqa ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
+
+ Final result: 30.9333 +/- 1.6889
+ Random chance: 19.8992 +/- 1.4588
+
+
+ llama_perf_context_print: load time = 988.65 ms
+ llama_perf_context_print: prompt eval time = 52374.28 ms / 49696 tokens ( 1.05 ms per token, 948.86 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 54015.45 ms / 49697 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_s.wng ADDED
@@ -0,0 +1,11 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
+
+ Final Winogrande score(750 tasks): 63.0667 +/- 1.7635
+
+ llama_perf_context_print: load time = 929.52 ms
+ llama_perf_context_print: prompt eval time = 21714.40 ms / 21448 tokens ( 1.01 ms per token, 987.73 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 22212.48 ms / 21449 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q4_k_m.arc ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
+
+ Final result: 60.5333 +/- 1.7860
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 7096.93 ms
+ llama_perf_context_print: prompt eval time = 38331.71 ms / 35972 tokens ( 1.07 ms per token, 938.44 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 39294.96 ms / 35973 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q4_k_m.hsw ADDED
@@ -0,0 +1,12 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
+
+ 750 71.46666667% [68.1319%, 74.5827%]
+
+
+ llama_perf_context_print: load time = 1212.85 ms
+ llama_perf_context_print: prompt eval time = 130689.40 ms / 126038 tokens ( 1.04 ms per token, 964.41 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 134566.53 ms / 126039 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q4_k_m.mmlu ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
+
+ Final result: 41.8667 +/- 1.8026
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 1263.34 ms
+ llama_perf_context_print: prompt eval time = 69966.59 ms / 67719 tokens ( 1.03 ms per token, 967.88 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 71361.75 ms / 67720 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q4_k_m.ppx ADDED
@@ -0,0 +1,37 @@
+ ====== Perplexity statistics ======
+ Mean PPL(Q) : 58.664820 ± 0.740540
+ Mean PPL(base) : 8.445938 ± 0.065177
+ Cor(ln(PPL(Q)), ln(PPL(base))): 74.32%
+ Mean ln(PPL(Q)/PPL(base)) : 1.938155 ± 0.008608
+ Mean PPL(Q)/PPL(base) : 6.945921 ± 0.059790
+ Mean PPL(Q)-PPL(base) : 50.218882 ± 0.693471
+
+ ====== KL divergence statistics ======
+ Mean KLD: 1.826410 ± 0.006359
+ Maximum KLD: 38.350769
+ 99.9% KLD: 17.546576
+ 99.0% KLD: 12.219206
+ 99.0% KLD: 12.219206
+ Median KLD: 1.056963
+ 10.0% KLD: 0.011905
+ 5.0% KLD: 0.002018
+ 1.0% KLD: 0.000098
+ Minimum KLD: -0.000003
+
+ ====== Token probability statistics ======
+ Mean Δp: -9.585 ± 0.087 %
+ Maximum Δp: 99.633%
+ 99.9% Δp: 91.281%
+ 99.0% Δp: 72.420%
+ 95.0% Δp: 39.757%
+ 90.0% Δp: 19.570%
+ 75.0% Δp: 0.574%
+ Median Δp: -0.481%
+ 25.0% Δp: -15.306%
+ 10.0% Δp: -60.898%
+ 5.0% Δp: -90.180%
+ 1.0% Δp: -99.976%
+ 0.1% Δp: -100.000%
+ Minimum Δp: -100.000%
+ RMS Δp : 34.806 ± 0.092 %
+ Same top p: 58.557 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-q4_k_m.tqa ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
+
+ Final result: 30.9333 +/- 1.6889
+ Random chance: 19.8992 +/- 1.4588
+
+
+ llama_perf_context_print: load time = 1267.40 ms
+ llama_perf_context_print: prompt eval time = 53738.21 ms / 49696 tokens ( 1.08 ms per token, 924.78 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 55506.24 ms / 49697 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q4_k_m.wng ADDED
@@ -0,0 +1,11 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
+
+ Final Winogrande score(750 tasks): 66.1333 +/- 1.7292
+
+ llama_perf_context_print: load time = 1232.55 ms
+ llama_perf_context_print: prompt eval time = 21981.09 ms / 21448 tokens ( 1.02 ms per token, 975.75 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 22534.28 ms / 21449 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q4_k_s.arc ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest))
+
+ Final result: 60.8000 +/- 1.7838
+ Random chance: 25.0083 +/- 1.5824
+
+
+ llama_perf_context_print: load time = 6986.96 ms
+ llama_perf_context_print: prompt eval time = 38183.57 ms / 35972 tokens ( 1.06 ms per token, 942.08 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 39137.36 ms / 35973 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q4_k_s.hsw ADDED
@@ -0,0 +1,12 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest))
+
+ 750 71.06666667% [67.7206%, 74.1981%]
+
+
+ llama_perf_context_print: load time = 1188.49 ms
+ llama_perf_context_print: prompt eval time = 130416.13 ms / 126038 tokens ( 1.03 ms per token, 966.43 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 134331.92 ms / 126039 tokens
+ ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q4_k_s.mmlu ADDED
@@ -0,0 +1,13 @@
+ build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+ llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+ llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest))
+
+ Final result: 41.4667 +/- 1.8002
+ Random chance: 25.0000 +/- 1.5822
+
+
+ llama_perf_context_print: load time = 1227.63 ms
+ llama_perf_context_print: prompt eval time = 69732.87 ms / 67719 tokens ( 1.03 ms per token, 971.12 tokens per second)
+ llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 71119.80 ms / 67720 tokens
+ ggml_metal_free: deallocating