diff --git a/scores/Qwen3-30B-A3B-pruned-F16.arc b/scores/Qwen3-30B-A3B-pruned-F16.arc
new file mode 100644
index 0000000000000000000000000000000000000000..fe7a2a82942ae72744e40211a5f109097ca14bd8
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-F16.arc
@@ -0,0 +1,15 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+Final result: 66.6667 +/- 1.7225
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 476545.17 ms
+llama_perf_context_print: prompt eval time = 317100.07 ms / 35972 tokens ( 8.82 ms per token, 113.44 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 320554.96 ms / 35973 tokens
diff --git a/scores/Qwen3-30B-A3B-pruned-F16.hsw b/scores/Qwen3-30B-A3B-pruned-F16.hsw
new file mode 100644
index 0000000000000000000000000000000000000000..0da04d0d1bd7c34fa33fd62a68bb3741cfcc7c7f
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-F16.hsw
@@ -0,0 +1,14 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+750 72.66666667% [69.3676%, 75.7347%]
+
+
+llama_perf_context_print: load time = 14042.88 ms
+llama_perf_context_print: prompt eval time = 953982.56 ms / 123581 tokens ( 7.72 ms per token, 129.54 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 971012.88 ms / 123582 tokens
diff --git a/scores/Qwen3-30B-A3B-pruned-F16.mmlu b/scores/Qwen3-30B-A3B-pruned-F16.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..09f30cd6a8a2ac4fa5652f12afc107d61461d1ea
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-F16.mmlu
@@ -0,0 +1,15 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+Final result: 42.1333 +/- 1.8042
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 13886.47 ms
+llama_perf_context_print: prompt eval time = 494837.42 ms / 67719 tokens ( 7.31 ms per token, 136.85 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 500285.67 ms / 67720 tokens
diff --git a/scores/Qwen3-30B-A3B-pruned-F16.tqa b/scores/Qwen3-30B-A3B-pruned-F16.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..5a4672f9207bfad4503d2cb84145398f28997740
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-F16.tqa
@@ -0,0 +1,15 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+Final result: 31.2000 +/- 1.6929
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 13704.38 ms
+llama_perf_context_print: prompt eval time = 426482.94 ms / 49696 tokens ( 8.58 ms per token, 116.53 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 433376.19 ms / 49697 tokens
diff --git a/scores/Qwen3-30B-A3B-pruned-F16.wng b/scores/Qwen3-30B-A3B-pruned-F16.wng
new file mode 100644
index 0000000000000000000000000000000000000000..a8fcacf75e8d6b026ff9f2602b6a973d20106bf9
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-F16.wng
@@ -0,0 +1,13 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 75.8667 +/- 1.5635
+
+llama_perf_context_print: load time = 13885.91 ms
+llama_perf_context_print: prompt eval time = 165214.42 ms / 21448 tokens ( 7.70 ms per token, 129.82 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 168672.50 ms / 21449 tokens
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_m.arc b/scores/Qwen3-30B-A3B-pruned-iq3_m.arc
new file mode 100644
index 0000000000000000000000000000000000000000..542c25206dc528ed01ae550c4acc6ae9dbc4eeae
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_m.arc
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+Final result: 56.8000 +/- 1.8100
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 5963.39 ms
+llama_perf_context_print: prompt eval time = 37054.73 ms / 35972 tokens ( 1.03 ms per token, 970.78 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 37976.03 ms / 35973 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_m.hsw b/scores/Qwen3-30B-A3B-pruned-iq3_m.hsw
new file mode 100644
index 0000000000000000000000000000000000000000..fe15362783e2410bfb279b3361d8bfd6104c62ea
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_m.hsw
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+750 70.26666667% [66.8989%, 73.4279%]
+
+
+llama_perf_context_print: load time = 973.57 ms
+llama_perf_context_print: prompt eval time = 124967.37 ms / 126038 tokens ( 0.99 ms per token, 1008.57 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 128697.47 ms / 126039 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_m.mmlu b/scores/Qwen3-30B-A3B-pruned-iq3_m.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..5f3c9c35d4ff2ac2934b93820c7273b13b38f8c2
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_m.mmlu
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+Final result: 39.0667 +/- 1.7827
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 991.14 ms
+llama_perf_context_print: prompt eval time = 66988.49 ms / 67719 tokens ( 0.99 ms per token, 1010.91 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 68293.40 ms / 67720 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_m.ppx b/scores/Qwen3-30B-A3B-pruned-iq3_m.ppx
new file mode 100644
index 0000000000000000000000000000000000000000..f549d82a80b768aeb461a0e3f1c1cfd52adc0e29
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_m.ppx
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 77.090453 ± 1.044822
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 73.55%
+Mean ln(PPL(Q)/PPL(base)) : 2.211294 ± 0.009454
+Mean PPL(Q)/PPL(base) : 9.127518 ± 0.086296
+Mean PPL(Q)-PPL(base) : 68.644515 ± 0.997862
+
+====== KL divergence statistics ======
+Mean KLD: 2.063818 ± 0.006856
+Maximum KLD: 39.386982
+99.9% KLD: 19.179407
+99.0% KLD: 12.731147
+99.0% KLD: 12.731147
+Median KLD: 1.246560
+10.0% KLD: 0.011396
+ 5.0% KLD: 0.001648
+ 1.0% KLD: 0.000063
+Minimum KLD: -0.000003
+
+====== Token probability statistics ======
+Mean Δp: -9.665 ± 0.088 %
+Maximum Δp: 99.654%
+99.9% Δp: 93.397%
+99.0% Δp: 73.865%
+95.0% Δp: 40.361%
+90.0% Δp: 19.833%
+75.0% Δp: 0.586%
+Median Δp: -0.448%
+25.0% Δp: -15.350%
+10.0% Δp: -62.944%
+ 5.0% Δp: -90.459%
+ 1.0% Δp: -99.970%
+ 0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 35.199 ± 0.092 %
+Same top p: 57.360 ± 0.128 %
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_m.tqa b/scores/Qwen3-30B-A3B-pruned-iq3_m.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..780d6a334d838e5ba58f9f5fc816bd08feb88d9f
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_m.tqa
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+Final result: 30.6667 +/- 1.6849
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1061.94 ms
+llama_perf_context_print: prompt eval time = 52253.67 ms / 49696 tokens ( 1.05 ms per token, 951.05 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 54015.85 ms / 49697 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_m.wng b/scores/Qwen3-30B-A3B-pruned-iq3_m.wng
new file mode 100644
index 0000000000000000000000000000000000000000..a85d245779fd06571722b40fd871521267023f52
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_m.wng
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 62.5333 +/- 1.7686
+
+llama_perf_context_print: load time = 1084.28 ms
+llama_perf_context_print: prompt eval time = 21513.44 ms / 21448 tokens ( 1.00 ms per token, 996.96 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 22046.16 ms / 21449 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_s.arc b/scores/Qwen3-30B-A3B-pruned-iq3_s.arc
new file mode 100644
index 0000000000000000000000000000000000000000..51547fe117b39226e0eb9fcaa21887d97b7748b4
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_s.arc
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+Final result: 48.5333 +/- 1.8262
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 5752.48 ms
+llama_perf_context_print: prompt eval time = 37048.76 ms / 35972 tokens ( 1.03 ms per token, 970.94 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 37992.87 ms / 35973 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_s.hsw b/scores/Qwen3-30B-A3B-pruned-iq3_s.hsw
new file mode 100644
index 0000000000000000000000000000000000000000..cacf94e0ef33eef6f0d1352920de50463e29eb61
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_s.hsw
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+750 68.66666667% [65.2590%, 71.8841%]
+
+
+llama_perf_context_print: load time = 1004.53 ms
+llama_perf_context_print: prompt eval time = 127701.50 ms / 126038 tokens ( 1.01 ms per token, 986.97 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 131559.01 ms / 126039 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_s.mmlu b/scores/Qwen3-30B-A3B-pruned-iq3_s.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..ee320311f067ff5bc56a1c601be34dcb67f798e2
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_s.mmlu
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+Final result: 37.0667 +/- 1.7648
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1051.40 ms
+llama_perf_context_print: prompt eval time = 67708.86 ms / 67719 tokens ( 1.00 ms per token, 1000.15 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 69083.94 ms / 67720 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_s.ppx b/scores/Qwen3-30B-A3B-pruned-iq3_s.ppx
new file mode 100644
index 0000000000000000000000000000000000000000..403b306b937b0ad52d6a45b85d262bc8ca0df5c6
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_s.ppx
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 69.935907 ± 0.918185
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 72.89%
+Mean ln(PPL(Q)/PPL(base)) : 2.113894 ± 0.009177
+Mean PPL(Q)/PPL(base) : 8.280419 ± 0.075991
+Mean PPL(Q)-PPL(base) : 61.489969 ± 0.871820
+
+====== KL divergence statistics ======
+Mean KLD: 1.997500 ± 0.006825
+Maximum KLD: 36.616871
+99.9% KLD: 18.998100
+99.0% KLD: 12.936396
+99.0% KLD: 12.936396
+Median KLD: 1.190034
+10.0% KLD: 0.013888
+ 5.0% KLD: 0.002134
+ 1.0% KLD: 0.000090
+Minimum KLD: -0.000004
+
+====== Token probability statistics ======
+Mean Δp: -10.199 ± 0.088 %
+Maximum Δp: 99.504%
+99.9% Δp: 93.029%
+99.0% Δp: 72.891%
+95.0% Δp: 39.848%
+90.0% Δp: 19.063%
+75.0% Δp: 0.528%
+Median Δp: -0.540%
+25.0% Δp: -16.787%
+10.0% Δp: -63.592%
+ 5.0% Δp: -91.195%
+ 1.0% Δp: -99.977%
+ 0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 35.474 ± 0.092 %
+Same top p: 56.834 ± 0.128 %
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_s.tqa b/scores/Qwen3-30B-A3B-pruned-iq3_s.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..02e09a747b5b32154d0c888a545981ae498a3693
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_s.tqa
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+Final result: 32.0000 +/- 1.7045
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1017.78 ms
+llama_perf_context_print: prompt eval time = 51819.94 ms / 49696 tokens ( 1.04 ms per token, 959.01 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 53589.19 ms / 49697 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq3_s.wng b/scores/Qwen3-30B-A3B-pruned-iq3_s.wng
new file mode 100644
index 0000000000000000000000000000000000000000..eeab3565306c3eac66560602f221a1911fbd563e
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq3_s.wng
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 63.8667 +/- 1.7553
+
+llama_perf_context_print: load time = 1065.10 ms
+llama_perf_context_print: prompt eval time = 21437.40 ms / 21448 tokens ( 1.00 ms per token, 1000.49 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 21986.82 ms / 21449 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq4_nl.arc b/scores/Qwen3-30B-A3B-pruned-iq4_nl.arc
new file mode 100644
index 0000000000000000000000000000000000000000..256b785d9db827b0e179de632af01affdd0e3b02
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq4_nl.arc
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+Final result: 61.7333 +/- 1.7759
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 7153.23 ms
+llama_perf_context_print: prompt eval time = 37237.77 ms / 35972 tokens ( 1.04 ms per token, 966.01 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 38188.71 ms / 35973 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq4_nl.hsw b/scores/Qwen3-30B-A3B-pruned-iq4_nl.hsw
new file mode 100644
index 0000000000000000000000000000000000000000..f97051bb44ee52a41f5b66e5b020fc7b5de19de2
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq4_nl.hsw
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+750 71.20000000% [67.8576%, 74.3263%]
+
+
+llama_perf_context_print: load time = 1206.30 ms
+llama_perf_context_print: prompt eval time = 127295.75 ms / 126038 tokens ( 1.01 ms per token, 990.12 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 131179.47 ms / 126039 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq4_nl.mmlu b/scores/Qwen3-30B-A3B-pruned-iq4_nl.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..9a1906e735e93cf67aa6b2d5a07c47d356bb2b6a
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq4_nl.mmlu
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+Final result: 40.9333 +/- 1.7967
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1184.18 ms
+llama_perf_context_print: prompt eval time = 68103.64 ms / 67719 tokens ( 1.01 ms per token, 994.35 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 69454.89 ms / 67720 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq4_nl.ppx b/scores/Qwen3-30B-A3B-pruned-iq4_nl.ppx
new file mode 100644
index 0000000000000000000000000000000000000000..9b6d556ebf9fe069f049dd0aaaede4a0ca138606
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq4_nl.ppx
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 58.059268 ± 0.724129
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 73.87%
+Mean ln(PPL(Q)/PPL(base)) : 1.927779 ± 0.008539
+Mean PPL(Q)/PPL(base) : 6.874224 ± 0.058701
+Mean PPL(Q)-PPL(base) : 49.613331 ± 0.677412
+
+====== KL divergence statistics ======
+Mean KLD: 1.827625 ± 0.006356
+Maximum KLD: 37.203815
+99.9% KLD: 17.289213
+99.0% KLD: 12.241351
+99.0% KLD: 12.241351
+Median KLD: 1.062228
+10.0% KLD: 0.013747
+ 5.0% KLD: 0.002407
+ 1.0% KLD: 0.000120
+Minimum KLD: -0.000003
+
+====== Token probability statistics ======
+Mean Δp: -10.074 ± 0.087 %
+Maximum Δp: 99.662%
+99.9% Δp: 90.888%
+99.0% Δp: 70.751%
+95.0% Δp: 38.384%
+90.0% Δp: 18.656%
+75.0% Δp: 0.543%
+Median Δp: -0.549%
+25.0% Δp: -15.860%
+10.0% Δp: -62.678%
+ 5.0% Δp: -91.248%
+ 1.0% Δp: -99.979%
+ 0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 35.013 ± 0.093 %
+Same top p: 58.204 ± 0.128 %
diff --git a/scores/Qwen3-30B-A3B-pruned-iq4_nl.tqa b/scores/Qwen3-30B-A3B-pruned-iq4_nl.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..13b1021f6ddb36a0473ca930a2eafc9755267f15
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq4_nl.tqa
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+Final result: 30.9333 +/- 1.6889
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1254.92 ms
+llama_perf_context_print: prompt eval time = 52206.26 ms / 49696 tokens ( 1.05 ms per token, 951.92 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 53979.12 ms / 49697 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-iq4_nl.wng b/scores/Qwen3-30B-A3B-pruned-iq4_nl.wng
new file mode 100644
index 0000000000000000000000000000000000000000..41dd2b812b16929c7eee26ee5af6fe8f10a386fa
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-iq4_nl.wng
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 65.8667 +/- 1.7325
+
+llama_perf_context_print: load time = 1229.62 ms
+llama_perf_context_print: prompt eval time = 21328.35 ms / 21448 tokens ( 0.99 ms per token, 1005.61 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 21840.27 ms / 21449 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_l.arc b/scores/Qwen3-30B-A3B-pruned-q3_k_l.arc
new file mode 100644
index 0000000000000000000000000000000000000000..1670d12ec92cb100e2fa1673a7f6202782dba497
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_l.arc
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+Final result: 57.8667 +/- 1.8042
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 5836.97 ms
+llama_perf_context_print: prompt eval time = 38153.96 ms / 35972 tokens ( 1.06 ms per token, 942.81 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 39110.03 ms / 35973 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_l.hsw b/scores/Qwen3-30B-A3B-pruned-q3_k_l.hsw
new file mode 100644
index 0000000000000000000000000000000000000000..bb70f0b52783b991d6a0b72296a18f2e26d84cc0
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_l.hsw
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+750 71.73333333% [68.4062%, 74.8389%]
+
+
+llama_perf_context_print: load time = 1007.76 ms
+llama_perf_context_print: prompt eval time = 130309.00 ms / 126038 tokens ( 1.03 ms per token, 967.22 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 134163.99 ms / 126039 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_l.mmlu b/scores/Qwen3-30B-A3B-pruned-q3_k_l.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..3631d5d31c244723640671df2a4f04c4a7e6f9f0
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_l.mmlu
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+Final result: 38.6667 +/- 1.7794
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1034.53 ms
+llama_perf_context_print: prompt eval time = 69376.50 ms / 67719 tokens ( 1.02 ms per token, 976.11 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 70750.75 ms / 67720 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_l.ppx b/scores/Qwen3-30B-A3B-pruned-q3_k_l.ppx
new file mode 100644
index 0000000000000000000000000000000000000000..235ae078bf16598598d90494b62863cae482531c
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_l.ppx
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 60.855606 ± 0.768774
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 73.47%
+Mean ln(PPL(Q)/PPL(base)) : 1.974818 ± 0.008712
+Mean PPL(Q)/PPL(base) : 7.205311 ± 0.062773
+Mean PPL(Q)-PPL(base) : 52.409668 ± 0.722246
+
+====== KL divergence statistics ======
+Mean KLD: 1.886749 ± 0.006413
+Maximum KLD: 32.243908
+99.9% KLD: 18.146301
+99.0% KLD: 12.053426
+99.0% KLD: 12.053426
+Median KLD: 1.116526
+10.0% KLD: 0.014264
+ 5.0% KLD: 0.002443
+ 1.0% KLD: 0.000117
+Minimum KLD: -0.000003
+
+====== Token probability statistics ======
+Mean Δp: -10.112 ± 0.088 %
+Maximum Δp: 99.677%
+99.9% Δp: 92.123%
+99.0% Δp: 72.105%
+95.0% Δp: 39.096%
+90.0% Δp: 19.004%
+75.0% Δp: 0.549%
+Median Δp: -0.551%
+25.0% Δp: -16.110%
+10.0% Δp: -63.590%
+ 5.0% Δp: -91.581%
+ 1.0% Δp: -99.973%
+ 0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 35.285 ± 0.093 %
+Same top p: 57.653 ± 0.128 %
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_l.tqa b/scores/Qwen3-30B-A3B-pruned-q3_k_l.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..b2af7639ded6316d51aab4747d599b25d83080b8
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_l.tqa
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+Final result: 32.2667 +/- 1.7082
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1045.56 ms
+llama_perf_context_print: prompt eval time = 53181.04 ms / 49696 tokens ( 1.07 ms per token, 934.47 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 54884.20 ms / 49697 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_l.wng b/scores/Qwen3-30B-A3B-pruned-q3_k_l.wng
new file mode 100644
index 0000000000000000000000000000000000000000..5660d164a92f1885d09a725a45ea75df5325b782
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_l.wng
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 65.2000 +/- 1.7405
+
+llama_perf_context_print: load time = 964.35 ms
+llama_perf_context_print: prompt eval time = 21817.15 ms / 21448 tokens ( 1.02 ms per token, 983.08 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 22321.50 ms / 21449 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_m.arc b/scores/Qwen3-30B-A3B-pruned-q3_k_m.arc
new file mode 100644
index 0000000000000000000000000000000000000000..953c7e27938416bb41ad65ee12848899146c7f91
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_m.arc
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+Final result: 56.4000 +/- 1.8119
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 5706.95 ms
+llama_perf_context_print: prompt eval time = 37640.79 ms / 35972 tokens ( 1.05 ms per token, 955.67 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 38556.26 ms / 35973 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_m.hsw b/scores/Qwen3-30B-A3B-pruned-q3_k_m.hsw
new file mode 100644
index 0000000000000000000000000000000000000000..8817a0d51b8f8602a985bae172923d39f24c5821
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_m.hsw
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+750 70.80000000% [67.4465%, 73.9415%]
+
+
+llama_perf_context_print: load time = 924.40 ms
+llama_perf_context_print: prompt eval time = 127735.77 ms / 126038 tokens ( 1.01 ms per token, 986.71 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 131427.60 ms / 126039 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_m.mmlu b/scores/Qwen3-30B-A3B-pruned-q3_k_m.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..686a0e61f9e5332944162e553dc60cf823885786
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_m.mmlu
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+Final result: 39.3333 +/- 1.7849
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 943.54 ms
+llama_perf_context_print: prompt eval time = 67380.31 ms / 67719 tokens ( 0.99 ms per token, 1005.03 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 68692.56 ms / 67720 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_m.ppx b/scores/Qwen3-30B-A3B-pruned-q3_k_m.ppx
new file mode 100644
index 0000000000000000000000000000000000000000..d1a504100e955c8659a9fd053c066c6df1a53e5a
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_m.ppx
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 59.072808 ± 0.741897
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 73.82%
+Mean ln(PPL(Q)/PPL(base)) : 1.945085 ± 0.008614
+Mean PPL(Q)/PPL(base) : 6.994227 ± 0.060246
+Mean PPL(Q)-PPL(base) : 50.626870 ± 0.695177
+
+====== KL divergence statistics ======
+Mean KLD: 1.857932 ± 0.006326
+Maximum KLD: 31.393671
+99.9% KLD: 17.826597
+99.0% KLD: 11.908570
+99.0% KLD: 11.908570
+Median KLD: 1.097884
+10.0% KLD: 0.013517
+ 5.0% KLD: 0.002297
+ 1.0% KLD: 0.000108
+Minimum KLD: -0.000003
+
+====== Token probability statistics ======
+Mean Δp: -9.998 ± 0.087 %
+Maximum Δp: 99.678%
+99.9% Δp: 91.515%
+99.0% Δp: 72.068%
+95.0% Δp: 39.410%
+90.0% Δp: 18.817%
+75.0% Δp: 0.535%
+Median Δp: -0.541%
+25.0% Δp: -15.834%
+10.0% Δp: -62.929%
+ 5.0% Δp: -91.189%
+ 1.0% Δp: -99.971%
+ 0.1% Δp: -99.999%
+Minimum Δp: -100.000%
+RMS Δp : 35.157 ± 0.092 %
+Same top p: 57.800 ± 0.128 %
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_m.tqa b/scores/Qwen3-30B-A3B-pruned-q3_k_m.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..1f1ee3bb4a538715888ada48f6c688992cfecab4
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_m.tqa
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
+
+Final result: 31.6000 +/- 1.6988
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 973.04 ms
+llama_perf_context_print: prompt eval time = 52005.69 ms / 49696 tokens ( 1.05 ms per token, 955.59 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 53651.35 ms / 49697 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_m.wng b/scores/Qwen3-30B-A3B-pruned-q3_k_m.wng
new file mode 100644
index 0000000000000000000000000000000000000000..8a717876bc2fe926105219e7776254b0415faf3f
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q3_k_m.wng
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3
(latest)) + +Final Winogrande score(750 tasks): 64.8000 +/- 1.7451 + +llama_perf_context_print: load time = 1010.06 ms +llama_perf_context_print: prompt eval time = 21536.50 ms / 21448 tokens ( 1.00 ms per token, 995.89 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 22054.60 ms / 21449 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_s.arc b/scores/Qwen3-30B-A3B-pruned-q3_k_s.arc new file mode 100644 index 0000000000000000000000000000000000000000..842b9f5c3816e2569eda7195ae7f452fb368f243 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q3_k_s.arc @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest)) + +Final result: 58.1333 +/- 1.8026 +Random chance: 25.0083 +/- 1.5824 + + +llama_perf_context_print: load time = 5783.27 ms +llama_perf_context_print: prompt eval time = 37846.85 ms / 35972 tokens ( 1.05 ms per token, 950.46 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 38757.96 ms / 35973 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_s.hsw b/scores/Qwen3-30B-A3B-pruned-q3_k_s.hsw new file mode 100644 index 0000000000000000000000000000000000000000..80f769c9908e1863389ef31d107fe63498484254 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q3_k_s.hsw @@ -0,0 +1,12 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 
42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest)) + +750 71.46666667% [68.1319%, 74.5827%] + + +llama_perf_context_print: load time = 885.76 ms +llama_perf_context_print: prompt eval time = 127239.40 ms / 126038 tokens ( 1.01 ms per token, 990.56 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 130949.53 ms / 126039 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_s.mmlu b/scores/Qwen3-30B-A3B-pruned-q3_k_s.mmlu new file mode 100644 index 0000000000000000000000000000000000000000..8ab6a417c0b492586b74e654c0a5798d551c1cfd --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q3_k_s.mmlu @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest)) + +Final result: 38.9333 +/- 1.7816 +Random chance: 25.0000 +/- 1.5822 + + +llama_perf_context_print: load time = 972.00 ms +llama_perf_context_print: prompt eval time = 67683.27 ms / 67719 tokens ( 1.00 ms per token, 1000.53 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 68987.39 ms / 67720 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_s.ppx b/scores/Qwen3-30B-A3B-pruned-q3_k_s.ppx new file mode 100644 index 0000000000000000000000000000000000000000..81271ec4a8c851ca8d986ac1991ad30e82f30641 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q3_k_s.ppx @@ -0,0 +1,37 @@ +====== Perplexity statistics ====== +Mean PPL(Q) : 61.676169 ± 0.780539 +Mean PPL(base) : 8.445938 ± 0.065177 +Cor(ln(PPL(Q)), ln(PPL(base))): 73.64% +Mean 
ln(PPL(Q)/PPL(base)) : 1.988212 ± 0.008711 +Mean PPL(Q)/PPL(base) : 7.302465 ± 0.063613 +Mean PPL(Q)-PPL(base) : 53.230232 ± 0.733873 + +====== KL divergence statistics ====== +Mean KLD: 1.888847 ± 0.006380 +Maximum KLD: 33.008038 +99.9% KLD: 17.721254 +99.0% KLD: 12.006232 +99.0% KLD: 12.006232 +Median KLD: 1.128817 +10.0% KLD: 0.013821 + 5.0% KLD: 0.002327 + 1.0% KLD: 0.000107 +Minimum KLD: -0.000003 + +====== Token probability statistics ====== +Mean Δp: -10.182 ± 0.088 % +Maximum Δp: 99.676% +99.9% Δp: 91.955% +99.0% Δp: 72.330% +95.0% Δp: 39.211% +90.0% Δp: 18.768% +75.0% Δp: 0.478% +Median Δp: -0.585% +25.0% Δp: -16.265% +10.0% Δp: -63.264% + 5.0% Δp: -91.364% + 1.0% Δp: -99.971% + 0.1% Δp: -99.999% +Minimum Δp: -100.000% +RMS Δp : 35.283 ± 0.093 % +Same top p: 57.426 ± 0.128 % diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_s.tqa b/scores/Qwen3-30B-A3B-pruned-q3_k_s.tqa new file mode 100644 index 0000000000000000000000000000000000000000..a972ed86ed28f7c388ee3fe86585cb75bfe33f8c --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q3_k_s.tqa @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest)) + +Final result: 30.9333 +/- 1.6889 +Random chance: 19.8992 +/- 1.4588 + + +llama_perf_context_print: load time = 988.65 ms +llama_perf_context_print: prompt eval time = 52374.28 ms / 49696 tokens ( 1.05 ms per token, 948.86 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 54015.45 ms / 49697 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q3_k_s.wng b/scores/Qwen3-30B-A3B-pruned-q3_k_s.wng new file mode 100644 index 
0000000000000000000000000000000000000000..a282648e6efe8ae6839dd926e60c4164f0d3ec83 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q3_k_s.wng @@ -0,0 +1,11 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest)) + +Final Winogrande score(750 tasks): 63.0667 +/- 1.7635 + +llama_perf_context_print: load time = 929.52 ms +llama_perf_context_print: prompt eval time = 21714.40 ms / 21448 tokens ( 1.01 ms per token, 987.73 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 22212.48 ms / 21449 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_m.arc b/scores/Qwen3-30B-A3B-pruned-q4_k_m.arc new file mode 100644 index 0000000000000000000000000000000000000000..f41c20895572828e082f37cf26ff1b114c00204f --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_m.arc @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest)) + +Final result: 60.5333 +/- 1.7860 +Random chance: 25.0083 +/- 1.5824 + + +llama_perf_context_print: load time = 7096.93 ms +llama_perf_context_print: prompt eval time = 38331.71 ms / 35972 tokens ( 1.07 ms per token, 938.44 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 39294.96 ms / 35973 tokens +ggml_metal_free: deallocating diff --git 
a/scores/Qwen3-30B-A3B-pruned-q4_k_m.hsw b/scores/Qwen3-30B-A3B-pruned-q4_k_m.hsw new file mode 100644 index 0000000000000000000000000000000000000000..a5d40975603e288c6e6a6978869004da2a151c51 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_m.hsw @@ -0,0 +1,12 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest)) + +750 71.46666667% [68.1319%, 74.5827%] + + +llama_perf_context_print: load time = 1212.85 ms +llama_perf_context_print: prompt eval time = 130689.40 ms / 126038 tokens ( 1.04 ms per token, 964.41 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 134566.53 ms / 126039 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_m.mmlu b/scores/Qwen3-30B-A3B-pruned-q4_k_m.mmlu new file mode 100644 index 0000000000000000000000000000000000000000..eff39ec403db6b7a94a13531ccd83353f021da26 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_m.mmlu @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest)) + +Final result: 41.8667 +/- 1.8026 +Random chance: 25.0000 +/- 1.5822 + + +llama_perf_context_print: load time = 1263.34 ms +llama_perf_context_print: prompt eval time = 69966.59 ms / 67719 tokens ( 1.03 ms per token, 967.88 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: 
total time = 71361.75 ms / 67720 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_m.ppx b/scores/Qwen3-30B-A3B-pruned-q4_k_m.ppx new file mode 100644 index 0000000000000000000000000000000000000000..d8b0177e3eabe4940ae68ed0e4634ea0d770f3c3 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_m.ppx @@ -0,0 +1,37 @@ +====== Perplexity statistics ====== +Mean PPL(Q) : 58.664820 ± 0.740540 +Mean PPL(base) : 8.445938 ± 0.065177 +Cor(ln(PPL(Q)), ln(PPL(base))): 74.32% +Mean ln(PPL(Q)/PPL(base)) : 1.938155 ± 0.008608 +Mean PPL(Q)/PPL(base) : 6.945921 ± 0.059790 +Mean PPL(Q)-PPL(base) : 50.218882 ± 0.693471 + +====== KL divergence statistics ====== +Mean KLD: 1.826410 ± 0.006359 +Maximum KLD: 38.350769 +99.9% KLD: 17.546576 +99.0% KLD: 12.219206 +99.0% KLD: 12.219206 +Median KLD: 1.056963 +10.0% KLD: 0.011905 + 5.0% KLD: 0.002018 + 1.0% KLD: 0.000098 +Minimum KLD: -0.000003 + +====== Token probability statistics ====== +Mean Δp: -9.585 ± 0.087 % +Maximum Δp: 99.633% +99.9% Δp: 91.281% +99.0% Δp: 72.420% +95.0% Δp: 39.757% +90.0% Δp: 19.570% +75.0% Δp: 0.574% +Median Δp: -0.481% +25.0% Δp: -15.306% +10.0% Δp: -60.898% + 5.0% Δp: -90.180% + 1.0% Δp: -99.976% + 0.1% Δp: -100.000% +Minimum Δp: -100.000% +RMS Δp : 34.806 ± 0.092 % +Same top p: 58.557 ± 0.128 % diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_m.tqa b/scores/Qwen3-30B-A3B-pruned-q4_k_m.tqa new file mode 100644 index 0000000000000000000000000000000000000000..32d84d163cb9bb8f8347c18f6e641b22752f4918 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_m.tqa @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest)) + +Final result: 30.9333 +/- 1.6889 +Random chance: 19.8992 +/- 1.4588 + + 
+llama_perf_context_print: load time = 1267.40 ms +llama_perf_context_print: prompt eval time = 53738.21 ms / 49696 tokens ( 1.08 ms per token, 924.78 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 55506.24 ms / 49697 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_m.wng b/scores/Qwen3-30B-A3B-pruned-q4_k_m.wng new file mode 100644 index 0000000000000000000000000000000000000000..1cc7d8217263713e67f2f5a283ac6d57bdd37d09 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_m.wng @@ -0,0 +1,11 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest)) + +Final Winogrande score(750 tasks): 66.1333 +/- 1.7292 + +llama_perf_context_print: load time = 1232.55 ms +llama_perf_context_print: prompt eval time = 21981.09 ms / 21448 tokens ( 1.02 ms per token, 975.75 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 22534.28 ms / 21449 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_s.arc b/scores/Qwen3-30B-A3B-pruned-q4_k_s.arc new file mode 100644 index 0000000000000000000000000000000000000000..d9f4c162ea428a59df46e544597a6e49b5685cb9 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_s.arc @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 
(latest)) + +Final result: 60.8000 +/- 1.7838 +Random chance: 25.0083 +/- 1.5824 + + +llama_perf_context_print: load time = 6986.96 ms +llama_perf_context_print: prompt eval time = 38183.57 ms / 35972 tokens ( 1.06 ms per token, 942.08 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 39137.36 ms / 35973 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_s.hsw b/scores/Qwen3-30B-A3B-pruned-q4_k_s.hsw new file mode 100644 index 0000000000000000000000000000000000000000..91793768db654057eede4010d6bfb5b0104af28d --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_s.hsw @@ -0,0 +1,12 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest)) + +750 71.06666667% [67.7206%, 74.1981%] + + +llama_perf_context_print: load time = 1188.49 ms +llama_perf_context_print: prompt eval time = 130416.13 ms / 126038 tokens ( 1.03 ms per token, 966.43 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 134331.92 ms / 126039 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_s.mmlu b/scores/Qwen3-30B-A3B-pruned-q4_k_s.mmlu new file mode 100644 index 0000000000000000000000000000000000000000..b347487e3adf7f977bc8779028097bfe0b557637 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_s.mmlu @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 
key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest)) + +Final result: 41.4667 +/- 1.8002 +Random chance: 25.0000 +/- 1.5822 + + +llama_perf_context_print: load time = 1227.63 ms +llama_perf_context_print: prompt eval time = 69732.87 ms / 67719 tokens ( 1.03 ms per token, 971.12 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 71119.80 ms / 67720 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_s.ppx b/scores/Qwen3-30B-A3B-pruned-q4_k_s.ppx new file mode 100644 index 0000000000000000000000000000000000000000..698ed298b372b19405ea8a61e56cc5b8ff6cfbdf --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_s.ppx @@ -0,0 +1,37 @@ +====== Perplexity statistics ====== +Mean PPL(Q) : 57.925186 ± 0.730332 +Mean PPL(base) : 8.445938 ± 0.065177 +Cor(ln(PPL(Q)), ln(PPL(base))): 74.48% +Mean ln(PPL(Q)/PPL(base)) : 1.925467 ± 0.008578 +Mean PPL(Q)/PPL(base) : 6.858349 ± 0.058829 +Mean PPL(Q)-PPL(base) : 49.479248 ± 0.683172 + +====== KL divergence statistics ====== +Mean KLD: 1.813341 ± 0.006327 +Maximum KLD: 36.167244 +99.9% KLD: 17.588381 +99.0% KLD: 12.150424 +99.0% KLD: 12.150424 +Median KLD: 1.048804 +10.0% KLD: 0.011787 + 5.0% KLD: 0.001938 + 1.0% KLD: 0.000095 +Minimum KLD: -0.000004 + +====== Token probability statistics ====== +Mean Δp: -9.451 ± 0.086 % +Maximum Δp: 99.650% +99.9% Δp: 91.300% +99.0% Δp: 72.230% +95.0% Δp: 39.873% +90.0% Δp: 19.644% +75.0% Δp: 0.627% +Median Δp: -0.454% +25.0% Δp: -15.081% +10.0% Δp: -60.525% + 5.0% Δp: -89.841% + 1.0% Δp: -99.976% + 0.1% Δp: -100.000% +Minimum Δp: -100.000% +RMS Δp : 34.669 ± 0.092 % +Same top p: 58.661 ± 0.128 % diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_s.tqa b/scores/Qwen3-30B-A3B-pruned-q4_k_s.tqa new file mode 100644 index 0000000000000000000000000000000000000000..85afb0345cbbf880e6d846b25d3efa73dd097c28 --- /dev/null +++ 
b/scores/Qwen3-30B-A3B-pruned-q4_k_s.tqa @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest)) + +Final result: 31.0667 +/- 1.6909 +Random chance: 19.8992 +/- 1.4588 + + +llama_perf_context_print: load time = 1232.15 ms +llama_perf_context_print: prompt eval time = 53582.86 ms / 49696 tokens ( 1.08 ms per token, 927.46 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 55308.07 ms / 49697 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q4_k_s.wng b/scores/Qwen3-30B-A3B-pruned-q4_k_s.wng new file mode 100644 index 0000000000000000000000000000000000000000..2166d038e1a43eeaaf2a84f405f521a02ee228b3 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q4_k_s.wng @@ -0,0 +1,11 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest)) + +Final Winogrande score(750 tasks): 66.4000 +/- 1.7259 + +llama_perf_context_print: load time = 1221.50 ms +llama_perf_context_print: prompt eval time = 21798.81 ms / 21448 tokens ( 1.02 ms per token, 983.91 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 22310.05 ms / 21449 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_m.arc b/scores/Qwen3-30B-A3B-pruned-q5_k_m.arc new file mode 100644 index 
0000000000000000000000000000000000000000..95b709b69a42e785b12beff3a9b10ec346cd3e66 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q5_k_m.arc @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest)) + +Final result: 60.4000 +/- 1.7870 +Random chance: 25.0083 +/- 1.5824 + + +llama_perf_context_print: load time = 8694.59 ms +llama_perf_context_print: prompt eval time = 38438.76 ms / 35972 tokens ( 1.07 ms per token, 935.83 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 39352.80 ms / 35973 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_m.hsw b/scores/Qwen3-30B-A3B-pruned-q5_k_m.hsw new file mode 100644 index 0000000000000000000000000000000000000000..7645cb784ca64475c5f5f13f5618525bedfb4ec1 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q5_k_m.hsw @@ -0,0 +1,12 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest)) + +750 70.53333333% [67.1726%, 73.6848%] + + +llama_perf_context_print: load time = 1343.10 ms +llama_perf_context_print: prompt eval time = 130424.78 ms / 126038 tokens ( 1.03 ms per token, 966.37 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 134153.23 ms / 126039 tokens +ggml_metal_free: deallocating diff --git 
a/scores/Qwen3-30B-A3B-pruned-q5_k_m.mmlu b/scores/Qwen3-30B-A3B-pruned-q5_k_m.mmlu new file mode 100644 index 0000000000000000000000000000000000000000..774ddd1d4a9c8920414f314e4aa33be7440306e2 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q5_k_m.mmlu @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest)) + +Final result: 41.4667 +/- 1.8002 +Random chance: 25.0000 +/- 1.5822 + + +llama_perf_context_print: load time = 1344.05 ms +llama_perf_context_print: prompt eval time = 69940.17 ms / 67719 tokens ( 1.03 ms per token, 968.24 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 71315.46 ms / 67720 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_m.ppx b/scores/Qwen3-30B-A3B-pruned-q5_k_m.ppx new file mode 100644 index 0000000000000000000000000000000000000000..644afc1ae7cf633c9e4e5f55b90aa15a04b6b92f --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q5_k_m.ppx @@ -0,0 +1,37 @@ +====== Perplexity statistics ====== +Mean PPL(Q) : 57.654440 ± 0.728232 +Mean PPL(base) : 8.445938 ± 0.065177 +Cor(ln(PPL(Q)), ln(PPL(base))): 74.66% +Mean ln(PPL(Q)/PPL(base)) : 1.920782 ± 0.008575 +Mean PPL(Q)/PPL(base) : 6.826292 ± 0.058538 +Mean PPL(Q)-PPL(base) : 49.208502 ± 0.680950 + +====== KL divergence statistics ====== +Mean KLD: 1.794232 ± 0.006308 +Maximum KLD: 35.415676 +99.9% KLD: 17.817822 +99.0% KLD: 12.097032 +99.0% KLD: 12.097032 +Median KLD: 1.028153 +10.0% KLD: 0.012323 + 5.0% KLD: 0.002062 + 1.0% KLD: 0.000091 +Minimum KLD: -0.000004 + +====== Token probability statistics ====== +Mean Δp: -9.293 ± 0.086 % +Maximum Δp: 99.651% +99.9% Δp: 90.927% 
+99.0% Δp: 71.009% +95.0% Δp: 39.216% +90.0% Δp: 19.661% +75.0% Δp: 0.725% +Median Δp: -0.426% +25.0% Δp: -14.393% +10.0% Δp: -60.223% + 5.0% Δp: -89.795% + 1.0% Δp: -99.975% + 0.1% Δp: -100.000% +Minimum Δp: -100.000% +RMS Δp : 34.464 ± 0.092 % +Same top p: 59.358 ± 0.127 % diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_m.tqa b/scores/Qwen3-30B-A3B-pruned-q5_k_m.tqa new file mode 100644 index 0000000000000000000000000000000000000000..d51afc7b025543871199f902f2afe47d1be3aba9 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q5_k_m.tqa @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest)) + +Final result: 30.9333 +/- 1.6889 +Random chance: 19.8992 +/- 1.4588 + + +llama_perf_context_print: load time = 1421.82 ms +llama_perf_context_print: prompt eval time = 54192.50 ms / 49696 tokens ( 1.09 ms per token, 917.03 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 55944.31 ms / 49697 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_m.wng b/scores/Qwen3-30B-A3B-pruned-q5_k_m.wng new file mode 100644 index 0000000000000000000000000000000000000000..15840d2ca958b9f3d669f46ef1b935b39b17743e --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q5_k_m.wng @@ -0,0 +1,11 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_M.gguf (version GGUF V3 (latest)) + +Final Winogrande score(750 tasks): 65.3333 +/- 1.7389 + 
+llama_perf_context_print: load time = 1480.16 ms +llama_perf_context_print: prompt eval time = 21866.13 ms / 21448 tokens ( 1.02 ms per token, 980.88 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 22403.06 ms / 21449 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_s.arc b/scores/Qwen3-30B-A3B-pruned-q5_k_s.arc new file mode 100644 index 0000000000000000000000000000000000000000..21b88b58609645f6c83439cfe2283f349148aa96 --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q5_k_s.arc @@ -0,0 +1,13 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_S.gguf (version GGUF V3 (latest)) + +Final result: 61.0667 +/- 1.7816 +Random chance: 25.0083 +/- 1.5824 + + +llama_perf_context_print: load time = 8464.02 ms +llama_perf_context_print: prompt eval time = 38133.91 ms / 35972 tokens ( 1.06 ms per token, 943.31 tokens per second) +llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second) +llama_perf_context_print: total time = 39109.74 ms / 35973 tokens +ggml_metal_free: deallocating diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_s.hsw b/scores/Qwen3-30B-A3B-pruned-q5_k_s.hsw new file mode 100644 index 0000000000000000000000000000000000000000..2aac500a393a5056b00689b78fdd75adee3049bd --- /dev/null +++ b/scores/Qwen3-30B-A3B-pruned-q5_k_s.hsw @@ -0,0 +1,12 @@ +build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0 +llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free +llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_S.gguf 
(version GGUF V3 (latest))
+
+750	70.93333333% [67.5835%, 74.0698%]
+
+
+llama_perf_context_print: load time = 1411.83 ms
+llama_perf_context_print: prompt eval time = 131359.53 ms / 126038 tokens ( 1.04 ms per token, 959.49 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 135268.22 ms / 126039 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_s.mmlu b/scores/Qwen3-30B-A3B-pruned-q5_k_s.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..a08da823ecf8a08b7f1d7355e2a932e35b4e88eb
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q5_k_s.mmlu
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_S.gguf (version GGUF V3 (latest))
+
+Final result: 42.0000 +/- 1.8034
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1458.56 ms
+llama_perf_context_print: prompt eval time = 70011.58 ms / 67719 tokens ( 1.03 ms per token, 967.25 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 71401.61 ms / 67720 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_s.ppx b/scores/Qwen3-30B-A3B-pruned-q5_k_s.ppx
new file mode 100644
index 0000000000000000000000000000000000000000..b5192e8ae15853fc137a1c7849623b49b913083f
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q5_k_s.ppx
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 57.254362 ± 0.721067
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 74.56%
+Mean ln(PPL(Q)/PPL(base)) : 1.913818 ± 0.008558
+Mean PPL(Q)/PPL(base) : 6.778923 ± 0.058011
+Mean PPL(Q)-PPL(base) : 48.808424 ± 0.673871
+
+====== KL divergence statistics ======
+Mean KLD: 1.792982 ± 0.006300
+Maximum KLD: 36.765514
+99.9% KLD: 17.909147
+99.0% KLD: 12.072874
+99.0% KLD: 12.072874
+Median KLD: 1.026656
+10.0% KLD: 0.012494
+ 5.0% KLD: 0.002115
+ 1.0% KLD: 0.000097
+Minimum KLD: -0.000004
+
+====== Token probability statistics ======
+Mean Δp: -9.384 ± 0.086 %
+Maximum Δp: 99.647%
+99.9% Δp: 90.769%
+99.0% Δp: 70.837%
+95.0% Δp: 39.084%
+90.0% Δp: 19.621%
+75.0% Δp: 0.705%
+Median Δp: -0.440%
+25.0% Δp: -14.590%
+10.0% Δp: -60.605%
+ 5.0% Δp: -90.004%
+ 1.0% Δp: -99.974%
+ 0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 34.527 ± 0.092 %
+Same top p: 59.266 ± 0.127 %
diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_s.tqa b/scores/Qwen3-30B-A3B-pruned-q5_k_s.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..6fe6c348fa3b444b5bde8b051b2a5186bcbadee9
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q5_k_s.tqa
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_S.gguf (version GGUF V3 (latest))
+
+Final result: 30.6667 +/- 1.6849
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1396.19 ms
+llama_perf_context_print: prompt eval time = 53824.73 ms / 49696 tokens ( 1.08 ms per token, 923.29 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 55589.75 ms / 49697 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q5_k_s.wng b/scores/Qwen3-30B-A3B-pruned-q5_k_s.wng
new file mode 100644
index 0000000000000000000000000000000000000000..f90bce51432e2ffc09978d4989afc105999663a4
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q5_k_s.wng
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q5_K_S.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 67.3333 +/- 1.7137
+
+llama_perf_context_print: load time = 1423.55 ms
+llama_perf_context_print: prompt eval time = 21853.38 ms / 21448 tokens ( 1.02 ms per token, 981.45 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 22389.39 ms / 21449 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q6_k.arc b/scores/Qwen3-30B-A3B-pruned-q6_k.arc
new file mode 100644
index 0000000000000000000000000000000000000000..99bfcb3685b436e6e2f56ce256ff91417897982a
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q6_k.arc
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q6_K.gguf (version GGUF V3 (latest))
+
+Final result: 60.8000 +/- 1.7838
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 12824.65 ms
+llama_perf_context_print: prompt eval time = 39177.83 ms / 35972 tokens ( 1.09 ms per token, 918.17 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 40138.47 ms / 35973 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q6_k.hsw b/scores/Qwen3-30B-A3B-pruned-q6_k.hsw
new file mode 100644
index 0000000000000000000000000000000000000000..914223ef39dcfa85aec02c0f8e9987db9db0a35a
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q6_k.hsw
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q6_K.gguf (version GGUF V3 (latest))
+
+750	71.60000000% [68.2690%, 74.7108%]
+
+
+llama_perf_context_print: load time = 1738.03 ms
+llama_perf_context_print: prompt eval time = 134280.81 ms / 126038 tokens ( 1.07 ms per token, 938.62 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 138179.60 ms / 126039 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q6_k.mmlu b/scores/Qwen3-30B-A3B-pruned-q6_k.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..9c312cc152b29a615f75c7855f337a3511e23a4f
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q6_k.mmlu
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q6_K.gguf (version GGUF V3 (latest))
+
+Final result: 42.0000 +/- 1.8034
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1754.84 ms
+llama_perf_context_print: prompt eval time = 70616.62 ms / 67719 tokens ( 1.04 ms per token, 958.97 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 71993.80 ms / 67720 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q6_k.ppx b/scores/Qwen3-30B-A3B-pruned-q6_k.ppx
new file mode 100644
index 0000000000000000000000000000000000000000..bcfbdc1443a74901b8d8ae731b18d3c38da35aab
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q6_k.ppx
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 56.914839 ± 0.717086
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 74.60%
+Mean ln(PPL(Q)/PPL(base)) : 1.907871 ± 0.008558
+Mean PPL(Q)/PPL(base) : 6.738723 ± 0.057668
+Mean PPL(Q)-PPL(base) : 48.468902 ± 0.669874
+
+====== KL divergence statistics ======
+Mean KLD: 1.786199 ± 0.006314
+Maximum KLD: 37.578331
+99.9% KLD: 17.887403
+99.0% KLD: 12.155305
+99.0% KLD: 12.155305
+Median KLD: 1.019602
+10.0% KLD: 0.012315
+ 5.0% KLD: 0.002119
+ 1.0% KLD: 0.000094
+Minimum KLD: -0.000004
+
+====== Token probability statistics ======
+Mean Δp: -9.302 ± 0.086 %
+Maximum Δp: 99.667%
+99.9% Δp: 90.791%
+99.0% Δp: 71.028%
+95.0% Δp: 39.052%
+90.0% Δp: 19.656%
+75.0% Δp: 0.727%
+Median Δp: -0.431%
+25.0% Δp: -14.360%
+10.0% Δp: -60.277%
+ 5.0% Δp: -90.063%
+ 1.0% Δp: -99.976%
+ 0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 34.465 ± 0.092 %
+Same top p: 59.533 ± 0.127 %
diff --git a/scores/Qwen3-30B-A3B-pruned-q6_k.tqa b/scores/Qwen3-30B-A3B-pruned-q6_k.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..7112955c10ac732892ce0cb412210d2b137e5267
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q6_k.tqa
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q6_K.gguf (version GGUF V3 (latest))
+
+Final result: 30.9333 +/- 1.6889
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1620.83 ms
+llama_perf_context_print: prompt eval time = 54357.78 ms / 49696 tokens ( 1.09 ms per token, 914.24 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 56036.96 ms / 49697 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q6_k.wng b/scores/Qwen3-30B-A3B-pruned-q6_k.wng
new file mode 100644
index 0000000000000000000000000000000000000000..a2f5c4bae1f523183ac186517a3fd39408d95d19
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q6_k.wng
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q6_K.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 66.2667 +/- 1.7276
+
+llama_perf_context_print: load time = 1591.13 ms
+llama_perf_context_print: prompt eval time = 22068.38 ms / 21448 tokens ( 1.03 ms per token, 971.89 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 22566.73 ms / 21449 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q8_0.arc b/scores/Qwen3-30B-A3B-pruned-q8_0.arc
new file mode 100644
index 0000000000000000000000000000000000000000..0c1acec1153ce09ff4bc69511c33a1da7a0d9acf
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q8_0.arc
@@ -0,0 +1,15 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q8_0.gguf (version GGUF V3 (latest))
+
+750	61.20000000
+
+Final result: 61.2000 +/- 1.7805
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 13846.02 ms
+llama_perf_context_print: prompt eval time = 37604.69 ms / 35972 tokens ( 1.05 ms per token, 956.58 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 38593.90 ms / 35973 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q8_0.hsw b/scores/Qwen3-30B-A3B-pruned-q8_0.hsw
new file mode 100644
index 0000000000000000000000000000000000000000..9c53c8ed4b5a2cd649d7da45670863fc41269740
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q8_0.hsw
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q8_0.gguf (version GGUF V3 (latest))
+
+750	71.33333333% [67.9947%, 74.4545%]
+
+
+llama_perf_context_print: load time = 1970.97 ms
+llama_perf_context_print: prompt eval time = 127378.35 ms / 126038 tokens ( 1.01 ms per token, 989.48 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 131314.88 ms / 126039 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q8_0.mmlu b/scores/Qwen3-30B-A3B-pruned-q8_0.mmlu
new file mode 100644
index 0000000000000000000000000000000000000000..81e227eb8c0a9c6b2cfd9a61b063e6b74f790165
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q8_0.mmlu
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q8_0.gguf (version GGUF V3 (latest))
+
+Final result: 42.1333 +/- 1.8042
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1919.97 ms
+llama_perf_context_print: prompt eval time = 67529.28 ms / 67719 tokens ( 1.00 ms per token, 1002.81 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 69015.48 ms / 67720 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q8_0.ppx b/scores/Qwen3-30B-A3B-pruned-q8_0.ppx
new file mode 100644
index 0000000000000000000000000000000000000000..516157e025139dcf2d24dd6a747ea6b5256dffe8
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q8_0.ppx
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 57.426860 ± 0.724407
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 74.54%
+Mean ln(PPL(Q)/PPL(base)) : 1.916827 ± 0.008577
+Mean PPL(Q)/PPL(base) : 6.799347 ± 0.058317
+Mean PPL(Q)-PPL(base) : 48.980922 ± 0.677222
+
+====== KL divergence statistics ======
+Mean KLD: 1.794528 ± 0.006338
+Maximum KLD: 39.240364
+99.9% KLD: 17.831783
+99.0% KLD: 12.152828
+99.0% KLD: 12.152828
+Median KLD: 1.022714
+10.0% KLD: 0.012492
+ 5.0% KLD: 0.002121
+ 1.0% KLD: 0.000096
+Minimum KLD: -0.000002
+
+====== Token probability statistics ======
+Mean Δp: -9.336 ± 0.086 %
+Maximum Δp: 99.667%
+99.9% Δp: 90.881%
+99.0% Δp: 71.054%
+95.0% Δp: 39.332%
+90.0% Δp: 19.703%
+75.0% Δp: 0.724%
+Median Δp: -0.437%
+25.0% Δp: -14.427%
+10.0% Δp: -60.453%
+ 5.0% Δp: -90.206%
+ 1.0% Δp: -99.977%
+ 0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 34.514 ± 0.092 %
+Same top p: 59.478 ± 0.127 %
diff --git a/scores/Qwen3-30B-A3B-pruned-q8_0.tqa b/scores/Qwen3-30B-A3B-pruned-q8_0.tqa
new file mode 100644
index 0000000000000000000000000000000000000000..6cb62cfccfa69bf93004255f114a5b37b1c8a787
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q8_0.tqa
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q8_0.gguf (version GGUF V3 (latest))
+
+Final result: 30.2667 +/- 1.6787
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1999.83 ms
+llama_perf_context_print: prompt eval time = 52150.18 ms / 49696 tokens ( 1.05 ms per token, 952.94 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 53892.58 ms / 49697 tokens
+ggml_metal_free: deallocating
diff --git a/scores/Qwen3-30B-A3B-pruned-q8_0.wng b/scores/Qwen3-30B-A3B-pruned-q8_0.wng
new file mode 100644
index 0000000000000000000000000000000000000000..bcd1931e16a7f25f8540276c7e5d76c8f308d8fd
--- /dev/null
+++ b/scores/Qwen3-30B-A3B-pruned-q8_0.wng
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q8_0.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 66.1333 +/- 1.7292
+
+llama_perf_context_print: load time = 2016.13 ms
+llama_perf_context_print: prompt eval time = 20970.54 ms / 21448 tokens ( 0.98 ms per token, 1022.77 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 21479.59 ms / 21449 tokens
+ggml_metal_free: deallocating