Generate Perplexity, KLD, ARC, HellaSwag, MMLU, Truthful QA and WinoGrande scores
This view is limited to 50 files because it contains too many changes.
- scores/Qwen3-30B-A3B-pruned-F16.arc +15 -0
- scores/Qwen3-30B-A3B-pruned-F16.hsw +14 -0
- scores/Qwen3-30B-A3B-pruned-F16.mmlu +15 -0
- scores/Qwen3-30B-A3B-pruned-F16.tqa +15 -0
- scores/Qwen3-30B-A3B-pruned-F16.wng +13 -0
- scores/Qwen3-30B-A3B-pruned-iq3_m.arc +13 -0
- scores/Qwen3-30B-A3B-pruned-iq3_m.hsw +12 -0
- scores/Qwen3-30B-A3B-pruned-iq3_m.mmlu +13 -0
- scores/Qwen3-30B-A3B-pruned-iq3_m.ppx +37 -0
- scores/Qwen3-30B-A3B-pruned-iq3_m.tqa +13 -0
- scores/Qwen3-30B-A3B-pruned-iq3_m.wng +11 -0
- scores/Qwen3-30B-A3B-pruned-iq3_s.arc +13 -0
- scores/Qwen3-30B-A3B-pruned-iq3_s.hsw +12 -0
- scores/Qwen3-30B-A3B-pruned-iq3_s.mmlu +13 -0
- scores/Qwen3-30B-A3B-pruned-iq3_s.ppx +37 -0
- scores/Qwen3-30B-A3B-pruned-iq3_s.tqa +13 -0
- scores/Qwen3-30B-A3B-pruned-iq3_s.wng +11 -0
- scores/Qwen3-30B-A3B-pruned-iq4_nl.arc +13 -0
- scores/Qwen3-30B-A3B-pruned-iq4_nl.hsw +12 -0
- scores/Qwen3-30B-A3B-pruned-iq4_nl.mmlu +13 -0
- scores/Qwen3-30B-A3B-pruned-iq4_nl.ppx +37 -0
- scores/Qwen3-30B-A3B-pruned-iq4_nl.tqa +13 -0
- scores/Qwen3-30B-A3B-pruned-iq4_nl.wng +11 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_l.arc +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_l.hsw +12 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_l.mmlu +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_l.ppx +37 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_l.tqa +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_l.wng +11 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_m.arc +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_m.hsw +12 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_m.mmlu +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_m.ppx +37 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_m.tqa +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_m.wng +11 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_s.arc +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_s.hsw +12 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_s.mmlu +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_s.ppx +37 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_s.tqa +13 -0
- scores/Qwen3-30B-A3B-pruned-q3_k_s.wng +11 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_m.arc +13 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_m.hsw +12 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_m.mmlu +13 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_m.ppx +37 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_m.tqa +13 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_m.wng +11 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_s.arc +13 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_s.hsw +12 -0
- scores/Qwen3-30B-A3B-pruned-q4_k_s.mmlu +13 -0
scores/Qwen3-30B-A3B-pruned-F16.arc
ADDED
@@ -0,0 +1,15 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+Final result: 66.6667 +/- 1.7225
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 476545.17 ms
+llama_perf_context_print: prompt eval time = 317100.07 ms / 35972 tokens ( 8.82 ms per token, 113.44 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 320554.96 ms / 35973 tokens
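The ARC, MMLU and TruthfulQA score files all carry a `Final result: X +/- Y` line. A minimal sketch of pulling that pair out of a score file's text follows; the helper name `parse_final_result` and the regular expression are illustrative assumptions, not part of this commit.

```python
import re

# Matches lines such as "Final result: 66.6667 +/- 1.7225".
# The pattern is an assumption based on the log lines shown in this diff.
FINAL_RESULT = re.compile(r"Final result:\s*([0-9.]+)\s*\+/-\s*([0-9.]+)")


def parse_final_result(text):
    """Return (score, error) as floats, or None if no such line is present."""
    match = FINAL_RESULT.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


sample = "Final result: 66.6667 +/- 1.7225\nRandom chance: 25.0083 +/- 1.5824\n"
print(parse_final_result(sample))
```

The `Random chance` line has the same shape, so a second pattern with that prefix would work the same way.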
scores/Qwen3-30B-A3B-pruned-F16.hsw
ADDED
@@ -0,0 +1,14 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+750 72.66666667% [69.3676%, 75.7347%]
+
+
+llama_perf_context_print: load time = 14042.88 ms
+llama_perf_context_print: prompt eval time = 953982.56 ms / 123581 tokens ( 7.72 ms per token, 129.54 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 971012.88 ms / 123582 tokens
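The HellaSwag line reports a task count, an accuracy and a bracketed interval. The bounds look consistent with a 95% Wilson score interval over 750 tasks; that is an inference from the numbers, not something the log states. A short sketch under that assumption (the function name `wilson_interval` is illustrative):

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 72.66666667% of 750 tasks corresponds to 545 correct answers.
lo, hi = wilson_interval(545 / 750, 750)
print(f"[{100 * lo:.4f}%, {100 * hi:.4f}%]")
```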
scores/Qwen3-30B-A3B-pruned-F16.mmlu
ADDED
@@ -0,0 +1,15 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+Final result: 42.1333 +/- 1.8042
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 13886.47 ms
+llama_perf_context_print: prompt eval time = 494837.42 ms / 67719 tokens ( 7.31 ms per token, 136.85 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 500285.67 ms / 67720 tokens
scores/Qwen3-30B-A3B-pruned-F16.tqa
ADDED
@@ -0,0 +1,15 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+Final result: 31.2000 +/- 1.6929
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 13704.38 ms
+llama_perf_context_print: prompt eval time = 426482.94 ms / 49696 tokens ( 8.58 ms per token, 116.53 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 433376.19 ms / 49697 tokens
scores/Qwen3-30B-A3B-pruned-F16.wng
ADDED
@@ -0,0 +1,13 @@
+build: 5553 (c7e0a205) with cc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5) for x86_64-amazon-linux
+llama_model_load_from_file_impl: using device CUDA0 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA1 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA2 (Tesla T4) - 14810 MiB free
+llama_model_load_from_file_impl: using device CUDA3 (Tesla T4) - 14810 MiB free
+llama_model_loader: loaded meta data with 40 key-value pairs and 579 tensors from ./Qwen3-30B-A3B-F16.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 75.8667 +/- 1.5635
+
+llama_perf_context_print: load time = 13885.91 ms
+llama_perf_context_print: prompt eval time = 165214.42 ms / 21448 tokens ( 7.70 ms per token, 129.82 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 168672.50 ms / 21449 tokens
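The `+/-` value on the Winogrande line appears to be the standard error of a binomial proportion, in percentage points, with `n - 1` in the denominator. This is inferred from the reported numbers rather than confirmed by the log, so treat the sketch below as an assumption; the name `score_std_error` is illustrative.

```python
import math


def score_std_error(correct, n):
    """Standard error of an accuracy in percentage points, using n - 1 (assumed)."""
    p = correct / n
    return 100.0 * math.sqrt(p * (1.0 - p) / (n - 1))


# 75.8667% of 750 tasks corresponds to 569 correct answers.
winogrande_err = score_std_error(569, 750)
print(f"{100 * 569 / 750:.4f} +/- {winogrande_err:.4f}")
```

Dividing by `n` instead of `n - 1` gives about 1.5624, which does not match the logged 1.5635.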
scores/Qwen3-30B-A3B-pruned-iq3_m.arc
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+Final result: 56.8000 +/- 1.8100
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 5963.39 ms
+llama_perf_context_print: prompt eval time = 37054.73 ms / 35972 tokens ( 1.03 ms per token, 970.78 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 37976.03 ms / 35973 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_m.hsw
ADDED
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+750 70.26666667% [66.8989%, 73.4279%]
+
+
+llama_perf_context_print: load time = 973.57 ms
+llama_perf_context_print: prompt eval time = 124967.37 ms / 126038 tokens ( 0.99 ms per token, 1008.57 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 128697.47 ms / 126039 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_m.mmlu
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+Final result: 39.0667 +/- 1.7827
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 991.14 ms
+llama_perf_context_print: prompt eval time = 66988.49 ms / 67719 tokens ( 0.99 ms per token, 1010.91 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 68293.40 ms / 67720 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_m.ppx
ADDED
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 77.090453 ± 1.044822
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 73.55%
+Mean ln(PPL(Q)/PPL(base)) : 2.211294 ± 0.009454
+Mean PPL(Q)/PPL(base) : 9.127518 ± 0.086296
+Mean PPL(Q)-PPL(base) : 68.644515 ± 0.997862
+
+====== KL divergence statistics ======
+Mean KLD: 2.063818 ± 0.006856
+Maximum KLD: 39.386982
+99.9% KLD: 19.179407
+99.0% KLD: 12.731147
+99.0% KLD: 12.731147
+Median KLD: 1.246560
+10.0% KLD: 0.011396
+5.0% KLD: 0.001648
+1.0% KLD: 0.000063
+Minimum KLD: -0.000003
+
+====== Token probability statistics ======
+Mean Δp: -9.665 ± 0.088 %
+Maximum Δp: 99.654%
+99.9% Δp: 93.397%
+99.0% Δp: 73.865%
+95.0% Δp: 40.361%
+90.0% Δp: 19.833%
+75.0% Δp: 0.586%
+Median Δp: -0.448%
+25.0% Δp: -15.350%
+10.0% Δp: -62.944%
+5.0% Δp: -90.459%
+1.0% Δp: -99.970%
+0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 35.199 ± 0.092 %
+Same top p: 57.360 ± 0.128 %
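The perplexity block's derived rows can be cross-checked from its first two rows. A small sketch using the IQ3_M values above: the ratio and difference follow directly from `Mean PPL(Q)` and `Mean PPL(base)`, and the exponential of the mean log-ratio reproduces the ratio. The variable names are illustrative.

```python
import math

# Values copied from the IQ3_M perplexity statistics above.
ppl_q = 77.090453
ppl_base = 8.445938
mean_ln_ratio = 2.211294

ratio = ppl_q / ppl_base                    # reported as 9.127518
difference = ppl_q - ppl_base               # reported as 68.644515
ratio_from_log = math.exp(mean_ln_ratio)    # also about 9.1275

print(f"ratio={ratio:.6f} diff={difference:.6f} exp(ln ratio)={ratio_from_log:.6f}")
```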
scores/Qwen3-30B-A3B-pruned-iq3_m.tqa
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+Final result: 30.6667 +/- 1.6849
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1061.94 ms
+llama_perf_context_print: prompt eval time = 52253.67 ms / 49696 tokens ( 1.05 ms per token, 951.05 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 54015.85 ms / 49697 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_m.wng
ADDED
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_M.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 62.5333 +/- 1.7686
+
+llama_perf_context_print: load time = 1084.28 ms
+llama_perf_context_print: prompt eval time = 21513.44 ms / 21448 tokens ( 1.00 ms per token, 996.96 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 22046.16 ms / 21449 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.arc
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+Final result: 48.5333 +/- 1.8262
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 5752.48 ms
+llama_perf_context_print: prompt eval time = 37048.76 ms / 35972 tokens ( 1.03 ms per token, 970.94 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 37992.87 ms / 35973 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.hsw
ADDED
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+750 68.66666667% [65.2590%, 71.8841%]
+
+
+llama_perf_context_print: load time = 1004.53 ms
+llama_perf_context_print: prompt eval time = 127701.50 ms / 126038 tokens ( 1.01 ms per token, 986.97 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 131559.01 ms / 126039 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.mmlu
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+Final result: 37.0667 +/- 1.7648
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1051.40 ms
+llama_perf_context_print: prompt eval time = 67708.86 ms / 67719 tokens ( 1.00 ms per token, 1000.15 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 69083.94 ms / 67720 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.ppx
ADDED
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 69.935907 ± 0.918185
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 72.89%
+Mean ln(PPL(Q)/PPL(base)) : 2.113894 ± 0.009177
+Mean PPL(Q)/PPL(base) : 8.280419 ± 0.075991
+Mean PPL(Q)-PPL(base) : 61.489969 ± 0.871820
+
+====== KL divergence statistics ======
+Mean KLD: 1.997500 ± 0.006825
+Maximum KLD: 36.616871
+99.9% KLD: 18.998100
+99.0% KLD: 12.936396
+99.0% KLD: 12.936396
+Median KLD: 1.190034
+10.0% KLD: 0.013888
+5.0% KLD: 0.002134
+1.0% KLD: 0.000090
+Minimum KLD: -0.000004
+
+====== Token probability statistics ======
+Mean Δp: -10.199 ± 0.088 %
+Maximum Δp: 99.504%
+99.9% Δp: 93.029%
+99.0% Δp: 72.891%
+95.0% Δp: 39.848%
+90.0% Δp: 19.063%
+75.0% Δp: 0.528%
+Median Δp: -0.540%
+25.0% Δp: -16.787%
+10.0% Δp: -63.592%
+5.0% Δp: -91.195%
+1.0% Δp: -99.977%
+0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 35.474 ± 0.092 %
+Same top p: 56.834 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-iq3_s.tqa
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+Final result: 32.0000 +/- 1.7045
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1017.78 ms
+llama_perf_context_print: prompt eval time = 51819.94 ms / 49696 tokens ( 1.04 ms per token, 959.01 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 53589.19 ms / 49697 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq3_s.wng
ADDED
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ3_S.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 63.8667 +/- 1.7553
+
+llama_perf_context_print: load time = 1065.10 ms
+llama_perf_context_print: prompt eval time = 21437.40 ms / 21448 tokens ( 1.00 ms per token, 1000.49 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 21986.82 ms / 21449 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.arc
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+Final result: 61.7333 +/- 1.7759
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 7153.23 ms
+llama_perf_context_print: prompt eval time = 37237.77 ms / 35972 tokens ( 1.04 ms per token, 966.01 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 38188.71 ms / 35973 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.hsw
ADDED
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+750 71.20000000% [67.8576%, 74.3263%]
+
+
+llama_perf_context_print: load time = 1206.30 ms
+llama_perf_context_print: prompt eval time = 127295.75 ms / 126038 tokens ( 1.01 ms per token, 990.12 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 131179.47 ms / 126039 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.mmlu
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+Final result: 40.9333 +/- 1.7967
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1184.18 ms
+llama_perf_context_print: prompt eval time = 68103.64 ms / 67719 tokens ( 1.01 ms per token, 994.35 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 69454.89 ms / 67720 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.ppx
ADDED
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 58.059268 ± 0.724129
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 73.87%
+Mean ln(PPL(Q)/PPL(base)) : 1.927779 ± 0.008539
+Mean PPL(Q)/PPL(base) : 6.874224 ± 0.058701
+Mean PPL(Q)-PPL(base) : 49.613331 ± 0.677412
+
+====== KL divergence statistics ======
+Mean KLD: 1.827625 ± 0.006356
+Maximum KLD: 37.203815
+99.9% KLD: 17.289213
+99.0% KLD: 12.241351
+99.0% KLD: 12.241351
+Median KLD: 1.062228
+10.0% KLD: 0.013747
+5.0% KLD: 0.002407
+1.0% KLD: 0.000120
+Minimum KLD: -0.000003
+
+====== Token probability statistics ======
+Mean Δp: -10.074 ± 0.087 %
+Maximum Δp: 99.662%
+99.9% Δp: 90.888%
+99.0% Δp: 70.751%
+95.0% Δp: 38.384%
+90.0% Δp: 18.656%
+75.0% Δp: 0.543%
+Median Δp: -0.549%
+25.0% Δp: -15.860%
+10.0% Δp: -62.678%
+5.0% Δp: -91.248%
+1.0% Δp: -99.979%
+0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 35.013 ± 0.093 %
+Same top p: 58.204 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-iq4_nl.tqa
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+Final result: 30.9333 +/- 1.6889
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1254.92 ms
+llama_perf_context_print: prompt eval time = 52206.26 ms / 49696 tokens ( 1.05 ms per token, 951.92 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 53979.12 ms / 49697 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-iq4_nl.wng
ADDED
@@ -0,0 +1,11 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-IQ4_NL.gguf (version GGUF V3 (latest))
+
+Final Winogrande score(750 tasks): 65.8667 +/- 1.7325
+
+llama_perf_context_print: load time = 1229.62 ms
+llama_perf_context_print: prompt eval time = 21328.35 ms / 21448 tokens ( 0.99 ms per token, 1005.61 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 21840.27 ms / 21449 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.arc
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+Final result: 57.8667 +/- 1.8042
+Random chance: 25.0083 +/- 1.5824
+
+
+llama_perf_context_print: load time = 5836.97 ms
+llama_perf_context_print: prompt eval time = 38153.96 ms / 35972 tokens ( 1.06 ms per token, 942.81 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 39110.03 ms / 35973 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.hsw
ADDED
@@ -0,0 +1,12 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+750 71.73333333% [68.4062%, 74.8389%]
+
+
+llama_perf_context_print: load time = 1007.76 ms
+llama_perf_context_print: prompt eval time = 130309.00 ms / 126038 tokens ( 1.03 ms per token, 967.22 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 134163.99 ms / 126039 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.mmlu
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+Final result: 38.6667 +/- 1.7794
+Random chance: 25.0000 +/- 1.5822
+
+
+llama_perf_context_print: load time = 1034.53 ms
+llama_perf_context_print: prompt eval time = 69376.50 ms / 67719 tokens ( 1.02 ms per token, 976.11 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 70750.75 ms / 67720 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.ppx
ADDED
@@ -0,0 +1,37 @@
+====== Perplexity statistics ======
+Mean PPL(Q) : 60.855606 ± 0.768774
+Mean PPL(base) : 8.445938 ± 0.065177
+Cor(ln(PPL(Q)), ln(PPL(base))): 73.47%
+Mean ln(PPL(Q)/PPL(base)) : 1.974818 ± 0.008712
+Mean PPL(Q)/PPL(base) : 7.205311 ± 0.062773
+Mean PPL(Q)-PPL(base) : 52.409668 ± 0.722246
+
+====== KL divergence statistics ======
+Mean KLD: 1.886749 ± 0.006413
+Maximum KLD: 32.243908
+99.9% KLD: 18.146301
+99.0% KLD: 12.053426
+99.0% KLD: 12.053426
+Median KLD: 1.116526
+10.0% KLD: 0.014264
+5.0% KLD: 0.002443
+1.0% KLD: 0.000117
+Minimum KLD: -0.000003
+
+====== Token probability statistics ======
+Mean Δp: -10.112 ± 0.088 %
+Maximum Δp: 99.677%
+99.9% Δp: 92.123%
+99.0% Δp: 72.105%
+95.0% Δp: 39.096%
+90.0% Δp: 19.004%
+75.0% Δp: 0.549%
+Median Δp: -0.551%
+25.0% Δp: -16.110%
+10.0% Δp: -63.590%
+5.0% Δp: -91.581%
+1.0% Δp: -99.973%
+0.1% Δp: -100.000%
+Minimum Δp: -100.000%
+RMS Δp : 35.285 ± 0.093 %
+Same top p: 57.653 ± 0.128 %
scores/Qwen3-30B-A3B-pruned-q3_k_l.tqa
ADDED
@@ -0,0 +1,13 @@
+build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
+llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
+llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
+
+Final result: 32.2667 +/- 1.7082
+Random chance: 19.8992 +/- 1.4588
+
+
+llama_perf_context_print: load time = 1045.56 ms
+llama_perf_context_print: prompt eval time = 53181.04 ms / 49696 tokens ( 1.07 ms per token, 934.47 tokens per second)
+llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+llama_perf_context_print: total time = 54884.20 ms / 49697 tokens
+ggml_metal_free: deallocating
scores/Qwen3-30B-A3B-pruned-q3_k_l.wng
ADDED
@@ -0,0 +1,11 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_L.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final Winogrande score(750 tasks): 65.2000 +/- 1.7405
|
6 |
+
|
7 |
+
llama_perf_context_print: load time = 964.35 ms
|
8 |
+
llama_perf_context_print: prompt eval time = 21817.15 ms / 21448 tokens ( 1.02 ms per token, 983.08 tokens per second)
|
9 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
10 |
+
llama_perf_context_print: total time = 22321.50 ms / 21449 tokens
|
11 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_m.arc
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 56.4000 +/- 1.8119
|
6 |
+
Random chance: 25.0083 +/- 1.5824
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 5706.95 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 37640.79 ms / 35972 tokens ( 1.05 ms per token, 955.67 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 38556.26 ms / 35973 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_m.hsw
ADDED
@@ -0,0 +1,12 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
750 70.80000000% [67.4465%, 73.9415%]
|
6 |
+
|
7 |
+
|
8 |
+
llama_perf_context_print: load time = 924.40 ms
|
9 |
+
llama_perf_context_print: prompt eval time = 127735.77 ms / 126038 tokens ( 1.01 ms per token, 986.71 tokens per second)
|
10 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
11 |
+
llama_perf_context_print: total time = 131427.60 ms / 126039 tokens
|
12 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_m.mmlu
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 39.3333 +/- 1.7849
|
6 |
+
Random chance: 25.0000 +/- 1.5822
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 943.54 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 67380.31 ms / 67719 tokens ( 0.99 ms per token, 1005.03 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 68692.56 ms / 67720 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_m.ppx
ADDED
@@ -0,0 +1,37 @@
|
1 |
+
====== Perplexity statistics ======
|
2 |
+
Mean PPL(Q) : 59.072808 ± 0.741897
|
3 |
+
Mean PPL(base) : 8.445938 ± 0.065177
|
4 |
+
Cor(ln(PPL(Q)), ln(PPL(base))): 73.82%
|
5 |
+
Mean ln(PPL(Q)/PPL(base)) : 1.945085 ± 0.008614
|
6 |
+
Mean PPL(Q)/PPL(base) : 6.994227 ± 0.060246
|
7 |
+
Mean PPL(Q)-PPL(base) : 50.626870 ± 0.695177
|
8 |
+
|
9 |
+
====== KL divergence statistics ======
|
10 |
+
Mean KLD: 1.857932 ± 0.006326
|
11 |
+
Maximum KLD: 31.393671
|
12 |
+
99.9% KLD: 17.826597
|
13 |
+
99.0% KLD: 11.908570
|
14 |
+
99.0% KLD: 11.908570
|
15 |
+
Median KLD: 1.097884
|
16 |
+
10.0% KLD: 0.013517
|
17 |
+
5.0% KLD: 0.002297
|
18 |
+
1.0% KLD: 0.000108
|
19 |
+
Minimum KLD: -0.000003
|
20 |
+
|
21 |
+
====== Token probability statistics ======
|
22 |
+
Mean Δp: -9.998 ± 0.087 %
|
23 |
+
Maximum Δp: 99.678%
|
24 |
+
99.9% Δp: 91.515%
|
25 |
+
99.0% Δp: 72.068%
|
26 |
+
95.0% Δp: 39.410%
|
27 |
+
90.0% Δp: 18.817%
|
28 |
+
75.0% Δp: 0.535%
|
29 |
+
Median Δp: -0.541%
|
30 |
+
25.0% Δp: -15.834%
|
31 |
+
10.0% Δp: -62.929%
|
32 |
+
5.0% Δp: -91.189%
|
33 |
+
1.0% Δp: -99.971%
|
34 |
+
0.1% Δp: -99.999%
|
35 |
+
Minimum Δp: -100.000%
|
36 |
+
RMS Δp : 35.157 ± 0.092 %
|
37 |
+
Same top p: 57.800 ± 0.128 %
|
scores/Qwen3-30B-A3B-pruned-q3_k_m.tqa
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 31.6000 +/- 1.6988
|
6 |
+
Random chance: 19.8992 +/- 1.4588
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 973.04 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 52005.69 ms / 49696 tokens ( 1.05 ms per token, 955.59 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 53651.35 ms / 49697 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_m.wng
ADDED
@@ -0,0 +1,11 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final Winogrande score(750 tasks): 64.8000 +/- 1.7451
|
6 |
+
|
7 |
+
llama_perf_context_print: load time = 1010.06 ms
|
8 |
+
llama_perf_context_print: prompt eval time = 21536.50 ms / 21448 tokens ( 1.00 ms per token, 995.89 tokens per second)
|
9 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
10 |
+
llama_perf_context_print: total time = 22054.60 ms / 21449 tokens
|
11 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_s.arc
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 58.1333 +/- 1.8026
|
6 |
+
Random chance: 25.0083 +/- 1.5824
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 5783.27 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 37846.85 ms / 35972 tokens ( 1.05 ms per token, 950.46 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 38757.96 ms / 35973 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_s.hsw
ADDED
@@ -0,0 +1,12 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
750 71.46666667% [68.1319%, 74.5827%]
|
6 |
+
|
7 |
+
|
8 |
+
llama_perf_context_print: load time = 885.76 ms
|
9 |
+
llama_perf_context_print: prompt eval time = 127239.40 ms / 126038 tokens ( 1.01 ms per token, 990.56 tokens per second)
|
10 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
11 |
+
llama_perf_context_print: total time = 130949.53 ms / 126039 tokens
|
12 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_s.mmlu
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 38.9333 +/- 1.7816
|
6 |
+
Random chance: 25.0000 +/- 1.5822
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 972.00 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 67683.27 ms / 67719 tokens ( 1.00 ms per token, 1000.53 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 68987.39 ms / 67720 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_s.ppx
ADDED
@@ -0,0 +1,37 @@
|
1 |
+
====== Perplexity statistics ======
|
2 |
+
Mean PPL(Q) : 61.676169 ± 0.780539
|
3 |
+
Mean PPL(base) : 8.445938 ± 0.065177
|
4 |
+
Cor(ln(PPL(Q)), ln(PPL(base))): 73.64%
|
5 |
+
Mean ln(PPL(Q)/PPL(base)) : 1.988212 ± 0.008711
|
6 |
+
Mean PPL(Q)/PPL(base) : 7.302465 ± 0.063613
|
7 |
+
Mean PPL(Q)-PPL(base) : 53.230232 ± 0.733873
|
8 |
+
|
9 |
+
====== KL divergence statistics ======
|
10 |
+
Mean KLD: 1.888847 ± 0.006380
|
11 |
+
Maximum KLD: 33.008038
|
12 |
+
99.9% KLD: 17.721254
|
13 |
+
99.0% KLD: 12.006232
|
14 |
+
99.0% KLD: 12.006232
|
15 |
+
Median KLD: 1.128817
|
16 |
+
10.0% KLD: 0.013821
|
17 |
+
5.0% KLD: 0.002327
|
18 |
+
1.0% KLD: 0.000107
|
19 |
+
Minimum KLD: -0.000003
|
20 |
+
|
21 |
+
====== Token probability statistics ======
|
22 |
+
Mean Δp: -10.182 ± 0.088 %
|
23 |
+
Maximum Δp: 99.676%
|
24 |
+
99.9% Δp: 91.955%
|
25 |
+
99.0% Δp: 72.330%
|
26 |
+
95.0% Δp: 39.211%
|
27 |
+
90.0% Δp: 18.768%
|
28 |
+
75.0% Δp: 0.478%
|
29 |
+
Median Δp: -0.585%
|
30 |
+
25.0% Δp: -16.265%
|
31 |
+
10.0% Δp: -63.264%
|
32 |
+
5.0% Δp: -91.364%
|
33 |
+
1.0% Δp: -99.971%
|
34 |
+
0.1% Δp: -99.999%
|
35 |
+
Minimum Δp: -100.000%
|
36 |
+
RMS Δp : 35.283 ± 0.093 %
|
37 |
+
Same top p: 57.426 ± 0.128 %
|
scores/Qwen3-30B-A3B-pruned-q3_k_s.tqa
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 30.9333 +/- 1.6889
|
6 |
+
Random chance: 19.8992 +/- 1.4588
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 988.65 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 52374.28 ms / 49696 tokens ( 1.05 ms per token, 948.86 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 54015.45 ms / 49697 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q3_k_s.wng
ADDED
@@ -0,0 +1,11 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q3_K_S.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final Winogrande score(750 tasks): 63.0667 +/- 1.7635
|
6 |
+
|
7 |
+
llama_perf_context_print: load time = 929.52 ms
|
8 |
+
llama_perf_context_print: prompt eval time = 21714.40 ms / 21448 tokens ( 1.01 ms per token, 987.73 tokens per second)
|
9 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
10 |
+
llama_perf_context_print: total time = 22212.48 ms / 21449 tokens
|
11 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q4_k_m.arc
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 60.5333 +/- 1.7860
|
6 |
+
Random chance: 25.0083 +/- 1.5824
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 7096.93 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 38331.71 ms / 35972 tokens ( 1.07 ms per token, 938.44 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 39294.96 ms / 35973 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q4_k_m.hsw
ADDED
@@ -0,0 +1,12 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
750 71.46666667% [68.1319%, 74.5827%]
|
6 |
+
|
7 |
+
|
8 |
+
llama_perf_context_print: load time = 1212.85 ms
|
9 |
+
llama_perf_context_print: prompt eval time = 130689.40 ms / 126038 tokens ( 1.04 ms per token, 964.41 tokens per second)
|
10 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
11 |
+
llama_perf_context_print: total time = 134566.53 ms / 126039 tokens
|
12 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q4_k_m.mmlu
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 41.8667 +/- 1.8026
|
6 |
+
Random chance: 25.0000 +/- 1.5822
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 1263.34 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 69966.59 ms / 67719 tokens ( 1.03 ms per token, 967.88 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 71361.75 ms / 67720 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q4_k_m.ppx
ADDED
@@ -0,0 +1,37 @@
|
1 |
+
====== Perplexity statistics ======
|
2 |
+
Mean PPL(Q) : 58.664820 ± 0.740540
|
3 |
+
Mean PPL(base) : 8.445938 ± 0.065177
|
4 |
+
Cor(ln(PPL(Q)), ln(PPL(base))): 74.32%
|
5 |
+
Mean ln(PPL(Q)/PPL(base)) : 1.938155 ± 0.008608
|
6 |
+
Mean PPL(Q)/PPL(base) : 6.945921 ± 0.059790
|
7 |
+
Mean PPL(Q)-PPL(base) : 50.218882 ± 0.693471
|
8 |
+
|
9 |
+
====== KL divergence statistics ======
|
10 |
+
Mean KLD: 1.826410 ± 0.006359
|
11 |
+
Maximum KLD: 38.350769
|
12 |
+
99.9% KLD: 17.546576
|
13 |
+
99.0% KLD: 12.219206
|
14 |
+
99.0% KLD: 12.219206
|
15 |
+
Median KLD: 1.056963
|
16 |
+
10.0% KLD: 0.011905
|
17 |
+
5.0% KLD: 0.002018
|
18 |
+
1.0% KLD: 0.000098
|
19 |
+
Minimum KLD: -0.000003
|
20 |
+
|
21 |
+
====== Token probability statistics ======
|
22 |
+
Mean Δp: -9.585 ± 0.087 %
|
23 |
+
Maximum Δp: 99.633%
|
24 |
+
99.9% Δp: 91.281%
|
25 |
+
99.0% Δp: 72.420%
|
26 |
+
95.0% Δp: 39.757%
|
27 |
+
90.0% Δp: 19.570%
|
28 |
+
75.0% Δp: 0.574%
|
29 |
+
Median Δp: -0.481%
|
30 |
+
25.0% Δp: -15.306%
|
31 |
+
10.0% Δp: -60.898%
|
32 |
+
5.0% Δp: -90.180%
|
33 |
+
1.0% Δp: -99.976%
|
34 |
+
0.1% Δp: -100.000%
|
35 |
+
Minimum Δp: -100.000%
|
36 |
+
RMS Δp : 34.806 ± 0.092 %
|
37 |
+
Same top p: 58.557 ± 0.128 %
|
scores/Qwen3-30B-A3B-pruned-q4_k_m.tqa
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 30.9333 +/- 1.6889
|
6 |
+
Random chance: 19.8992 +/- 1.4588
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 1267.40 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 53738.21 ms / 49696 tokens ( 1.08 ms per token, 924.78 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 55506.24 ms / 49697 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q4_k_m.wng
ADDED
@@ -0,0 +1,11 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final Winogrande score(750 tasks): 66.1333 +/- 1.7292
|
6 |
+
|
7 |
+
llama_perf_context_print: load time = 1232.55 ms
|
8 |
+
llama_perf_context_print: prompt eval time = 21981.09 ms / 21448 tokens ( 1.02 ms per token, 975.75 tokens per second)
|
9 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
10 |
+
llama_perf_context_print: total time = 22534.28 ms / 21449 tokens
|
11 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q4_k_s.arc
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 60.8000 +/- 1.7838
|
6 |
+
Random chance: 25.0083 +/- 1.5824
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 6986.96 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 38183.57 ms / 35972 tokens ( 1.06 ms per token, 942.08 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 39137.36 ms / 35973 tokens
|
13 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q4_k_s.hsw
ADDED
@@ -0,0 +1,12 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
750 71.06666667% [67.7206%, 74.1981%]
|
6 |
+
|
7 |
+
|
8 |
+
llama_perf_context_print: load time = 1188.49 ms
|
9 |
+
llama_perf_context_print: prompt eval time = 130416.13 ms / 126038 tokens ( 1.03 ms per token, 966.43 tokens per second)
|
10 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
11 |
+
llama_perf_context_print: total time = 134331.92 ms / 126039 tokens
|
12 |
+
ggml_metal_free: deallocating
|
scores/Qwen3-30B-A3B-pruned-q4_k_s.mmlu
ADDED
@@ -0,0 +1,13 @@
|
1 |
+
build: 5580 (bfb1e012) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
|
2 |
+
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 49151 MiB free
|
3 |
+
llama_model_loader: loaded meta data with 42 key-value pairs and 555 tensors from ./Qwen3-30B-A3B-Q4_K_S.gguf (version GGUF V3 (latest))
|
4 |
+
|
5 |
+
Final result: 41.4667 +/- 1.8002
|
6 |
+
Random chance: 25.0000 +/- 1.5822
|
7 |
+
|
8 |
+
|
9 |
+
llama_perf_context_print: load time = 1227.63 ms
|
10 |
+
llama_perf_context_print: prompt eval time = 69732.87 ms / 67719 tokens ( 1.03 ms per token, 971.12 tokens per second)
|
11 |
+
llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
|
12 |
+
llama_perf_context_print: total time = 71119.80 ms / 67720 tokens
|
13 |
+
ggml_metal_free: deallocating
|