eaddario committed
Commit 10312a8 · verified · 1 parent: 087424a

Generate Perplexity, KLD, ARC, HellaSwag, MMLU, Truthful QA and WinoGrande scores
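Scores like these are typically produced with llama.cpp's `llama-perplexity` tool, run once per benchmark against each quantised GGUF. The sketch below builds the command lines involved; the dataset file names (`arc.bin`, `hellaswag.txt`, etc.) and the base-logits file name are placeholders, not necessarily the ones used for this commit.

```python
# Sketch (assumptions noted in comments): builds llama-perplexity command
# lines for the benchmarks in this commit. Dataset file names are
# placeholders; the flags themselves are llama.cpp's.
def score_commands(model: str, tasks: int = 750) -> list[list[str]]:
    """Build one llama-perplexity command line per benchmark."""
    mc = ["--multiple-choice", "--multiple-choice-tasks", str(tasks)]
    return [
        # ARC, MMLU and TruthfulQA all use the multiple-choice mode:
        ["llama-perplexity", "-m", model, *mc, "-f", "arc.bin"],
        ["llama-perplexity", "-m", model, *mc, "-f", "mmlu.bin"],
        ["llama-perplexity", "-m", model, *mc, "-f", "truthfulqa.bin"],
        ["llama-perplexity", "-m", model,
         "--hellaswag", "--hellaswag-tasks", str(tasks), "-f", "hellaswag.txt"],
        ["llama-perplexity", "-m", model,
         "--winogrande", "--winogrande-tasks", str(tasks), "-f", "winogrande.csv"],
        # Perplexity + KL divergence against logits saved from the F16 base:
        ["llama-perplexity", "-m", model,
         "--kl-divergence", "--kl-divergence-base", "logits-F16.bin"],
    ]

cmds = score_commands("./Watt-Tool-8B-F16.gguf")
```

Each command's console output corresponds to one of the `.arc`/`.hsw`/`.mmlu`/`.tqa`/`.wng`/`.ppx` files in the diff below.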

Files changed (50): this view is limited to 50 files because the commit contains too many changes; see the raw diff for the complete change set.
  1. scores/Watt-Tool-8B-F16.arc +6 -6
  2. scores/Watt-Tool-8B-F16.hsw +5 -5
  3. scores/Watt-Tool-8B-F16.mmlu +5 -5
  4. scores/Watt-Tool-8B-F16.tqa +6 -6
  5. scores/Watt-Tool-8B-F16.wng +5 -5
  6. scores/Watt-Tool-8B-Q4_K_M-naive.arc +0 -13
  7. scores/Watt-Tool-8B-Q4_K_M-naive.hsw +0 -12
  8. scores/Watt-Tool-8B-Q4_K_M-naive.mmlu +0 -13
  9. scores/Watt-Tool-8B-Q4_K_M-naive.ppx +0 -37
  10. scores/Watt-Tool-8B-Q4_K_M-naive.tqa +0 -13
  11. scores/Watt-Tool-8B-Q4_K_M-naive.wng +0 -11
  12. scores/Watt-Tool-8B-iq3_m.arc +6 -6
  13. scores/Watt-Tool-8B-iq3_m.hsw +5 -5
  14. scores/Watt-Tool-8B-iq3_m.mmlu +5 -5
  15. scores/Watt-Tool-8B-iq3_m.ppx +30 -30
  16. scores/Watt-Tool-8B-iq3_m.tqa +6 -6
  17. scores/Watt-Tool-8B-iq3_m.wng +5 -5
  18. scores/Watt-Tool-8B-iq3_s.arc +6 -6
  19. scores/Watt-Tool-8B-iq3_s.hsw +5 -5
  20. scores/Watt-Tool-8B-iq3_s.mmlu +5 -5
  21. scores/Watt-Tool-8B-iq3_s.ppx +31 -31
  22. scores/Watt-Tool-8B-iq3_s.tqa +6 -6
  23. scores/Watt-Tool-8B-iq3_s.wng +5 -5
  24. scores/Watt-Tool-8B-iq4_nl.arc +6 -6
  25. scores/Watt-Tool-8B-iq4_nl.hsw +5 -5
  26. scores/Watt-Tool-8B-iq4_nl.mmlu +5 -5
  27. scores/Watt-Tool-8B-iq4_nl.ppx +31 -31
  28. scores/Watt-Tool-8B-iq4_nl.tqa +6 -6
  29. scores/Watt-Tool-8B-iq4_nl.wng +5 -5
  30. scores/Watt-Tool-8B-q3_k_l.arc +6 -6
  31. scores/Watt-Tool-8B-q3_k_l.hsw +5 -5
  32. scores/Watt-Tool-8B-q3_k_l.mmlu +5 -5
  33. scores/Watt-Tool-8B-q3_k_l.ppx +31 -31
  34. scores/Watt-Tool-8B-q3_k_l.tqa +6 -6
  35. scores/Watt-Tool-8B-q3_k_l.wng +5 -5
  36. scores/Watt-Tool-8B-q3_k_m.arc +6 -6
  37. scores/Watt-Tool-8B-q3_k_m.hsw +5 -5
  38. scores/Watt-Tool-8B-q3_k_m.mmlu +5 -5
  39. scores/Watt-Tool-8B-q3_k_m.ppx +31 -31
  40. scores/Watt-Tool-8B-q3_k_m.tqa +6 -6
  41. scores/Watt-Tool-8B-q3_k_m.wng +5 -5
  42. scores/Watt-Tool-8B-q3_k_s.arc +6 -6
  43. scores/Watt-Tool-8B-q3_k_s.hsw +5 -5
  44. scores/Watt-Tool-8B-q3_k_s.mmlu +5 -5
  45. scores/Watt-Tool-8B-q3_k_s.ppx +31 -31
  46. scores/Watt-Tool-8B-q3_k_s.tqa +6 -6
  47. scores/Watt-Tool-8B-q3_k_s.wng +5 -5
  48. scores/Watt-Tool-8B-q4_k_m.arc +6 -6
  49. scores/Watt-Tool-8B-q4_k_m.hsw +5 -5
  50. scores/Watt-Tool-8B-q4_k_m.mmlu +5 -5
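The score files in the diffs below all report their headline number on a `Final result: X +/- Y` line (WinoGrande uses `Final Winogrande score(N tasks): X +/- Y`). A minimal parser for those lines, as a sketch against the log layout shown in this commit:

```python
import re

# Matches "Final result: 65.8667 +/- 1.7325" and
# "Final Winogrande score(750 tasks): 74.8000 +/- 1.5864".
SCORE_RE = re.compile(
    r"Final (?:result|Winogrande score\(\d+ tasks\)):\s*([\d.]+)\s*\+/-\s*([\d.]+)"
)

def parse_score(text: str) -> "tuple[float, float] | None":
    """Return (score, standard error) from a score log, or None if absent."""
    m = SCORE_RE.search(text)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(parse_score("Final result: 65.8667 +/- 1.7325"))
```

The HellaSwag files (`.hsw`) instead print a bare `750 78.66666667% [...]` line and would need a separate pattern.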
scores/Watt-Tool-8B-F16.arc CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

- Final result: 65.2870 +/- 1.7406
- Random chance: 25.0334 +/- 1.5840


- llama_perf_context_print: load time = 7009.00 ms
- llama_perf_context_print: prompt eval time = 157202.97 ms / 36703 tokens ( 4.28 ms per token, 233.48 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 159569.20 ms / 36704 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

+ Final result: 65.8667 +/- 1.7325
+ Random chance: 25.0083 +/- 1.5824


+ llama_perf_context_print: load time = 7049.26 ms
+ llama_perf_context_print: prompt eval time = 109446.86 ms / 36600 tokens ( 2.99 ms per token, 334.41 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 110483.23 ms / 36601 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-F16.hsw CHANGED
@@ -1,12 +1,12 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

- 750 80.93333333


- llama_perf_context_print: load time = 622.45 ms
- llama_perf_context_print: prompt eval time = 412508.17 ms / 125702 tokens ( 3.28 ms per token, 304.73 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 418728.78 ms / 125703 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

+ 750 78.66666667% [75.5926%, 81.4486%]


+ llama_perf_context_print: load time = 580.08 ms
+ llama_perf_context_print: prompt eval time = 381945.70 ms / 126448 tokens ( 3.02 ms per token, 331.06 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 386591.06 ms / 126449 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-F16.mmlu CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

- Final result: 42.1333 +/- 1.8042
  Random chance: 25.0000 +/- 1.5822


- llama_perf_context_print: load time = 625.31 ms
- llama_perf_context_print: prompt eval time = 245071.99 ms / 69227 tokens ( 3.54 ms per token, 282.48 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 247990.59 ms / 69228 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

+ Final result: 40.9333 +/- 1.7967
  Random chance: 25.0000 +/- 1.5822


+ llama_perf_context_print: load time = 596.70 ms
+ llama_perf_context_print: prompt eval time = 197375.99 ms / 67195 tokens ( 2.94 ms per token, 340.44 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 198932.74 ms / 67196 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-F16.tqa CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

- Final result: 36.1963 +/- 2.6657
- Random chance: 28.6467 +/- 2.5079


- llama_perf_context_print: load time = 621.37 ms
- llama_perf_context_print: prompt eval time = 74638.90 ms / 17686 tokens ( 4.22 ms per token, 236.95 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 75960.59 ms / 17687 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

+ Final result: 32.9333 +/- 1.7172
+ Random chance: 19.8992 +/- 1.4588


+ llama_perf_context_print: load time = 624.82 ms
+ llama_perf_context_print: prompt eval time = 153527.41 ms / 50072 tokens ( 3.07 ms per token, 326.14 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 155568.93 ms / 50073 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-F16.wng CHANGED
@@ -1,11 +1,11 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

- Final Winogrande score(750 tasks): 74.0000 +/- 1.6027

- llama_perf_context_print: load time = 621.33 ms
- llama_perf_context_print: prompt eval time = 86368.81 ms / 22255 tokens ( 3.88 ms per token, 257.67 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 87630.93 ms / 22256 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))

+ Final Winogrande score(750 tasks): 74.8000 +/- 1.5864

+ llama_perf_context_print: load time = 624.64 ms
+ llama_perf_context_print: prompt eval time = 66689.29 ms / 22192 tokens ( 3.01 ms per token, 332.77 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 67279.36 ms / 22193 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.arc DELETED
@@ -1,13 +0,0 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
- llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
- llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
- Final result: 62.5668 +/- 1.7707
- Random chance: 25.0251 +/- 1.5848
-
-
- llama_perf_context_print: load time = 707.57 ms
- llama_perf_context_print: prompt eval time = 164606.88 ms / 36539 tokens ( 4.50 ms per token, 221.98 tokens per second)
- llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 166874.76 ms / 36540 tokens
- ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.hsw DELETED
@@ -1,12 +0,0 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
- llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
- llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
- 750 77.73333333
-
-
- llama_perf_context_print: load time = 306.76 ms
- llama_perf_context_print: prompt eval time = 436291.37 ms / 122836 tokens ( 3.55 ms per token, 281.55 tokens per second)
- llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 441964.91 ms / 122837 tokens
- ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.mmlu DELETED
@@ -1,13 +0,0 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
- llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
- llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
- Final result: 42.0000 +/- 1.8034
- Random chance: 25.0000 +/- 1.5822
-
-
- llama_perf_context_print: load time = 304.34 ms
- llama_perf_context_print: prompt eval time = 262641.92 ms / 69673 tokens ( 3.77 ms per token, 265.28 tokens per second)
- llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 265464.52 ms / 69674 tokens
- ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.ppx DELETED
@@ -1,37 +0,0 @@
- ====== Perplexity statistics ======
- Mean PPL(Q) : 7.409510 ± 0.046740
- Mean PPL(base) : 7.237090 ± 0.045539
- Cor(ln(PPL(Q)), ln(PPL(base))): 99.65%
- Mean ln(PPL(Q)/PPL(base)) : 0.023545 ± 0.000530
- Mean PPL(Q)/PPL(base) : 1.023825 ± 0.000543
- Mean PPL(Q)-PPL(base) : 0.172420 ± 0.004061
-
- ====== KL divergence statistics ======
- Mean KLD: 0.017663 ± 0.000107
- Maximum KLD: 5.749704
- 99.9% KLD: 0.447724
- 99.0% KLD: 0.139140
- 99.0% KLD: 0.139140
- Median KLD: 0.010320
- 10.0% KLD: 0.000617
- 5.0% KLD: 0.000201
- 1.0% KLD: 0.000027
- Minimum KLD: -0.000129
-
- ====== Token probability statistics ======
- Mean Δp: -0.531 ± 0.010 %
- Maximum Δp: 55.716%
- 99.9% Δp: 17.458%
- 99.0% Δp: 8.256%
- 95.0% Δp: 3.790%
- 90.0% Δp: 2.138%
- 75.0% Δp: 0.367%
- Median Δp: -0.034%
- 25.0% Δp: -1.129%
- 10.0% Δp: -3.654%
- 5.0% Δp: -5.855%
- 1.0% Δp: -12.744%
- 0.1% Δp: -31.910%
- Minimum Δp: -99.362%
- RMS Δp : 3.658 ± 0.032 %
- Same top p: 93.743 ± 0.064 %
scores/Watt-Tool-8B-Q4_K_M-naive.tqa DELETED
@@ -1,13 +0,0 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
- llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
- llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
- Final result: 36.8098 +/- 2.6753
- Random chance: 28.5214 +/- 2.5046
-
-
- llama_perf_context_print: load time = 306.51 ms
- llama_perf_context_print: prompt eval time = 78347.98 ms / 17655 tokens ( 4.44 ms per token, 225.34 tokens per second)
- llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 79593.31 ms / 17656 tokens
- ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.wng DELETED
@@ -1,11 +0,0 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
- llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
- llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
- Final Winogrande score(750 tasks): 73.6000 +/- 1.6106
-
- llama_perf_context_print: load time = 295.82 ms
- llama_perf_context_print: prompt eval time = 90900.17 ms / 22246 tokens ( 4.09 ms per token, 244.73 tokens per second)
- llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 92103.74 ms / 22247 tokens
- ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.arc CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

- Final result: 57.6203 +/- 1.8080
- Random chance: 25.0251 +/- 1.5848


- llama_perf_context_print: load time = 1615.23 ms
- llama_perf_context_print: prompt eval time = 160568.89 ms / 36381 tokens ( 4.41 ms per token, 226.58 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 162816.72 ms / 36382 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

+ Final result: 62.8000 +/- 1.7661
+ Random chance: 25.0083 +/- 1.5824


+ llama_perf_context_print: load time = 1734.01 ms
+ llama_perf_context_print: prompt eval time = 115043.84 ms / 36600 tokens ( 3.14 ms per token, 318.14 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 115943.71 ms / 36601 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.hsw CHANGED
@@ -1,12 +1,12 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

- 750 78.80000000


- llama_perf_context_print: load time = 279.18 ms
- llama_perf_context_print: prompt eval time = 433163.50 ms / 124534 tokens ( 3.48 ms per token, 287.50 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 438833.07 ms / 124535 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

+ 750 78.00000000% [74.8968%, 80.8179%]


+ llama_perf_context_print: load time = 291.01 ms
+ llama_perf_context_print: prompt eval time = 400031.71 ms / 126448 tokens ( 3.16 ms per token, 316.09 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 404305.75 ms / 126449 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.mmlu CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

- Final result: 36.2667 +/- 1.7567
  Random chance: 25.0000 +/- 1.5822


- llama_perf_context_print: load time = 280.68 ms
- llama_perf_context_print: prompt eval time = 260791.25 ms / 70687 tokens ( 3.69 ms per token, 271.05 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 263549.69 ms / 70688 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

+ Final result: 37.7333 +/- 1.7711
  Random chance: 25.0000 +/- 1.5822


+ llama_perf_context_print: load time = 289.89 ms
+ llama_perf_context_print: prompt eval time = 206632.46 ms / 67195 tokens ( 3.08 ms per token, 325.19 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 208049.33 ms / 67196 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.ppx CHANGED
@@ -1,37 +1,37 @@
  ====== Perplexity statistics ======
- Mean PPL(Q) : 8.963688 ± 0.058386
  Mean PPL(base) : 7.237090 ± 0.045539
- Cor(ln(PPL(Q)), ln(PPL(base))): 95.93%
- Mean ln(PPL(Q)/PPL(base)) : 0.213963 ± 0.001840
- Mean PPL(Q)/PPL(base) : 1.238576 ± 0.002279
- Mean PPL(Q)-PPL(base) : 1.726598 ± 0.019534

  ====== KL divergence statistics ======
- Mean KLD: 0.209768 ± 0.000734
- Maximum KLD: 11.045219
- 99.9% KLD: 3.037534
- 99.0% KLD: 1.327696
- 99.0% KLD: 1.327696
- Median KLD: 0.145406
- 10.0% KLD: 0.013515
- 5.0% KLD: 0.004453
- 1.0% KLD: 0.000589
  Minimum KLD: 0.000000

  ====== Token probability statistics ======
- Mean Δp: -4.187 ± 0.035 %
- Maximum Δp: 87.898%
- 99.9% Δp: 53.785%
- 99.0% Δp: 29.326%
- 95.0% Δp: 12.210%
- 90.0% Δp: 5.784%
- 75.0% Δp: 0.302%
- Median Δp: -0.896%
- 25.0% Δp: -7.479%
- 10.0% Δp: -19.150%
- 5.0% Δp: -28.492%
- 1.0% Δp: -52.697%
- 0.1% Δp: -83.945%
- Minimum Δp: -97.765%
- RMS Δp : 13.969 ± 0.056 %
- Same top p: 77.664 ± 0.110 %

  ====== Perplexity statistics ======
+ Mean PPL(Q) : 7.841948 ± 0.049502
  Mean PPL(base) : 7.237090 ± 0.045539
+ Cor(ln(PPL(Q)), ln(PPL(base))): 98.36%
+ Mean ln(PPL(Q)/PPL(base)) : 0.080268 ± 0.001143
+ Mean PPL(Q)/PPL(base) : 1.083578 ± 0.001238
+ Mean PPL(Q)-PPL(base) : 0.604858 ± 0.009476

  ====== KL divergence statistics ======
+ Mean KLD: 0.081774 ± 0.000354
+ Maximum KLD: 7.690053
+ 99.9% KLD: 1.654508
+ 99.0% KLD: 0.555790
+ 99.0% KLD: 0.555790
+ Median KLD: 0.056256
+ 10.0% KLD: 0.003426
+ 5.0% KLD: 0.001063
+ 1.0% KLD: 0.000157
  Minimum KLD: 0.000000

  ====== Token probability statistics ======
+ Mean Δp: -2.133 ± 0.021 %
+ Maximum Δp: 73.495%
+ 99.9% Δp: 32.336%
+ 99.0% Δp: 17.093%
+ 95.0% Δp: 7.846%
+ 90.0% Δp: 4.045%
+ 75.0% Δp: 0.372%
+ Median Δp: -0.301%
+ 25.0% Δp: -3.967%
+ 10.0% Δp: -10.805%
+ 5.0% Δp: -16.007%
+ 1.0% Δp: -30.015%
+ 0.1% Δp: -62.256%
+ Minimum Δp: -96.763%
+ RMS Δp : 8.316 ± 0.043 %
+ Same top p: 85.224 ± 0.094 %
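As a sanity check on these statistics: because perplexity is the exponential of the mean per-token negative log-likelihood, the reported mean ln(PPL(Q)/PPL(base)) and mean PPL ratio should agree with the two mean perplexities. A quick check against the IQ3_M vs F16 numbers from the new run:

```python
import math

# Values taken from the iq3_m.ppx report above (new run).
ppl_q, ppl_base = 7.841948, 7.237090

ratio = ppl_q / ppl_base          # should match the reported mean PPL ratio
ln_ratio = math.log(ratio)        # should match the reported mean ln ratio

print(round(ratio, 6))            # close to the reported 1.083578
print(round(ln_ratio, 6))         # close to the reported 0.080268
```

The agreement is to within rounding, which is expected since all three figures are derived from the same per-token log-likelihood sums.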
scores/Watt-Tool-8B-iq3_m.tqa CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

- Final result: 33.2308 +/- 2.6169
- Random chance: 28.5589 +/- 2.5094


- llama_perf_context_print: load time = 280.73 ms
- llama_perf_context_print: prompt eval time = 76784.24 ms / 17625 tokens ( 4.36 ms per token, 229.54 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 78027.49 ms / 17626 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

+ Final result: 32.1333 +/- 1.7063
+ Random chance: 19.8992 +/- 1.4588


+ llama_perf_context_print: load time = 288.09 ms
+ llama_perf_context_print: prompt eval time = 161368.09 ms / 50072 tokens ( 3.22 ms per token, 310.30 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 163199.21 ms / 50073 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.wng CHANGED
@@ -1,11 +1,11 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

- Final Winogrande score(750 tasks): 70.9333 +/- 1.6591

- llama_perf_context_print: load time = 284.52 ms
- llama_perf_context_print: prompt eval time = 89372.55 ms / 22269 tokens ( 4.01 ms per token, 249.17 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 90553.60 ms / 22270 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))

+ Final Winogrande score(750 tasks): 73.6000 +/- 1.6106

+ llama_perf_context_print: load time = 288.50 ms
+ llama_perf_context_print: prompt eval time = 70143.95 ms / 22192 tokens ( 3.16 ms per token, 316.38 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 70631.06 ms / 22193 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.arc CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))

- Final result: 57.3529 +/- 1.8095
- Random chance: 25.0335 +/- 1.5850


- llama_perf_context_print: load time = 1575.89 ms
- llama_perf_context_print: prompt eval time = 159215.12 ms / 36653 tokens ( 4.34 ms per token, 230.21 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 161465.24 ms / 36654 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))

+ Final result: 62.0000 +/- 1.7736
+ Random chance: 25.0083 +/- 1.5824


+ llama_perf_context_print: load time = 1662.56 ms
+ llama_perf_context_print: prompt eval time = 115280.42 ms / 36600 tokens ( 3.15 ms per token, 317.49 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 116185.54 ms / 36601 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.hsw CHANGED
@@ -1,12 +1,12 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))

- 750 77.20000000


- llama_perf_context_print: load time = 298.56 ms
- llama_perf_context_print: prompt eval time = 430279.23 ms / 124462 tokens ( 3.46 ms per token, 289.26 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 436034.64 ms / 124463 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))

+ 750 76.26666667% [73.0928%, 79.1728%]


+ llama_perf_context_print: load time = 286.42 ms
+ llama_perf_context_print: prompt eval time = 400735.90 ms / 126448 tokens ( 3.17 ms per token, 315.54 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 405023.75 ms / 126449 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.mmlu CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))

- Final result: 36.1333 +/- 1.7553
  Random chance: 25.0000 +/- 1.5822


- llama_perf_context_print: load time = 279.05 ms
- llama_perf_context_print: prompt eval time = 257292.94 ms / 70079 tokens ( 3.67 ms per token, 272.37 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 260084.41 ms / 70080 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))

+ Final result: 37.3333 +/- 1.7674
  Random chance: 25.0000 +/- 1.5822


+ llama_perf_context_print: load time = 293.04 ms
+ llama_perf_context_print: prompt eval time = 207055.09 ms / 67195 tokens ( 3.08 ms per token, 324.53 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 208462.26 ms / 67196 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.ppx CHANGED
@@ -1,37 +1,37 @@
  ====== Perplexity statistics ======
- Mean PPL(Q) : 9.032577 ± 0.058532
  Mean PPL(base) : 7.237090 ± 0.045539
- Cor(ln(PPL(Q)), ln(PPL(base))): 95.96%
- Mean ln(PPL(Q)/PPL(base)) : 0.221619 ± 0.001825
- Mean PPL(Q)/PPL(base) : 1.248095 ± 0.002278
- Mean PPL(Q)-PPL(base) : 1.795488 ± 0.019603

  ====== KL divergence statistics ======
- Mean KLD: 0.204862 ± 0.000758
- Maximum KLD: 9.152626
- 99.9% KLD: 3.342306
- 99.0% KLD: 1.358416
- 99.0% KLD: 1.358416
- Median KLD: 0.143007
- 10.0% KLD: 0.013411
- 5.0% KLD: 0.004899
- 1.0% KLD: 0.000896
- Minimum KLD: 0.000000

  ====== Token probability statistics ======
- Mean Δp: -4.675 ± 0.034 %
- Maximum Δp: 83.272%
- 99.9% Δp: 42.833%
- 99.0% Δp: 23.669%
- 95.0% Δp: 10.247%
- 90.0% Δp: 4.840%
- 75.0% Δp: 0.184%
- Median Δp: -1.059%
- 25.0% Δp: -7.819%
- 10.0% Δp: -19.392%
- 5.0% Δp: -28.470%
- 1.0% Δp: -53.110%
- 0.1% Δp: -86.125%
- Minimum Δp: -99.905%
- RMS Δp : 13.678 ± 0.058 %
- Same top p: 78.198 ± 0.109 %

  ====== Perplexity statistics ======
+ Mean PPL(Q) : 8.253598 ± 0.051864
  Mean PPL(base) : 7.237090 ± 0.045539
+ Cor(ln(PPL(Q)), ln(PPL(base))): 97.71%
+ Mean ln(PPL(Q)/PPL(base)) : 0.131430 ± 0.001346
+ Mean PPL(Q)/PPL(base) : 1.140458 ± 0.001535
+ Mean PPL(Q)-PPL(base) : 1.016508 ± 0.012175

  ====== KL divergence statistics ======
+ Mean KLD: 0.117565 ± 0.000433
+ Maximum KLD: 7.079286
+ 99.9% KLD: 1.966468
+ 99.0% KLD: 0.726076
+ 99.0% KLD: 0.726076
+ Median KLD: 0.084699
+ 10.0% KLD: 0.006988
+ 5.0% KLD: 0.002383
+ 1.0% KLD: 0.000330
+ Minimum KLD: -0.000001

  ====== Token probability statistics ======
+ Mean Δp: -3.685 ± 0.026 %
+ Maximum Δp: 69.513%
+ 99.9% Δp: 34.570%
+ 99.0% Δp: 17.585%
+ 95.0% Δp: 7.273%
+ 90.0% Δp: 3.369%
+ 75.0% Δp: 0.113%
+ Median Δp: -0.833%
+ 25.0% Δp: -6.212%
+ 10.0% Δp: -15.079%
+ 5.0% Δp: -21.666%
+ 1.0% Δp: -38.754%
+ 0.1% Δp: -69.188%
+ Minimum Δp: -97.122%
+ RMS Δp : 10.385 ± 0.045 %
+ Same top p: 82.770 ± 0.100 %
scores/Watt-Tool-8B-iq3_s.tqa CHANGED
@@ -1,13 +1,13 @@
1
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))
4
 
5
- Final result: 35.5346 +/- 2.6882
6
- Random chance: 28.4691 +/- 2.5346
7
 
8
 
9
- llama_perf_context_print: load time = 277.68 ms
10
- llama_perf_context_print: prompt eval time = 74951.02 ms / 17379 tokens ( 4.31 ms per token, 231.87 tokens per second)
11
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
12
- llama_perf_context_print: total time = 76169.50 ms / 17380 tokens
13
  ggml_metal_free: deallocating
 
1
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))
4
 
5
+ Final result: 30.4000 +/- 1.6807
6
+ Random chance: 19.8992 +/- 1.4588
7
 
8
 
9
+ llama_perf_context_print: load time = 284.74 ms
10
+ llama_perf_context_print: prompt eval time = 161670.39 ms / 50072 tokens ( 3.23 ms per token, 309.72 tokens per second)
11
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
12
+ llama_perf_context_print: total time = 163511.55 ms / 50073 tokens
13
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.wng CHANGED
@@ -1,11 +1,11 @@
1
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))
4
 
5
- Final Winogrande score(750 tasks): 70.2667 +/- 1.6702
6
 
7
- llama_perf_context_print: load time = 277.86 ms
8
- llama_perf_context_print: prompt eval time = 88618.25 ms / 22199 tokens ( 3.99 ms per token, 250.50 tokens per second)
9
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
10
- llama_perf_context_print: total time = 89806.41 ms / 22200 tokens
11
  ggml_metal_free: deallocating
 
1
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))
4
 
5
+ Final Winogrande score(750 tasks): 72.9333 +/- 1.6235
6
 
7
+ llama_perf_context_print: load time = 291.13 ms
8
+ llama_perf_context_print: prompt eval time = 70279.68 ms / 22192 tokens ( 3.17 ms per token, 315.77 tokens per second)
9
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
10
+ llama_perf_context_print: total time = 70763.82 ms / 22193 tokens
11
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.arc CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

- Final result: 64.7925 +/- 1.7487
- Random chance: 25.0251 +/- 1.5859


- llama_perf_context_print: load time = 1990.26 ms
- llama_perf_context_print: prompt eval time = 156702.59 ms / 36807 tokens ( 4.26 ms per token, 234.88 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 158963.12 ms / 36808 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

+ Final result: 63.4667 +/- 1.7594
+ Random chance: 25.0083 +/- 1.5824


+ llama_perf_context_print: load time = 2048.98 ms
+ llama_perf_context_print: prompt eval time = 119074.35 ms / 36600 tokens ( 3.25 ms per token, 307.37 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 119972.06 ms / 36601 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.hsw CHANGED
@@ -1,12 +1,12 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

- 750 77.86666667


- llama_perf_context_print: load time = 285.97 ms
- llama_perf_context_print: prompt eval time = 426128.64 ms / 126096 tokens ( 3.38 ms per token, 295.91 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 431845.15 ms / 126097 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

+ 750 77.73333333% [74.6188%, 80.5653%]


+ llama_perf_context_print: load time = 297.65 ms
+ llama_perf_context_print: prompt eval time = 413170.51 ms / 126448 tokens ( 3.27 ms per token, 306.04 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 417435.80 ms / 126449 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.mmlu CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

- Final result: 39.7333 +/- 1.7880
  Random chance: 25.0000 +/- 1.5822


- llama_perf_context_print: load time = 283.49 ms
- llama_perf_context_print: prompt eval time = 260134.32 ms / 72070 tokens ( 3.61 ms per token, 277.05 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 262954.84 ms / 72071 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

+ Final result: 39.6000 +/- 1.7870
  Random chance: 25.0000 +/- 1.5822


+ llama_perf_context_print: load time = 304.48 ms
+ llama_perf_context_print: prompt eval time = 213794.40 ms / 67195 tokens ( 3.18 ms per token, 314.30 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 215197.82 ms / 67196 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.ppx CHANGED
@@ -1,37 +1,37 @@
  ====== Perplexity statistics ======
- Mean PPL(Q) : 7.935917 ± 0.053744
  Mean PPL(base) : 7.237090 ± 0.045539
- Cor(ln(PPL(Q)), ln(PPL(base))): 98.36%
- Mean ln(PPL(Q)/PPL(base)) : 0.092180 ± 0.001277
- Mean PPL(Q)/PPL(base) : 1.096562 ± 0.001400
- Mean PPL(Q)-PPL(base) : 0.698827 ± 0.012156

  ====== KL divergence statistics ======
- Mean KLD: 0.096510 ± 0.000325
- Maximum KLD: 5.312946
- 99.9% KLD: 1.205405
- 99.0% KLD: 0.590747
- 99.0% KLD: 0.590747
- Median KLD: 0.069663
- 10.0% KLD: 0.003776
- 5.0% KLD: 0.001002
- 1.0% KLD: 0.000098
- Minimum KLD: -0.000140

  ====== Token probability statistics ======
- Mean Δp: 0.914 ± 0.023 %
- Maximum Δp: 68.385%
- 99.9% Δp: 48.877%
- 99.0% Δp: 30.041%
- 95.0% Δp: 15.934%
- 90.0% Δp: 10.078%
- 75.0% Δp: 2.824%
- Median Δp: 0.026%
- 25.0% Δp: -1.244%
- 10.0% Δp: -6.629%
- 5.0% Δp: -11.886%
- 1.0% Δp: -25.558%
- 0.1% Δp: -49.037%
- Minimum Δp: -91.804%
- RMS Δp : 8.908 ± 0.037 %
- Same top p: 85.211 ± 0.094 %

  ====== Perplexity statistics ======
+ Mean PPL(Q) : 7.516430 ± 0.047275
  Mean PPL(base) : 7.237090 ± 0.045539
+ Cor(ln(PPL(Q)), ln(PPL(base))): 99.30%
+ Mean ln(PPL(Q)/PPL(base)) : 0.037872 ± 0.000742
+ Mean PPL(Q)/PPL(base) : 1.038599 ± 0.000771
+ Mean PPL(Q)-PPL(base) : 0.279341 ± 0.005741

  ====== KL divergence statistics ======
+ Mean KLD: 0.034545 ± 0.000172
+ Maximum KLD: 4.479205
+ 99.9% KLD: 0.809954
+ 99.0% KLD: 0.243338
+ 99.0% KLD: 0.243338
+ Median KLD: 0.022288
+ 10.0% KLD: 0.001467
+ 5.0% KLD: 0.000485
+ 1.0% KLD: 0.000067
+ Minimum KLD: -0.000025

  ====== Token probability statistics ======
+ Mean Δp: -0.984 ± 0.014 %
+ Maximum Δp: 60.862%
+ 99.9% Δp: 23.910%
+ 99.0% Δp: 11.795%
+ 95.0% Δp: 5.549%
+ 90.0% Δp: 3.027%
+ 75.0% Δp: 0.404%
+ Median Δp: -0.103%
+ 25.0% Δp: -2.015%
+ 10.0% Δp: -6.046%
+ 5.0% Δp: -9.267%
+ 1.0% Δp: -18.431%
+ 0.1% Δp: -42.825%
+ Minimum Δp: -93.056%
+ RMS Δp : 5.270 ± 0.035 %
+ Same top p: 90.812 ± 0.076 %
scores/Watt-Tool-8B-iq4_nl.tqa CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

- Final result: 33.9564 +/- 2.6473
- Random chance: 28.5587 +/- 2.5250


- llama_perf_context_print: load time = 300.67 ms
- llama_perf_context_print: prompt eval time = 74415.31 ms / 17418 tokens ( 4.27 ms per token, 234.06 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 75640.51 ms / 17419 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

+ Final result: 31.4667 +/- 1.6968
+ Random chance: 19.8992 +/- 1.4588


+ llama_perf_context_print: load time = 311.22 ms
+ llama_perf_context_print: prompt eval time = 166835.67 ms / 50072 tokens ( 3.33 ms per token, 300.13 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 168685.42 ms / 50073 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.wng CHANGED
@@ -1,11 +1,11 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

- Final Winogrande score(750 tasks): 71.8667 +/- 1.6430

- llama_perf_context_print: load time = 302.27 ms
- llama_perf_context_print: prompt eval time = 88132.28 ms / 22378 tokens ( 3.94 ms per token, 253.91 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 89331.17 ms / 22379 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

+ Final Winogrande score(750 tasks): 75.4667 +/- 1.5722

+ llama_perf_context_print: load time = 302.51 ms
+ llama_perf_context_print: prompt eval time = 72558.29 ms / 22192 tokens ( 3.27 ms per token, 305.85 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 73039.49 ms / 22193 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.arc CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

- Final result: 62.6506 +/- 1.7711
- Random chance: 25.0251 +/- 1.5859


- llama_perf_context_print: load time = 1624.57 ms
- llama_perf_context_print: prompt eval time = 171694.83 ms / 36304 tokens ( 4.73 ms per token, 211.44 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 173922.19 ms / 36305 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

+ Final result: 61.7333 +/- 1.7759
+ Random chance: 25.0083 +/- 1.5824


+ llama_perf_context_print: load time = 1803.05 ms
+ llama_perf_context_print: prompt eval time = 123314.10 ms / 36600 tokens ( 3.37 ms per token, 296.80 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 124217.22 ms / 36601 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.hsw CHANGED
@@ -1,12 +1,12 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

- 750 74.66666667


- llama_perf_context_print: load time = 308.34 ms
- llama_perf_context_print: prompt eval time = 469602.68 ms / 125000 tokens ( 3.76 ms per token, 266.18 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 475338.97 ms / 125001 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

+ 750 77.20000000% [74.0633%, 80.0595%]


+ llama_perf_context_print: load time = 293.80 ms
+ llama_perf_context_print: prompt eval time = 428872.85 ms / 126448 tokens ( 3.39 ms per token, 294.84 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 433161.99 ms / 126449 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.mmlu CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

- Final result: 36.0000 +/- 1.7539
  Random chance: 25.0000 +/- 1.5822


- llama_perf_context_print: load time = 330.46 ms
- llama_perf_context_print: prompt eval time = 271968.64 ms / 67808 tokens ( 4.01 ms per token, 249.32 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 274735.21 ms / 67809 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

+ Final result: 38.5333 +/- 1.7783
  Random chance: 25.0000 +/- 1.5822


+ llama_perf_context_print: load time = 293.76 ms
+ llama_perf_context_print: prompt eval time = 221450.17 ms / 67195 tokens ( 3.30 ms per token, 303.43 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 222861.86 ms / 67196 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.ppx CHANGED
@@ -1,37 +1,37 @@
  ====== Perplexity statistics ======
- Mean PPL(Q) : 9.923766 ± 0.061734
  Mean PPL(base) : 7.237090 ± 0.045539
- Cor(ln(PPL(Q)), ln(PPL(base))): 94.33%
- Mean ln(PPL(Q)/PPL(base)) : 0.315713 ± 0.002108
- Mean PPL(Q)/PPL(base) : 1.371237 ± 0.002890
- Mean PPL(Q)-PPL(base) : 2.686677 ± 0.024105

  ====== KL divergence statistics ======
- Mean KLD: 0.292497 ± 0.000978
- Maximum KLD: 11.110364
- 99.9% KLD: 3.423501
- 99.0% KLD: 2.016132
- 99.0% KLD: 2.016132
- Median KLD: 0.210275
- 10.0% KLD: 0.019525
- 5.0% KLD: 0.005606
- 1.0% KLD: 0.000617
- Minimum KLD: 0.000004

  ====== Token probability statistics ======
- Mean Δp: -8.373 ± 0.044 %
- Maximum Δp: 78.738%
- 99.9% Δp: 39.788%
- 99.0% Δp: 20.797%
- 95.0% Δp: 7.364%
- 90.0% Δp: 2.736%
- 75.0% Δp: 0.005%
- Median Δp: -2.257%
- 25.0% Δp: -12.828%
- 10.0% Δp: -28.442%
- 5.0% Δp: -41.830%
- 1.0% Δp: -75.747%
- 0.1% Δp: -90.320%
- Minimum Δp: -99.555%
- RMS Δp : 18.605 ± 0.069 %
- Same top p: 74.871 ± 0.114 %

  ====== Perplexity statistics ======
+ Mean PPL(Q) : 8.274172 ± 0.052402
  Mean PPL(base) : 7.237090 ± 0.045539
+ Cor(ln(PPL(Q)), ln(PPL(base))): 97.60%
+ Mean ln(PPL(Q)/PPL(base)) : 0.133920 ± 0.001382
+ Mean PPL(Q)/PPL(base) : 1.143301 ± 0.001580
+ Mean PPL(Q)-PPL(base) : 1.037082 ± 0.012706

  ====== KL divergence statistics ======
+ Mean KLD: 0.114738 ± 0.000483
+ Maximum KLD: 9.999102
+ 99.9% KLD: 2.236693
+ 99.0% KLD: 0.781076
+ 99.0% KLD: 0.781076
+ Median KLD: 0.077728
+ 10.0% KLD: 0.005170
+ 5.0% KLD: 0.001727
+ 1.0% KLD: 0.000289
+ Minimum KLD: -0.000055

  ====== Token probability statistics ======
+ Mean Δp: -3.288 ± 0.025 %
+ Maximum Δp: 65.548%
+ 99.9% Δp: 32.662%
+ 99.0% Δp: 17.193%
+ 95.0% Δp: 7.509%
+ 90.0% Δp: 3.610%
+ 75.0% Δp: 0.176%
+ Median Δp: -0.636%
+ 25.0% Δp: -5.421%
+ 10.0% Δp: -13.956%
+ 5.0% Δp: -20.435%
+ 1.0% Δp: -38.546%
+ 0.1% Δp: -71.826%
+ Minimum Δp: -98.746%
+ RMS Δp : 10.050 ± 0.048 %
+ Same top p: 83.354 ± 0.098 %
scores/Watt-Tool-8B-q3_k_l.tqa CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

- Final result: 35.4037 +/- 2.6692
- Random chance: 28.5968 +/- 2.5221


- llama_perf_context_print: load time = 307.41 ms
- llama_perf_context_print: prompt eval time = 81969.79 ms / 17455 tokens ( 4.70 ms per token, 212.94 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 83215.27 ms / 17456 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

+ Final result: 32.4000 +/- 1.7100
+ Random chance: 19.8992 +/- 1.4588


+ llama_perf_context_print: load time = 303.59 ms
+ llama_perf_context_print: prompt eval time = 173066.86 ms / 50072 tokens ( 3.46 ms per token, 289.32 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 174906.65 ms / 50073 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.wng CHANGED
@@ -1,11 +1,11 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

- Final Winogrande score(750 tasks): 72.8000 +/- 1.6260

- llama_perf_context_print: load time = 308.84 ms
- llama_perf_context_print: prompt eval time = 95382.87 ms / 22219 tokens ( 4.29 ms per token, 232.95 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 96563.37 ms / 22220 tokens
  ggml_metal_free: deallocating

+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

+ Final Winogrande score(750 tasks): 71.8667 +/- 1.6430

+ llama_perf_context_print: load time = 282.54 ms
+ llama_perf_context_print: prompt eval time = 75214.17 ms / 22192 tokens ( 3.39 ms per token, 295.05 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 75700.49 ms / 22193 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.arc CHANGED
@@ -1,13 +1,13 @@
1
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
4
 
5
- Final result: 63.1016 +/- 1.7655
6
- Random chance: 25.0251 +/- 1.5848
7
 
8
 
9
- llama_perf_context_print: load time = 1629.74 ms
10
- llama_perf_context_print: prompt eval time = 172545.34 ms / 36557 tokens ( 4.72 ms per token, 211.87 tokens per second)
11
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
12
- llama_perf_context_print: total time = 174780.50 ms / 36558 tokens
13
  ggml_metal_free: deallocating
 
1
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
4
 
5
+ Final result: 61.0667 +/- 1.7816
6
+ Random chance: 25.0083 +/- 1.5824
7
 
8
 
9
+ llama_perf_context_print: load time = 1677.48 ms
10
+ llama_perf_context_print: prompt eval time = 120638.17 ms / 36600 tokens ( 3.30 ms per token, 303.39 tokens per second)
11
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
12
+ llama_perf_context_print: total time = 121534.75 ms / 36601 tokens
13
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.hsw CHANGED
@@ -1,12 +1,12 @@
1
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
4
 
5
- 750 75.33333333
6
 
7
 
8
- llama_perf_context_print: load time = 308.43 ms
9
- llama_perf_context_print: prompt eval time = 462302.43 ms / 122058 tokens ( 3.79 ms per token, 264.02 tokens per second)
10
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
11
- llama_perf_context_print: total time = 467959.83 ms / 122059 tokens
12
  ggml_metal_free: deallocating
 
1
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
4
 
5
+ 750 77.20000000% [74.0633%, 80.0595%]
6
 
7
 
8
+ llama_perf_context_print: load time = 283.76 ms
9
+ llama_perf_context_print: prompt eval time = 419917.27 ms / 126448 tokens ( 3.32 ms per token, 301.13 tokens per second)
10
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
11
+ llama_perf_context_print: total time = 424216.79 ms / 126449 tokens
12
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.mmlu CHANGED
@@ -1,13 +1,13 @@
1
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
4
 
5
- Final result: 38.8000 +/- 1.7805
6
  Random chance: 25.0000 +/- 1.5822
7
 
8
 
9
- llama_perf_context_print: load time = 324.30 ms
10
- llama_perf_context_print: prompt eval time = 282248.57 ms / 70434 tokens ( 4.01 ms per token, 249.55 tokens per second)
11
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
12
- llama_perf_context_print: total time = 285034.26 ms / 70435 tokens
13
  ggml_metal_free: deallocating
 
1
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
2
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
3
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
4
 
5
+ Final result: 38.5333 +/- 1.7783
6
  Random chance: 25.0000 +/- 1.5822
7
 
8
 
9
+ llama_perf_context_print: load time = 285.00 ms
10
+ llama_perf_context_print: prompt eval time = 216659.18 ms / 67195 tokens ( 3.22 ms per token, 310.14 tokens per second)
11
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
12
+ llama_perf_context_print: total time = 218063.84 ms / 67196 tokens
13
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.ppx CHANGED
@@ -1,37 +1,37 @@
  ====== Perplexity statistics ======
- Mean PPL(Q) : 9.855009 ± 0.061412
  Mean PPL(base) : 7.237090 ± 0.045539
- Cor(ln(PPL(Q)), ln(PPL(base))): 94.48%
- Mean ln(PPL(Q)/PPL(base)) : 0.308761 ± 0.002082
- Mean PPL(Q)/PPL(base) : 1.361736 ± 0.002836
- Mean PPL(Q)-PPL(base) : 2.617919 ± 0.023684
 
  ====== KL divergence statistics ======
- Mean KLD: 0.292188 ± 0.000927
- Maximum KLD: 8.990653
- 99.9% KLD: 3.324761
- 99.0% KLD: 1.855327
- 99.0% KLD: 1.855327
- Median KLD: 0.215862
- 10.0% KLD: 0.020007
- 5.0% KLD: 0.005427
- 1.0% KLD: 0.000564
- Minimum KLD: 0.000000
 
  ====== Token probability statistics ======
- Mean Δp: -8.052 ± 0.043 %
- Maximum Δp: 83.209%
- 99.9% Δp: 40.828%
- 99.0% Δp: 21.532%
- 95.0% Δp: 8.004%
- 90.0% Δp: 2.972%
- 75.0% Δp: 0.009%
- Median Δp: -2.217%
- 25.0% Δp: -12.532%
- 10.0% Δp: -27.827%
- 5.0% Δp: -40.790%
- 1.0% Δp: -72.837%
- 0.1% Δp: -88.971%
- Minimum Δp: -99.209%
- RMS Δp : 18.125 ± 0.066 %
- Same top p: 74.435 ± 0.115 %
 
  ====== Perplexity statistics ======
+ Mean PPL(Q) : 8.459379 ± 0.053550
  Mean PPL(base) : 7.237090 ± 0.045539
+ Cor(ln(PPL(Q)), ln(PPL(base))): 97.26%
+ Mean ln(PPL(Q)/PPL(base)) : 0.156057 ± 0.001477
+ Mean PPL(Q)/PPL(base) : 1.168892 ± 0.001727
+ Mean PPL(Q)-PPL(base) : 1.222289 ± 0.014061
 
  ====== KL divergence statistics ======
+ Mean KLD: 0.131196 ± 0.000539
+ Maximum KLD: 7.898368
+ 99.9% KLD: 2.475934
+ 99.0% KLD: 0.894390
+ 99.0% KLD: 0.894390
+ Median KLD: 0.089346
+ 10.0% KLD: 0.006670
+ 5.0% KLD: 0.002250
+ 1.0% KLD: 0.000392
+ Minimum KLD: 0.000001
 
  ====== Token probability statistics ======
+ Mean Δp: -3.913 ± 0.027 %
+ Maximum Δp: 64.023%
+ 99.9% Δp: 33.075%
+ 99.0% Δp: 17.214%
+ 95.0% Δp: 7.245%
+ 90.0% Δp: 3.301%
+ 75.0% Δp: 0.096%
+ Median Δp: -0.875%
+ 25.0% Δp: -6.342%
+ 10.0% Δp: -15.578%
+ 5.0% Δp: -22.649%
+ 1.0% Δp: -41.943%
+ 0.1% Δp: -75.852%
+ Minimum Δp: -98.926%
+ RMS Δp : 10.892 ± 0.050 %
+ Same top p: 82.438 ± 0.100 %
scores/Watt-Tool-8B-q3_k_m.tqa CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
 
- Final result: 35.3846 +/- 2.6565
- Random chance: 28.4838 +/- 2.5074
 
 
- llama_perf_context_print: load time = 305.46 ms
- llama_perf_context_print: prompt eval time = 83329.07 ms / 17627 tokens ( 4.73 ms per token, 211.53 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 84576.13 ms / 17628 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
 
+ Final result: 33.3333 +/- 1.7225
+ Random chance: 19.8992 +/- 1.4588
 
 
+ llama_perf_context_print: load time = 294.56 ms
+ llama_perf_context_print: prompt eval time = 169427.06 ms / 50072 tokens ( 3.38 ms per token, 295.54 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 171263.15 ms / 50073 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.wng CHANGED
@@ -1,11 +1,11 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
 
- Final Winogrande score(750 tasks): 72.6667 +/- 1.6284
 
- llama_perf_context_print: load time = 331.54 ms
- llama_perf_context_print: prompt eval time = 94938.20 ms / 22104 tokens ( 4.30 ms per token, 232.83 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 96106.12 ms / 22105 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
 
+ Final Winogrande score(750 tasks): 73.0667 +/- 1.6209
 
+ llama_perf_context_print: load time = 286.93 ms
+ llama_perf_context_print: prompt eval time = 73604.03 ms / 22192 tokens ( 3.32 ms per token, 301.51 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 74091.88 ms / 22193 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_s.arc CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
- Final result: 61.7135 +/- 1.7797
- Random chance: 25.0251 +/- 1.5859
 
 
- llama_perf_context_print: load time = 1613.35 ms
- llama_perf_context_print: prompt eval time = 172264.72 ms / 36428 tokens ( 4.73 ms per token, 211.47 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 174482.10 ms / 36429 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
+ Final result: 58.2667 +/- 1.8018
+ Random chance: 25.0083 +/- 1.5824
 
 
+ llama_perf_context_print: load time = 1653.99 ms
+ llama_perf_context_print: prompt eval time = 123080.79 ms / 36600 tokens ( 3.36 ms per token, 297.37 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 123982.14 ms / 36601 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_s.hsw CHANGED
@@ -1,12 +1,12 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
- 750 74.00000000
 
 
- llama_perf_context_print: load time = 346.89 ms
- llama_perf_context_print: prompt eval time = 471049.03 ms / 125576 tokens ( 3.75 ms per token, 266.59 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 476803.59 ms / 125577 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
+ 750 75.60000000% [72.4008%, 78.5383%]
 
 
+ llama_perf_context_print: load time = 285.75 ms
+ llama_perf_context_print: prompt eval time = 427986.37 ms / 126448 tokens ( 3.38 ms per token, 295.45 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 432278.94 ms / 126449 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_s.mmlu CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
- Final result: 37.2000 +/- 1.7661
  Random chance: 25.0000 +/- 1.5822
 
 
- llama_perf_context_print: load time = 312.54 ms
- llama_perf_context_print: prompt eval time = 279002.34 ms / 69611 tokens ( 4.01 ms per token, 249.50 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 281794.77 ms / 69612 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
+ Final result: 38.1333 +/- 1.7748
  Random chance: 25.0000 +/- 1.5822
 
 
+ llama_perf_context_print: load time = 283.79 ms
+ llama_perf_context_print: prompt eval time = 221006.99 ms / 67195 tokens ( 3.29 ms per token, 304.04 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 222418.23 ms / 67196 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_s.ppx CHANGED
@@ -1,37 +1,37 @@
  ====== Perplexity statistics ======
- Mean PPL(Q) : 9.798719 ± 0.061509
  Mean PPL(base) : 7.237090 ± 0.045539
- Cor(ln(PPL(Q)), ln(PPL(base))): 94.51%
- Mean ln(PPL(Q)/PPL(base)) : 0.303032 ± 0.002083
- Mean PPL(Q)/PPL(base) : 1.353958 ± 0.002821
- Mean PPL(Q)-PPL(base) : 2.561629 ± 0.023725
 
  ====== KL divergence statistics ======
- Mean KLD: 0.285340 ± 0.000933
- Maximum KLD: 8.015823
- 99.9% KLD: 3.395762
- 99.0% KLD: 1.885579
- 99.0% KLD: 1.885579
- Median KLD: 0.207489
- 10.0% KLD: 0.018084
- 5.0% KLD: 0.005005
- 1.0% KLD: 0.000535
- Minimum KLD: 0.000000
 
  ====== Token probability statistics ======
- Mean Δp: -7.606 ± 0.042 %
- Maximum Δp: 79.040%
- 99.9% Δp: 41.873%
- 99.0% Δp: 22.236%
- 95.0% Δp: 8.510%
- 90.0% Δp: 3.272%
- 75.0% Δp: 0.019%
- Median Δp: -1.956%
- 25.0% Δp: -11.831%
- 10.0% Δp: -26.778%
- 5.0% Δp: -39.549%
- 1.0% Δp: -72.463%
- 0.1% Δp: -89.273%
- Minimum Δp: -99.255%
- RMS Δp : 17.751 ± 0.067 %
- Same top p: 74.688 ± 0.115 %
 
  ====== Perplexity statistics ======
+ Mean PPL(Q) : 8.869361 ± 0.056188
  Mean PPL(base) : 7.237090 ± 0.045539
+ Cor(ln(PPL(Q)), ln(PPL(base))): 96.40%
+ Mean ln(PPL(Q)/PPL(base)) : 0.203384 ± 0.001694
+ Mean PPL(Q)/PPL(base) : 1.225543 ± 0.002076
+ Mean PPL(Q)-PPL(base) : 1.632272 ± 0.017247
 
  ====== KL divergence statistics ======
+ Mean KLD: 0.171689 ± 0.000675
+ Maximum KLD: 8.647476
+ 99.9% KLD: 3.093943
+ 99.0% KLD: 1.167801
+ 99.0% KLD: 1.167801
+ Median KLD: 0.116922
+ 10.0% KLD: 0.009604
+ 5.0% KLD: 0.003321
+ 1.0% KLD: 0.000607
+ Minimum KLD: 0.000004
 
  ====== Token probability statistics ======
+ Mean Δp: -5.020 ± 0.030 %
+ Maximum Δp: 68.248%
+ 99.9% Δp: 34.217%
+ 99.0% Δp: 17.833%
+ 95.0% Δp: 7.053%
+ 90.0% Δp: 2.939%
+ 75.0% Δp: 0.031%
+ Median Δp: -1.308%
+ 25.0% Δp: -7.961%
+ 10.0% Δp: -18.717%
+ 5.0% Δp: -26.619%
+ 1.0% Δp: -49.119%
+ 0.1% Δp: -82.457%
+ Minimum Δp: -99.092%
+ RMS Δp : 12.587 ± 0.055 %
+ Same top p: 80.614 ± 0.104 %
scores/Watt-Tool-8B-q3_k_s.tqa CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
- Final result: 35.6707 +/- 2.6490
- Random chance: 28.6213 +/- 2.4995
 
 
- llama_perf_context_print: load time = 315.60 ms
- llama_perf_context_print: prompt eval time = 83571.96 ms / 17852 tokens ( 4.68 ms per token, 213.61 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 84826.79 ms / 17853 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
+ Final result: 33.2000 +/- 1.7207
+ Random chance: 19.8992 +/- 1.4588
 
 
+ llama_perf_context_print: load time = 290.87 ms
+ llama_perf_context_print: prompt eval time = 172697.11 ms / 50072 tokens ( 3.45 ms per token, 289.94 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 174519.26 ms / 50073 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_s.wng CHANGED
@@ -1,11 +1,11 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
- Final Winogrande score(750 tasks): 71.8667 +/- 1.6430
 
- llama_perf_context_print: load time = 314.44 ms
- llama_perf_context_print: prompt eval time = 96371.78 ms / 22317 tokens ( 4.32 ms per token, 231.57 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 97549.79 ms / 22318 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
+ Final Winogrande score(750 tasks): 73.6000 +/- 1.6106
 
+ llama_perf_context_print: load time = 290.68 ms
+ llama_perf_context_print: prompt eval time = 75062.12 ms / 22192 tokens ( 3.38 ms per token, 295.65 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 75536.06 ms / 22193 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q4_k_m.arc CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
- Final result: 64.6586 +/- 1.7502
- Random chance: 25.0335 +/- 1.5861
 
 
- llama_perf_context_print: load time = 2144.99 ms
- llama_perf_context_print: prompt eval time = 164022.15 ms / 37149 tokens ( 4.42 ms per token, 226.49 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 166313.65 ms / 37150 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
+ Final result: 65.7333 +/- 1.7342
+ Random chance: 25.0083 +/- 1.5824
 
 
+ llama_perf_context_print: load time = 2082.07 ms
+ llama_perf_context_print: prompt eval time = 124311.12 ms / 36600 tokens ( 3.40 ms per token, 294.42 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 125214.84 ms / 36601 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q4_k_m.hsw CHANGED
@@ -1,12 +1,12 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
- 750 75.33333333
 
 
- llama_perf_context_print: load time = 293.62 ms
- llama_perf_context_print: prompt eval time = 433672.45 ms / 123896 tokens ( 3.50 ms per token, 285.69 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 439340.46 ms / 123897 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
+ 750 77.73333333% [74.6188%, 80.5653%]
 
 
+ llama_perf_context_print: load time = 309.44 ms
+ llama_perf_context_print: prompt eval time = 431270.97 ms / 126448 tokens ( 3.41 ms per token, 293.20 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 435567.98 ms / 126449 tokens
  ggml_metal_free: deallocating
scores/Watt-Tool-8B-q4_k_m.mmlu CHANGED
@@ -1,13 +1,13 @@
- build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
- Final result: 40.2667 +/- 1.7920
  Random chance: 25.0000 +/- 1.5822
 
 
- llama_perf_context_print: load time = 297.85 ms
- llama_perf_context_print: prompt eval time = 262597.88 ms / 70659 tokens ( 3.72 ms per token, 269.08 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
- llama_perf_context_print: total time = 265393.62 ms / 70660 tokens
  ggml_metal_free: deallocating
 
+ build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
  llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
  llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
+ Final result: 39.4667 +/- 1.7860
  Random chance: 25.0000 +/- 1.5822
 
 
+ llama_perf_context_print: load time = 297.51 ms
+ llama_perf_context_print: prompt eval time = 223210.15 ms / 67195 tokens ( 3.32 ms per token, 301.04 tokens per second)
  llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_perf_context_print: total time = 224621.22 ms / 67196 tokens
  ggml_metal_free: deallocating