Generate Perplexity, KLD, ARC, HellaSwag, MMLU, TruthfulQA and WinoGrande scores
This view is limited to 50 files because the diff contains too many changes.
- scores/Watt-Tool-8B-F16.arc +6 -6
- scores/Watt-Tool-8B-F16.hsw +5 -5
- scores/Watt-Tool-8B-F16.mmlu +5 -5
- scores/Watt-Tool-8B-F16.tqa +6 -6
- scores/Watt-Tool-8B-F16.wng +5 -5
- scores/Watt-Tool-8B-Q4_K_M-naive.arc +0 -13
- scores/Watt-Tool-8B-Q4_K_M-naive.hsw +0 -12
- scores/Watt-Tool-8B-Q4_K_M-naive.mmlu +0 -13
- scores/Watt-Tool-8B-Q4_K_M-naive.ppx +0 -37
- scores/Watt-Tool-8B-Q4_K_M-naive.tqa +0 -13
- scores/Watt-Tool-8B-Q4_K_M-naive.wng +0 -11
- scores/Watt-Tool-8B-iq3_m.arc +6 -6
- scores/Watt-Tool-8B-iq3_m.hsw +5 -5
- scores/Watt-Tool-8B-iq3_m.mmlu +5 -5
- scores/Watt-Tool-8B-iq3_m.ppx +30 -30
- scores/Watt-Tool-8B-iq3_m.tqa +6 -6
- scores/Watt-Tool-8B-iq3_m.wng +5 -5
- scores/Watt-Tool-8B-iq3_s.arc +6 -6
- scores/Watt-Tool-8B-iq3_s.hsw +5 -5
- scores/Watt-Tool-8B-iq3_s.mmlu +5 -5
- scores/Watt-Tool-8B-iq3_s.ppx +31 -31
- scores/Watt-Tool-8B-iq3_s.tqa +6 -6
- scores/Watt-Tool-8B-iq3_s.wng +5 -5
- scores/Watt-Tool-8B-iq4_nl.arc +6 -6
- scores/Watt-Tool-8B-iq4_nl.hsw +5 -5
- scores/Watt-Tool-8B-iq4_nl.mmlu +5 -5
- scores/Watt-Tool-8B-iq4_nl.ppx +31 -31
- scores/Watt-Tool-8B-iq4_nl.tqa +6 -6
- scores/Watt-Tool-8B-iq4_nl.wng +5 -5
- scores/Watt-Tool-8B-q3_k_l.arc +6 -6
- scores/Watt-Tool-8B-q3_k_l.hsw +5 -5
- scores/Watt-Tool-8B-q3_k_l.mmlu +5 -5
- scores/Watt-Tool-8B-q3_k_l.ppx +31 -31
- scores/Watt-Tool-8B-q3_k_l.tqa +6 -6
- scores/Watt-Tool-8B-q3_k_l.wng +5 -5
- scores/Watt-Tool-8B-q3_k_m.arc +6 -6
- scores/Watt-Tool-8B-q3_k_m.hsw +5 -5
- scores/Watt-Tool-8B-q3_k_m.mmlu +5 -5
- scores/Watt-Tool-8B-q3_k_m.ppx +31 -31
- scores/Watt-Tool-8B-q3_k_m.tqa +6 -6
- scores/Watt-Tool-8B-q3_k_m.wng +5 -5
- scores/Watt-Tool-8B-q3_k_s.arc +6 -6
- scores/Watt-Tool-8B-q3_k_s.hsw +5 -5
- scores/Watt-Tool-8B-q3_k_s.mmlu +5 -5
- scores/Watt-Tool-8B-q3_k_s.ppx +31 -31
- scores/Watt-Tool-8B-q3_k_s.tqa +6 -6
- scores/Watt-Tool-8B-q3_k_s.wng +5 -5
- scores/Watt-Tool-8B-q4_k_m.arc +6 -6
- scores/Watt-Tool-8B-q4_k_m.hsw +5 -5
- scores/Watt-Tool-8B-q4_k_m.mmlu +5 -5
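The score-file extensions above encode which benchmark each log belongs to; the mapping below is inferred from the log contents in this diff (the dict and function names are ours, not part of the repo):

```python
# Benchmark encoded by each score-file extension (mapping inferred from
# the diff contents below; names here are illustrative, not from the repo).
BENCHMARK_BY_EXT = {
    ".ppx": "Perplexity / KL divergence",
    ".arc": "ARC",
    ".hsw": "HellaSwag",
    ".mmlu": "MMLU",
    ".tqa": "TruthfulQA",
    ".wng": "WinoGrande",
}

def benchmark_for(path):
    """Return the benchmark name for a score file such as
    'scores/Watt-Tool-8B-F16.arc', or None if unrecognized."""
    for ext, name in BENCHMARK_BY_EXT.items():
        if path.endswith(ext):
            return name
    return None
```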
scores/Watt-Tool-8B-F16.arc
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))
 
-Final result: 65.
-Random chance: 25.
+Final result: 65.8667 +/- 1.7325
+Random chance: 25.0083 +/- 1.5824
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 7049.26 ms
+llama_perf_context_print: prompt eval time = 109446.86 ms / 36600 tokens ( 2.99 ms per token, 334.41 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 110483.23 ms / 36601 tokens
 ggml_metal_free: deallocating
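The `+/-` figures in these multiple-choice logs look consistent with the sample standard error of a Bernoulli mean over the 750 tasks, using an n-1 denominator; a minimal sketch under that assumption (the formula is inferred from the numbers, not taken from llama.cpp source):

```python
import math

def std_error_pct(p_pct, n):
    """Approximate the '+/-' printed next to a multiple-choice score:
    sample standard error of a Bernoulli mean, in percent.
    Assumption: n-1 denominator, inferred from the logged values."""
    p = p_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / (n - 1))
```

For the F16 ARC result above, `std_error_pct(65.8667, 750)` comes out to about 1.7325, matching the logged `65.8667 +/- 1.7325`.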
scores/Watt-Tool-8B-F16.hsw
CHANGED
@@ -1,12 +1,12 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))
 
-750
+750 78.66666667% [75.5926%, 81.4486%]
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 580.08 ms
+llama_perf_context_print: prompt eval time = 381945.70 ms / 126448 tokens ( 3.02 ms per token, 331.06 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 386591.06 ms / 126449 tokens
 ggml_metal_free: deallocating
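The bracketed range in the HellaSwag line above matches a 95% Wilson score interval for 590/750 correct; a sketch, assuming that is the interval llama.cpp prints:

```python
import math

def wilson_95(p_pct, n):
    """95% Wilson score interval, in percent. The HellaSwag line above
    ('750 78.66666667% [75.5926%, 81.4486%]') reproduces under this
    formula, so it is presumably what the benchmark prints (assumption)."""
    z = 1.959964  # two-sided 95% normal quantile
    p = p_pct / 100.0
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return 100.0 * (centre - half), 100.0 * (centre + half)
```

`wilson_95(78.66666667, 750)` gives roughly (75.59, 81.45), in line with the logged bracket.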
scores/Watt-Tool-8B-F16.mmlu
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))
 
-Final result:
+Final result: 40.9333 +/- 1.7967
 Random chance: 25.0000 +/- 1.5822
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 596.70 ms
+llama_perf_context_print: prompt eval time = 197375.99 ms / 67195 tokens ( 2.94 ms per token, 340.44 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 198932.74 ms / 67196 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-F16.tqa
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))
 
-Final result:
-Random chance:
+Final result: 32.9333 +/- 1.7172
+Random chance: 19.8992 +/- 1.4588
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 624.82 ms
+llama_perf_context_print: prompt eval time = 153527.41 ms / 50072 tokens ( 3.07 ms per token, 326.14 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 155568.93 ms / 50073 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-F16.wng
CHANGED
@@ -1,11 +1,11 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 36 key-value pairs and 292 tensors from ./Watt-Tool-8B-F16.gguf (version GGUF V3 (latest))
 
-Final Winogrande score(750 tasks): 74.
+Final Winogrande score(750 tasks): 74.8000 +/- 1.5864
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 624.64 ms
+llama_perf_context_print: prompt eval time = 66689.29 ms / 22192 tokens ( 3.01 ms per token, 332.77 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 67279.36 ms / 22193 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.arc
DELETED
@@ -1,13 +0,0 @@
-build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
-llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
-llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
-Final result: 62.5668 +/- 1.7707
-Random chance: 25.0251 +/- 1.5848
-
-
-llama_perf_context_print: load time = 707.57 ms
-llama_perf_context_print: prompt eval time = 164606.88 ms / 36539 tokens ( 4.50 ms per token, 221.98 tokens per second)
-llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time = 166874.76 ms / 36540 tokens
-ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.hsw
DELETED
@@ -1,12 +0,0 @@
-build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
-llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
-llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
-750 77.73333333
-
-
-llama_perf_context_print: load time = 306.76 ms
-llama_perf_context_print: prompt eval time = 436291.37 ms / 122836 tokens ( 3.55 ms per token, 281.55 tokens per second)
-llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time = 441964.91 ms / 122837 tokens
-ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.mmlu
DELETED
@@ -1,13 +0,0 @@
-build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
-llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
-llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
-Final result: 42.0000 +/- 1.8034
-Random chance: 25.0000 +/- 1.5822
-
-
-llama_perf_context_print: load time = 304.34 ms
-llama_perf_context_print: prompt eval time = 262641.92 ms / 69673 tokens ( 3.77 ms per token, 265.28 tokens per second)
-llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time = 265464.52 ms / 69674 tokens
-ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.ppx
DELETED
@@ -1,37 +0,0 @@
-====== Perplexity statistics ======
-Mean PPL(Q) : 7.409510 ± 0.046740
-Mean PPL(base) : 7.237090 ± 0.045539
-Cor(ln(PPL(Q)), ln(PPL(base))): 99.65%
-Mean ln(PPL(Q)/PPL(base)) : 0.023545 ± 0.000530
-Mean PPL(Q)/PPL(base) : 1.023825 ± 0.000543
-Mean PPL(Q)-PPL(base) : 0.172420 ± 0.004061
-
-====== KL divergence statistics ======
-Mean KLD: 0.017663 ± 0.000107
-Maximum KLD: 5.749704
-99.9% KLD: 0.447724
-99.0% KLD: 0.139140
-99.0% KLD: 0.139140
-Median KLD: 0.010320
-10.0% KLD: 0.000617
-5.0% KLD: 0.000201
-1.0% KLD: 0.000027
-Minimum KLD: -0.000129
-
-====== Token probability statistics ======
-Mean Δp: -0.531 ± 0.010 %
-Maximum Δp: 55.716%
-99.9% Δp: 17.458%
-99.0% Δp: 8.256%
-95.0% Δp: 3.790%
-90.0% Δp: 2.138%
-75.0% Δp: 0.367%
-Median Δp: -0.034%
-25.0% Δp: -1.129%
-10.0% Δp: -3.654%
-5.0% Δp: -5.855%
-1.0% Δp: -12.744%
-0.1% Δp: -31.910%
-Minimum Δp: -99.362%
-RMS Δp : 3.658 ± 0.032 %
-Same top p: 93.743 ± 0.064 %
scores/Watt-Tool-8B-Q4_K_M-naive.tqa
DELETED
@@ -1,13 +0,0 @@
-build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
-llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
-llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
-Final result: 36.8098 +/- 2.6753
-Random chance: 28.5214 +/- 2.5046
-
-
-llama_perf_context_print: load time = 306.51 ms
-llama_perf_context_print: prompt eval time = 78347.98 ms / 17655 tokens ( 4.44 ms per token, 225.34 tokens per second)
-llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time = 79593.31 ms / 17656 tokens
-ggml_metal_free: deallocating
scores/Watt-Tool-8B-Q4_K_M-naive.wng
DELETED
@@ -1,11 +0,0 @@
-build: 4945 (e354bc3b) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.3.0
-llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
-llama_model_loader: loaded meta data with 42 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M-naive.gguf (version GGUF V3 (latest))
-
-Final Winogrande score(750 tasks): 73.6000 +/- 1.6106
-
-llama_perf_context_print: load time = 295.82 ms
-llama_perf_context_print: prompt eval time = 90900.17 ms / 22246 tokens ( 4.09 ms per token, 244.73 tokens per second)
-llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time = 92103.74 ms / 22247 tokens
-ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.arc
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))
 
-Final result:
-Random chance: 25.
+Final result: 62.8000 +/- 1.7661
+Random chance: 25.0083 +/- 1.5824
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 1734.01 ms
+llama_perf_context_print: prompt eval time = 115043.84 ms / 36600 tokens ( 3.14 ms per token, 318.14 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 115943.71 ms / 36601 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.hsw
CHANGED
@@ -1,12 +1,12 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))
 
-750 78.
+750 78.00000000% [74.8968%, 80.8179%]
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 291.01 ms
+llama_perf_context_print: prompt eval time = 400031.71 ms / 126448 tokens ( 3.16 ms per token, 316.09 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 404305.75 ms / 126449 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.mmlu
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))
 
-Final result:
+Final result: 37.7333 +/- 1.7711
 Random chance: 25.0000 +/- 1.5822
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 289.89 ms
+llama_perf_context_print: prompt eval time = 206632.46 ms / 67195 tokens ( 3.08 ms per token, 325.19 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 208049.33 ms / 67196 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.ppx
CHANGED
@@ -1,37 +1,37 @@
 ====== Perplexity statistics ======
-Mean PPL(Q) :
+Mean PPL(Q) : 7.841948 ± 0.049502
 Mean PPL(base) : 7.237090 ± 0.045539
-Cor(ln(PPL(Q)), ln(PPL(base))):
-Mean ln(PPL(Q)/PPL(base)) : 0.
-Mean PPL(Q)/PPL(base) : 1.
-Mean PPL(Q)-PPL(base) :
+Cor(ln(PPL(Q)), ln(PPL(base))): 98.36%
+Mean ln(PPL(Q)/PPL(base)) : 0.080268 ± 0.001143
+Mean PPL(Q)/PPL(base) : 1.083578 ± 0.001238
+Mean PPL(Q)-PPL(base) : 0.604858 ± 0.009476
 
 ====== KL divergence statistics ======
-Mean KLD: 0.
-Maximum KLD:
-99.9% KLD:
-99.0% KLD:
-99.0% KLD:
-Median KLD: 0.
-10.0% KLD: 0.
-5.0% KLD: 0.
-1.0% KLD: 0.
+Mean KLD: 0.081774 ± 0.000354
+Maximum KLD: 7.690053
+99.9% KLD: 1.654508
+99.0% KLD: 0.555790
+99.0% KLD: 0.555790
+Median KLD: 0.056256
+10.0% KLD: 0.003426
+5.0% KLD: 0.001063
+1.0% KLD: 0.000157
 Minimum KLD: 0.000000
 
 ====== Token probability statistics ======
-Mean Δp: -
-Maximum Δp:
-99.9% Δp:
-99.0% Δp:
-95.0% Δp:
-90.0% Δp:
-75.0% Δp: 0.
-Median Δp: -0.
-25.0% Δp: -
-10.0% Δp: -
-5.0% Δp: -
-1.0% Δp: -
-0.1% Δp: -
-Minimum Δp: -
-RMS Δp :
-Same top p:
+Mean Δp: -2.133 ± 0.021 %
+Maximum Δp: 73.495%
+99.9% Δp: 32.336%
+99.0% Δp: 17.093%
+95.0% Δp: 7.846%
+90.0% Δp: 4.045%
+75.0% Δp: 0.372%
+Median Δp: -0.301%
+25.0% Δp: -3.967%
+10.0% Δp: -10.805%
+5.0% Δp: -16.007%
+1.0% Δp: -30.015%
+0.1% Δp: -62.256%
+Minimum Δp: -96.763%
+RMS Δp : 8.316 ± 0.043 %
+Same top p: 85.224 ± 0.094 %
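The perplexity-ratio statistics in the iq3_m report are internally consistent: the reported mean ratio appears to be the exponential of the mean log-ratio, and the reported PPL difference equals PPL(Q) minus PPL(base). A quick check (values copied from the report above):

```python
import math

# Values from the iq3_m perplexity report above.
mean_ln_ratio = 0.080268      # Mean ln(PPL(Q)/PPL(base))
mean_ratio = 1.083578         # Mean PPL(Q)/PPL(base)
ppl_q, ppl_base = 7.841948, 7.237090
mean_diff = 0.604858          # Mean PPL(Q)-PPL(base)

# exp of the mean log-ratio reproduces the reported mean ratio...
assert abs(math.exp(mean_ln_ratio) - mean_ratio) < 1e-5
# ...and the reported difference is exactly PPL(Q) - PPL(base).
assert abs((ppl_q - ppl_base) - mean_diff) < 1e-6
```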
scores/Watt-Tool-8B-iq3_m.tqa
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))
 
-Final result:
-Random chance:
+Final result: 32.1333 +/- 1.7063
+Random chance: 19.8992 +/- 1.4588
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 288.09 ms
+llama_perf_context_print: prompt eval time = 161368.09 ms / 50072 tokens ( 3.22 ms per token, 310.30 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 163199.21 ms / 50073 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_m.wng
CHANGED
@@ -1,11 +1,11 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_M.gguf (version GGUF V3 (latest))
 
-Final Winogrande score(750 tasks):
+Final Winogrande score(750 tasks): 73.6000 +/- 1.6106
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 288.50 ms
+llama_perf_context_print: prompt eval time = 70143.95 ms / 22192 tokens ( 3.16 ms per token, 316.38 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 70631.06 ms / 22193 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.arc
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))
 
-Final result:
-Random chance: 25.
+Final result: 62.0000 +/- 1.7736
+Random chance: 25.0083 +/- 1.5824
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 1662.56 ms
+llama_perf_context_print: prompt eval time = 115280.42 ms / 36600 tokens ( 3.15 ms per token, 317.49 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 116185.54 ms / 36601 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.hsw
CHANGED
@@ -1,12 +1,12 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))
 
-750
+750 76.26666667% [73.0928%, 79.1728%]
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 286.42 ms
+llama_perf_context_print: prompt eval time = 400735.90 ms / 126448 tokens ( 3.17 ms per token, 315.54 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 405023.75 ms / 126449 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.mmlu
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))
 
-Final result:
+Final result: 37.3333 +/- 1.7674
 Random chance: 25.0000 +/- 1.5822
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 293.04 ms
+llama_perf_context_print: prompt eval time = 207055.09 ms / 67195 tokens ( 3.08 ms per token, 324.53 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 208462.26 ms / 67196 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.ppx
CHANGED
@@ -1,37 +1,37 @@
 ====== Perplexity statistics ======
-Mean PPL(Q) :
+Mean PPL(Q) : 8.253598 ± 0.051864
 Mean PPL(base) : 7.237090 ± 0.045539
-Cor(ln(PPL(Q)), ln(PPL(base))):
-Mean ln(PPL(Q)/PPL(base)) : 0.
-Mean PPL(Q)/PPL(base) : 1.
-Mean PPL(Q)-PPL(base) : 1.
+Cor(ln(PPL(Q)), ln(PPL(base))): 97.71%
+Mean ln(PPL(Q)/PPL(base)) : 0.131430 ± 0.001346
+Mean PPL(Q)/PPL(base) : 1.140458 ± 0.001535
+Mean PPL(Q)-PPL(base) : 1.016508 ± 0.012175
 
 ====== KL divergence statistics ======
-Mean KLD: 0.
-Maximum KLD:
-99.9% KLD:
-99.0% KLD:
-99.0% KLD:
-Median KLD: 0.
-10.0% KLD: 0.
-5.0% KLD: 0.
-1.0% KLD: 0.
-Minimum KLD:
+Mean KLD: 0.117565 ± 0.000433
+Maximum KLD: 7.079286
+99.9% KLD: 1.966468
+99.0% KLD: 0.726076
+99.0% KLD: 0.726076
+Median KLD: 0.084699
+10.0% KLD: 0.006988
+5.0% KLD: 0.002383
+1.0% KLD: 0.000330
+Minimum KLD: -0.000001
 
 ====== Token probability statistics ======
-Mean Δp: -
-Maximum Δp:
-99.9% Δp:
-99.0% Δp:
-95.0% Δp:
-90.0% Δp:
-75.0% Δp: 0.
-Median Δp: -
-25.0% Δp: -
-10.0% Δp: -
-5.0% Δp: -
-1.0% Δp: -
-0.1% Δp: -
-Minimum Δp: -
-RMS Δp :
-Same top p:
+Mean Δp: -3.685 ± 0.026 %
+Maximum Δp: 69.513%
+99.9% Δp: 34.570%
+99.0% Δp: 17.585%
+95.0% Δp: 7.273%
+90.0% Δp: 3.369%
+75.0% Δp: 0.113%
+Median Δp: -0.833%
+25.0% Δp: -6.212%
+10.0% Δp: -15.079%
+5.0% Δp: -21.666%
+1.0% Δp: -38.754%
+0.1% Δp: -69.188%
+Minimum Δp: -97.122%
+RMS Δp : 10.385 ± 0.045 %
+Same top p: 82.770 ± 0.100 %
scores/Watt-Tool-8B-iq3_s.tqa
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))

-Final result:
+Final result: 30.4000 +/- 1.6807
-Random chance:
+Random chance: 19.8992 +/- 1.4588


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 284.74 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 161670.39 ms / 50072 tokens ( 3.23 ms per token, 309.72 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 163511.55 ms / 50073 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq3_s.wng
CHANGED
@@ -1,11 +1,11 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ3_S.gguf (version GGUF V3 (latest))

-Final Winogrande score(750 tasks):
+Final Winogrande score(750 tasks): 72.9333 +/- 1.6235

-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 291.13 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 70279.68 ms / 22192 tokens ( 3.17 ms per token, 315.77 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 70763.82 ms / 22193 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.arc
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

-Final result:
+Final result: 63.4667 +/- 1.7594
-Random chance: 25.
+Random chance: 25.0083 +/- 1.5824


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 2048.98 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 119074.35 ms / 36600 tokens ( 3.25 ms per token, 307.37 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 119972.06 ms / 36601 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.hsw
CHANGED
@@ -1,12 +1,12 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

-750 77.
+750 77.73333333% [74.6188%, 80.5653%]


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 297.65 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 413170.51 ms / 126448 tokens ( 3.27 ms per token, 306.04 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 417435.80 ms / 126449 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.mmlu
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

-Final result: 39.
+Final result: 39.6000 +/- 1.7870
 Random chance: 25.0000 +/- 1.5822


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 304.48 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 213794.40 ms / 67195 tokens ( 3.18 ms per token, 314.30 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 215197.82 ms / 67196 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.ppx
CHANGED
@@ -1,37 +1,37 @@
 ====== Perplexity statistics ======
-Mean PPL(Q) : 7.
+Mean PPL(Q) : 7.516430 ± 0.047275
 Mean PPL(base) : 7.237090 ± 0.045539
-Cor(ln(PPL(Q)), ln(PPL(base))):
+Cor(ln(PPL(Q)), ln(PPL(base))): 99.30%
-Mean ln(PPL(Q)/PPL(base)) : 0.
+Mean ln(PPL(Q)/PPL(base)) : 0.037872 ± 0.000742
-Mean PPL(Q)/PPL(base) : 1.
+Mean PPL(Q)/PPL(base) : 1.038599 ± 0.000771
-Mean PPL(Q)-PPL(base) : 0.
+Mean PPL(Q)-PPL(base) : 0.279341 ± 0.005741

 ====== KL divergence statistics ======
-Mean KLD: 0.
+Mean KLD: 0.034545 ± 0.000172
-Maximum KLD:
+Maximum KLD: 4.479205
-99.9% KLD:
+99.9% KLD: 0.809954
-99.0% KLD: 0.
+99.0% KLD: 0.243338
-99.0% KLD: 0.
+99.0% KLD: 0.243338
-Median KLD: 0.
+Median KLD: 0.022288
-10.0% KLD: 0.
+10.0% KLD: 0.001467
-5.0% KLD: 0.
+5.0% KLD: 0.000485
-1.0% KLD: 0.
+1.0% KLD: 0.000067
-Minimum KLD: -0.
+Minimum KLD: -0.000025

 ====== Token probability statistics ======
-Mean Δp:
+Mean Δp: -0.984 ± 0.014 %
-Maximum Δp:
+Maximum Δp: 60.862%
-99.9% Δp:
+99.9% Δp: 23.910%
-99.0% Δp:
+99.0% Δp: 11.795%
-95.0% Δp:
+95.0% Δp: 5.549%
-90.0% Δp:
+90.0% Δp: 3.027%
-75.0% Δp:
+75.0% Δp: 0.404%
-Median Δp:
+Median Δp: -0.103%
-25.0% Δp: -
+25.0% Δp: -2.015%
-10.0% Δp: -6.
+10.0% Δp: -6.046%
-5.0% Δp: -
+5.0% Δp: -9.267%
-1.0% Δp: -
+1.0% Δp: -18.431%
-0.1% Δp: -
+0.1% Δp: -42.825%
-Minimum Δp: -
+Minimum Δp: -93.056%
-RMS Δp :
+RMS Δp : 5.270 ± 0.035 %
-Same top p:
+Same top p: 90.812 ± 0.076 %
scores/Watt-Tool-8B-iq4_nl.tqa
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

-Final result:
+Final result: 31.4667 +/- 1.6968
-Random chance:
+Random chance: 19.8992 +/- 1.4588


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 311.22 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 166835.67 ms / 50072 tokens ( 3.33 ms per token, 300.13 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 168685.42 ms / 50073 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-iq4_nl.wng
CHANGED
@@ -1,11 +1,11 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-IQ4_NL.gguf (version GGUF V3 (latest))

-Final Winogrande score(750 tasks):
+Final Winogrande score(750 tasks): 75.4667 +/- 1.5722

-llama_perf_context_print: load time = 302.
+llama_perf_context_print: load time = 302.51 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 72558.29 ms / 22192 tokens ( 3.27 ms per token, 305.85 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 73039.49 ms / 22193 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.arc
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

-Final result:
+Final result: 61.7333 +/- 1.7759
-Random chance: 25.
+Random chance: 25.0083 +/- 1.5824


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 1803.05 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 123314.10 ms / 36600 tokens ( 3.37 ms per token, 296.80 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 124217.22 ms / 36601 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.hsw
CHANGED
@@ -1,12 +1,12 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

-750 74.
+750 77.20000000% [74.0633%, 80.0595%]


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 293.80 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 428872.85 ms / 126448 tokens ( 3.39 ms per token, 294.84 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 433161.99 ms / 126449 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.mmlu
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

-Final result:
+Final result: 38.5333 +/- 1.7783
 Random chance: 25.0000 +/- 1.5822


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 293.76 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 221450.17 ms / 67195 tokens ( 3.30 ms per token, 303.43 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 222861.86 ms / 67196 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.ppx
CHANGED
@@ -1,37 +1,37 @@
 ====== Perplexity statistics ======
-Mean PPL(Q) :
+Mean PPL(Q) : 8.274172 ± 0.052402
 Mean PPL(base) : 7.237090 ± 0.045539
-Cor(ln(PPL(Q)), ln(PPL(base))):
+Cor(ln(PPL(Q)), ln(PPL(base))): 97.60%
-Mean ln(PPL(Q)/PPL(base)) : 0.
+Mean ln(PPL(Q)/PPL(base)) : 0.133920 ± 0.001382
-Mean PPL(Q)/PPL(base) : 1.
+Mean PPL(Q)/PPL(base) : 1.143301 ± 0.001580
-Mean PPL(Q)-PPL(base) :
+Mean PPL(Q)-PPL(base) : 1.037082 ± 0.012706

 ====== KL divergence statistics ======
-Mean KLD: 0.
+Mean KLD: 0.114738 ± 0.000483
-Maximum KLD:
+Maximum KLD: 9.999102
-99.9% KLD:
+99.9% KLD: 2.236693
-99.0% KLD:
+99.0% KLD: 0.781076
-99.0% KLD:
+99.0% KLD: 0.781076
-Median KLD: 0.
+Median KLD: 0.077728
-10.0% KLD: 0.
+10.0% KLD: 0.005170
-5.0% KLD: 0.
+5.0% KLD: 0.001727
-1.0% KLD: 0.
+1.0% KLD: 0.000289
-Minimum KLD:
+Minimum KLD: -0.000055

 ====== Token probability statistics ======
-Mean Δp: -
+Mean Δp: -3.288 ± 0.025 %
-Maximum Δp:
+Maximum Δp: 65.548%
-99.9% Δp:
+99.9% Δp: 32.662%
-99.0% Δp:
+99.0% Δp: 17.193%
-95.0% Δp: 7.
+95.0% Δp: 7.509%
-90.0% Δp:
+90.0% Δp: 3.610%
-75.0% Δp: 0.
+75.0% Δp: 0.176%
-Median Δp: -
+Median Δp: -0.636%
-25.0% Δp: -
+25.0% Δp: -5.421%
-10.0% Δp: -
+10.0% Δp: -13.956%
-5.0% Δp: -
+5.0% Δp: -20.435%
-1.0% Δp: -
+1.0% Δp: -38.546%
-0.1% Δp: -
+0.1% Δp: -71.826%
-Minimum Δp: -
+Minimum Δp: -98.746%
-RMS Δp :
+RMS Δp : 10.050 ± 0.048 %
-Same top p:
+Same top p: 83.354 ± 0.098 %
scores/Watt-Tool-8B-q3_k_l.tqa
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

-Final result:
+Final result: 32.4000 +/- 1.7100
-Random chance:
+Random chance: 19.8992 +/- 1.4588


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 303.59 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 173066.86 ms / 50072 tokens ( 3.46 ms per token, 289.32 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 174906.65 ms / 50073 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_l.wng
CHANGED
@@ -1,11 +1,11 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_L.gguf (version GGUF V3 (latest))

-Final Winogrande score(750 tasks):
+Final Winogrande score(750 tasks): 71.8667 +/- 1.6430

-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 282.54 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 75214.17 ms / 22192 tokens ( 3.39 ms per token, 295.05 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 75700.49 ms / 22193 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.arc
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))

-Final result:
+Final result: 61.0667 +/- 1.7816
-Random chance: 25.
+Random chance: 25.0083 +/- 1.5824


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 1677.48 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 120638.17 ms / 36600 tokens ( 3.30 ms per token, 303.39 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 121534.75 ms / 36601 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.hsw
CHANGED
@@ -1,12 +1,12 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))

-750
+750 77.20000000% [74.0633%, 80.0595%]


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 283.76 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 419917.27 ms / 126448 tokens ( 3.32 ms per token, 301.13 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 424216.79 ms / 126449 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.mmlu
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))

-Final result: 38.
+Final result: 38.5333 +/- 1.7783
 Random chance: 25.0000 +/- 1.5822


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 285.00 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 216659.18 ms / 67195 tokens ( 3.22 ms per token, 310.14 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 218063.84 ms / 67196 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.ppx
CHANGED
@@ -1,37 +1,37 @@
 ====== Perplexity statistics ======
-Mean PPL(Q) :
+Mean PPL(Q) : 8.459379 ± 0.053550
 Mean PPL(base) : 7.237090 ± 0.045539
-Cor(ln(PPL(Q)), ln(PPL(base))):
+Cor(ln(PPL(Q)), ln(PPL(base))): 97.26%
-Mean ln(PPL(Q)/PPL(base)) : 0.
+Mean ln(PPL(Q)/PPL(base)) : 0.156057 ± 0.001477
-Mean PPL(Q)/PPL(base) : 1.
+Mean PPL(Q)/PPL(base) : 1.168892 ± 0.001727
-Mean PPL(Q)-PPL(base) :
+Mean PPL(Q)-PPL(base) : 1.222289 ± 0.014061

 ====== KL divergence statistics ======
-Mean KLD: 0.
+Mean KLD: 0.131196 ± 0.000539
-Maximum KLD:
+Maximum KLD: 7.898368
-99.9% KLD:
+99.9% KLD: 2.475934
-99.0% KLD:
+99.0% KLD: 0.894390
-99.0% KLD:
+99.0% KLD: 0.894390
-Median KLD: 0.
+Median KLD: 0.089346
-10.0% KLD: 0.
+10.0% KLD: 0.006670
-5.0% KLD: 0.
+5.0% KLD: 0.002250
-1.0% KLD: 0.
+1.0% KLD: 0.000392
-Minimum KLD: 0.
+Minimum KLD: 0.000001

 ====== Token probability statistics ======
-Mean Δp: -
+Mean Δp: -3.913 ± 0.027 %
-Maximum Δp:
+Maximum Δp: 64.023%
-99.9% Δp:
+99.9% Δp: 33.075%
-99.0% Δp:
+99.0% Δp: 17.214%
-95.0% Δp:
+95.0% Δp: 7.245%
-90.0% Δp:
+90.0% Δp: 3.301%
-75.0% Δp: 0.
+75.0% Δp: 0.096%
-Median Δp: -
+Median Δp: -0.875%
-25.0% Δp: -
+25.0% Δp: -6.342%
-10.0% Δp: -
+10.0% Δp: -15.578%
-5.0% Δp: -
+5.0% Δp: -22.649%
-1.0% Δp: -
+1.0% Δp: -41.943%
-0.1% Δp: -
+0.1% Δp: -75.852%
-Minimum Δp: -
+Minimum Δp: -98.926%
-RMS Δp :
+RMS Δp : 10.892 ± 0.050 %
-Same top p:
+Same top p: 82.438 ± 0.100 %
scores/Watt-Tool-8B-q3_k_m.tqa
CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))

-Final result:
+Final result: 33.3333 +/- 1.7225
-Random chance:
+Random chance: 19.8992 +/- 1.4588


-llama_perf_context_print: load time =
+llama_perf_context_print: load time = 294.56 ms
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: prompt eval time = 169427.06 ms / 50072 tokens ( 3.38 ms per token, 295.54 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 171263.15 ms / 50073 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_m.wng CHANGED
@@ -1,11 +1,11 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_M.gguf (version GGUF V3 (latest))
 
-Final Winogrande score(750 tasks):
+Final Winogrande score(750 tasks): 73.0667 +/- 1.6209
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 286.93 ms
+llama_perf_context_print: prompt eval time = 73604.03 ms / 22192 tokens ( 3.32 ms per token, 301.51 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 74091.88 ms / 22193 tokens
 ggml_metal_free: deallocating
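For reference, the `+/-` that llama.cpp reports after a Winogrande score is consistent with a plain standard error of the mean over the 750 binary outcomes (using the sample-variance n−1 divisor). A minimal sketch, assuming a hypothetical 548/750 correct, which reproduces the `73.0667 +/- 1.6209` line above; this illustrates the arithmetic, not llama.cpp's actual source:

```python
import math

# Hypothetical reconstruction: 548 of 750 WinoGrande tasks correct
# reproduces "Final Winogrande score(750 tasks): 73.0667 +/- 1.6209".
n, correct = 750, 548
p = correct / n
score = 100.0 * p
# Standard error of the mean with the sample-variance (n - 1) divisor,
# which matches the reported +/- to four decimals.
sem = 100.0 * math.sqrt(p * (1.0 - p) / (n - 1))
print(f"Final Winogrande score({n} tasks): {score:.4f} +/- {sem:.4f}")
```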
scores/Watt-Tool-8B-q3_k_s.arc CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
-Final result:
-Random chance: 25.
+Final result: 58.2667 +/- 1.8018
+Random chance: 25.0083 +/- 1.5824
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 1653.99 ms
+llama_perf_context_print: prompt eval time = 123080.79 ms / 36600 tokens ( 3.36 ms per token, 297.37 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 123982.14 ms / 36601 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_s.hsw CHANGED
@@ -1,12 +1,12 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
-750
+750 75.60000000% [72.4008%, 78.5383%]
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 285.75 ms
+llama_perf_context_print: prompt eval time = 427986.37 ms / 126448 tokens ( 3.38 ms per token, 295.45 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 432278.94 ms / 126449 tokens
 ggml_metal_free: deallocating
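The HellaSwag line reports the task count, the accuracy, and a bracketed range. The range above is numerically consistent with a 95% Wilson score interval around a hypothetical 567/750 correct; the sketch below illustrates that formula (z ≈ 1.96 assumed), and is not a claim about llama.cpp's exact implementation:

```python
import math

# Hypothetical reconstruction: 567 of 750 HellaSwag tasks correct gives the
# 75.6% point estimate; the bracketed range is consistent with a 95% Wilson
# score interval.
n, correct = 750, 567
p = correct / n
z = 1.96  # assumed two-sided 95% normal quantile
denom = 1 + z * z / n
center = (p + z * z / (2 * n)) / denom
half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
lo, hi = 100 * (center - half), 100 * (center + half)
print(f"{n} {100 * p:.8f}% [{lo:.4f}%, {hi:.4f}%]")
```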
scores/Watt-Tool-8B-q3_k_s.mmlu CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
-Final result:
+Final result: 38.1333 +/- 1.7748
 Random chance: 25.0000 +/- 1.5822
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 283.79 ms
+llama_perf_context_print: prompt eval time = 221006.99 ms / 67195 tokens ( 3.29 ms per token, 304.04 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 222418.23 ms / 67196 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_s.ppx CHANGED
@@ -1,37 +1,37 @@
 ====== Perplexity statistics ======
-Mean PPL(Q) :
+Mean PPL(Q) : 8.869361 ± 0.056188
 Mean PPL(base) : 7.237090 ± 0.045539
-Cor(ln(PPL(Q)), ln(PPL(base))):
-Mean ln(PPL(Q)/PPL(base)) : 0.
-Mean PPL(Q)/PPL(base) : 1.
-Mean PPL(Q)-PPL(base) :
+Cor(ln(PPL(Q)), ln(PPL(base))): 96.40%
+Mean ln(PPL(Q)/PPL(base)) : 0.203384 ± 0.001694
+Mean PPL(Q)/PPL(base) : 1.225543 ± 0.002076
+Mean PPL(Q)-PPL(base) : 1.632272 ± 0.017247
 
 ====== KL divergence statistics ======
-Mean KLD: 0.
-Maximum KLD: 8.
-99.9% KLD: 3.
-99.0% KLD: 1.
-99.0% KLD: 1.
-Median KLD: 0.
-10.0% KLD: 0.
-5.0% KLD: 0.
-1.0% KLD: 0.
-Minimum KLD: 0.
+Mean KLD: 0.171689 ± 0.000675
+Maximum KLD: 8.647476
+99.9% KLD: 3.093943
+99.0% KLD: 1.167801
+99.0% KLD: 1.167801
+Median KLD: 0.116922
+10.0% KLD: 0.009604
+5.0% KLD: 0.003321
+1.0% KLD: 0.000607
+Minimum KLD: 0.000004
 
 ====== Token probability statistics ======
-Mean Δp: -
-Maximum Δp:
-99.9% Δp:
-99.0% Δp:
-95.0% Δp:
-90.0% Δp:
-75.0% Δp: 0.
-Median Δp: -1.
-25.0% Δp: -
-10.0% Δp: -
-5.0% Δp: -
-1.0% Δp: -
-0.1% Δp: -
-Minimum Δp: -99.
-RMS Δp :
-Same top p:
+Mean Δp: -5.020 ± 0.030 %
+Maximum Δp: 68.248%
+99.9% Δp: 34.217%
+99.0% Δp: 17.833%
+95.0% Δp: 7.053%
+90.0% Δp: 2.939%
+75.0% Δp: 0.031%
+Median Δp: -1.308%
+25.0% Δp: -7.961%
+10.0% Δp: -18.717%
+5.0% Δp: -26.619%
+1.0% Δp: -49.119%
+0.1% Δp: -82.457%
+Minimum Δp: -99.092%
+RMS Δp : 12.587 ± 0.055 %
+Same top p: 80.614 ± 0.104 %
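The quantities in the `.ppx` block are all simple functions of per-token probabilities from the quantized model Q and the FP16 base. A toy sketch of the definitions, with made-up probabilities rather than the real evaluation data: perplexity is exp of the mean negative log-likelihood, `Mean ln(PPL(Q)/PPL(base))` is the difference of those mean NLLs, and KLD is computed per position over the full token distributions.

```python
import math

# Made-up per-token probabilities of the *observed* token under the FP16 base
# and the quantized model (illustrative only, not the real data).
p_base = [0.42, 0.10, 0.77, 0.25]
p_q = [0.35, 0.08, 0.70, 0.30]

n = len(p_base)
# Perplexity = exp(mean negative log-likelihood)
ppl_base = math.exp(-sum(map(math.log, p_base)) / n)
ppl_q = math.exp(-sum(map(math.log, p_q)) / n)
# ln(PPL(Q)/PPL(base)) is just the difference of the mean NLLs
ln_ratio = math.log(ppl_q / ppl_base)

# KL divergence is over the full distributions; sketch for one position
# with a tiny 3-token vocabulary (made-up distributions).
base_dist = [0.7, 0.2, 0.1]
q_dist = [0.6, 0.3, 0.1]
kld = sum(b * math.log(b / q) for b, q in zip(base_dist, q_dist))
print(ppl_q / ppl_base, ln_ratio, kld)
```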
scores/Watt-Tool-8B-q3_k_s.tqa CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
-Final result:
-Random chance:
+Final result: 33.2000 +/- 1.7207
+Random chance: 19.8992 +/- 1.4588
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 290.87 ms
+llama_perf_context_print: prompt eval time = 172697.11 ms / 50072 tokens ( 3.45 ms per token, 289.94 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 174519.26 ms / 50073 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q3_k_s.wng CHANGED
@@ -1,11 +1,11 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q3_K_S.gguf (version GGUF V3 (latest))
 
-Final Winogrande score(750 tasks):
+Final Winogrande score(750 tasks): 73.6000 +/- 1.6106
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 290.68 ms
+llama_perf_context_print: prompt eval time = 75062.12 ms / 22192 tokens ( 3.38 ms per token, 295.65 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 75536.06 ms / 22193 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q4_k_m.arc CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
-Final result:
-Random chance: 25.
+Final result: 65.7333 +/- 1.7342
+Random chance: 25.0083 +/- 1.5824
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 2082.07 ms
+llama_perf_context_print: prompt eval time = 124311.12 ms / 36600 tokens ( 3.40 ms per token, 294.42 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 125214.84 ms / 36601 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q4_k_m.hsw CHANGED
@@ -1,12 +1,12 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
-750
+750 77.73333333% [74.6188%, 80.5653%]
 
 
-llama_perf_context_print: load time =
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 309.44 ms
+llama_perf_context_print: prompt eval time = 431270.97 ms / 126448 tokens ( 3.41 ms per token, 293.20 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 435567.98 ms / 126449 tokens
 ggml_metal_free: deallocating
scores/Watt-Tool-8B-q4_k_m.mmlu CHANGED
@@ -1,13 +1,13 @@
-build:
+build: 5150 (2db9ba14) with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
 llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 27647 MiB free
 llama_model_loader: loaded meta data with 40 key-value pairs and 292 tensors from ./Watt-Tool-8B-Q4_K_M.gguf (version GGUF V3 (latest))
 
-Final result:
+Final result: 39.4667 +/- 1.7860
 Random chance: 25.0000 +/- 1.5822
 
 
-llama_perf_context_print: load time = 297.
-llama_perf_context_print: prompt eval time =
+llama_perf_context_print: load time = 297.51 ms
+llama_perf_context_print: prompt eval time = 223210.15 ms / 67195 tokens ( 3.32 ms per token, 301.04 tokens per second)
 llama_perf_context_print: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
-llama_perf_context_print: total time =
+llama_perf_context_print: total time = 224621.22 ms / 67196 tokens
 ggml_metal_free: deallocating