Update README.md
README.md
CHANGED
@@ -37,4 +37,54 @@ exl3_3.07bpw-h6-custom
 -- Perplexity: 3.935338
 ```
 
-
+Additional metrics via `eval/model_diff.py` courtesy of [turboderp](https://huggingface.co/turboderp):
+
+"Plain" exl3_3.0bpw-h6 vs original bf16 weights
+```
+-- original perplexity: 1.76745635
+-- original label in top-K:
+K = 1: 0.8681
+K = 2: 0.9237
+K = 3: 0.9411
+K = 4: 0.9502
+K = 5: 0.9564
+-- 3.0bpw-h6 perplexity: 2.14967564
+-- 3.0bpw-h6 label in top-K:
+K = 1: 0.8142
+K = 2: 0.8949
+K = 3: 0.9231
+K = 4: 0.9368
+K = 5: 0.9464
+-- Top-K agreement, 3.0bpw-h6 vs original:
+K = 1: 0.8820
+K = 2: 0.5225
+K = 3: 0.2585
+K = 4: 0.1132
+K = 5: 0.0491
+-- KL divergence (3.0bpw-h6, original): 0.23334818
+```
+
+exl3_3.07bpw-h6-custom vs original bf16 weights
+```
+-- original perplexity: 1.76745635
+-- original label in top-K:
+K = 1: 0.8681
+K = 2: 0.9237
+K = 3: 0.9411
+K = 4: 0.9502
+K = 5: 0.9564
+-- 3.07bpw-h6-custom perplexity: 2.03357968
+-- 3.07bpw-h6-custom label in top-K:
+K = 1: 0.8305
+K = 2: 0.9021
+K = 3: 0.9286
+K = 4: 0.9416
+K = 5: 0.9504
+-- Top-K agreement, 3.07bpw-h6-custom vs original:
+K = 1: 0.8981
+K = 2: 0.5702
+K = 3: 0.3027
+K = 4: 0.1461
+K = 5: 0.0691
+-- KL divergence (3.07bpw-h6-custom, original): 0.17770892
+```
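For readers unfamiliar with the metrics in the added README lines, the sketch below shows one way such numbers could be computed from two models' next-token distributions. It is not the code of `eval/model_diff.py`: every function name here is hypothetical, and the definitions are assumptions. In particular, "top-K agreement" is assumed to mean that both models rank the same K tokens in the same order, which would be consistent with how quickly the reported values fall as K grows, and the argument order of the KL divergence is a guess.

```python
import math


def softmax(logits):
    """Turn one row of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_ids(probs, k):
    """Indices of the k most probable tokens, most probable first."""
    return sorted(range(len(probs)), key=lambda i: -probs[i])[:k]


def perplexity(rows, labels):
    """exp of the mean negative log-probability given to the true labels."""
    nll = [-math.log(p[y]) for p, y in zip(rows, labels)]
    return math.exp(sum(nll) / len(nll))


def label_in_top_k(rows, labels, k):
    """Fraction of positions whose true label is among the top-k tokens."""
    hits = sum(y in top_k_ids(p, k) for p, y in zip(rows, labels))
    return hits / len(labels)


def top_k_agreement(rows_a, rows_b, k):
    """ASSUMPTION: two models agree at a position only if their top-k
    token lists are identical, including order."""
    hits = sum(top_k_ids(a, k) == top_k_ids(b, k) for a, b in zip(rows_a, rows_b))
    return hits / len(rows_a)


def kl_divergence(rows_p, rows_q):
    """Mean over positions of sum_i p_i * log(p_i / q_i).
    ASSUMPTION: which model plays p and which plays q is not stated."""
    total = 0.0
    for p, q in zip(rows_p, rows_q):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(rows_p)


# Toy data: two positions, a vocabulary of three tokens.
original = [softmax(r) for r in [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]]
quantized = [softmax(r) for r in [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]]
labels = [0, 1]

print("top-1 agreement:", top_k_agreement(original, quantized, 1))
print("top-2 agreement:", top_k_agreement(original, quantized, 2))
print("KL divergence:", kl_divergence(original, quantized))
```

Under these definitions, lower perplexity and KL divergence and higher top-K values indicate a quantized model that tracks the original more closely, which is the direction in which the 3.07bpw-h6-custom figures differ from the 3.0bpw-h6 ones.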