ubergarm committed
Commit 52485c7 · 1 Parent(s): e1ce4cd

update readme
Files changed (1): README.md (+19 −1)
README.md CHANGED
@@ -97,7 +97,7 @@ numactl -N 0 -m 0 \
 
 These are probably the **best quants available in this size class** for `V3-0324`!
 
-[!][Benchmarks showing these quants are smaller in size yet similar in performance to the `Q8_0`](benchmarks-01.png "Benchmarks showing these quants are smaller in size yet similar in performance to the `Q8_0`")
+![Benchmarks showing these quants are smaller in size yet similar in performance to the `Q8_0`](benchmarks-01.png "Benchmarks showing these quants are smaller in size yet similar in performance to the `Q8_0`")
 
 ubergarm made no sacrifices for token embedding, attention, dense
 layers, or shared experts. This is possible because `ik_llama.cpp` MLA
@@ -220,6 +220,10 @@ Final estimate: PPL = 3.4755 +/- 0.03305
 
 #### Quant Cookers Secret Recipe
 
+<details>
+
+<summary>Secret Recipe</summary>
+
 ```bash
 #!/usr/bin/env bash
 
@@ -284,8 +288,14 @@ custom=$(
 24
 ```
 
+</details>
+
 #### Perplexity
 
+<details>
+
+<summary>Perplexity Logs</summary>
+
 ```bash
 $ CUDA_VISIBLE_DEVICES="0," \
 ./build/bin/llama-perplexity \
@@ -701,8 +711,16 @@ llama_print_timings: total time = 2841519.57 ms / 287233 tokens
 Final estimate: PPL = 3.5614 +/- 0.02001
 ```
 
+</details>
+
 #### Split
 
+<details>
+
+<summary>Split GGUF</summary>
+
+*TODO*: Add key value metadata information before publishing.
+
 ```bash
 $ ./build/bin/llama-gguf-split \
 --dry-run \
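
The net effect of the additions above is to wrap each long log section in a collapsible GitHub-flavored-Markdown block. A minimal sketch of the resulting pattern (placeholder content, not the actual README text) looks like:

````markdown
#### Perplexity

<details>

<summary>Perplexity Logs</summary>

```bash
# long command output goes here, hidden until the reader expands the section
```

</details>
````

Note the blank lines after `<details>` and around `<summary>`: Markdown inside an HTML block is only rendered when it is separated from the surrounding tags by blank lines, which is why the diff adds them between each tag.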