DavidAU committed
Commit e0c4b4d · verified · 1 Parent(s): bbba23a

Update README.md

Files changed (1)
  1. README.md +16 -3
README.md CHANGED
@@ -22,11 +22,22 @@ base_model:
 - RekaAI/reka-flash-3
 pipeline_tag: text-generation
 ---
+
+(Quants uploading, examples/repo card updates pending...)
+
 <h2>Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF</h2>
 
 UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW...
 
-Mixture is strong enough that lower quants can now solve/reasoning and come up with a solution whereas NON-optimized
+This reasoning model seems to solve problems faster and more directly than other reasoning models tested.
+
+It also rarely gets stuck in a loop or "lost in the woods."
+
+This model is also unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.
+
+<B>Augmented Quants:</B>
+
+The augmented quant mixture is strong enough that lower quants can now solve/reason and come up with a solution, whereas NON-optimized quants
 may not be able to solve or take a lot longer (a lot more tokens!).
 
 Quick testing shows optimized quants can:
@@ -35,6 +46,10 @@ Quick testing shows optimized quants can:
 - Use less tokens to "reason" ... up to 50% less.
 - Faster and smaller quant size (VS "MAX" with output tensor and embed at BF16).
 
+Cost of the Augment:
+- Quants are slightly larger.
+- Very small "hit" in T/S (tokens per second).
+
 <B>Quants - "EDGE of REASON":</B>
 
 Generally higher quants will solve problems faster with less tokens, and be able to solve tougher problems.
@@ -77,6 +92,4 @@ Reka's excellent reasoning model with MAX (level 1) quants, and NEO Imatrix data
 
 Does support other languages besides English.
 
-Quants uploading, examples/repo card updates pending...
-
 ---
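Not part of the commit above, but for orientation: a minimal sketch of how one of these GGUF quants might be run locally with llama-cpp-python. The quant filename, context size, and sampler settings are illustrative assumptions, not values taken from the repo card.

```python
# Minimal usage sketch (illustrative only): loading one of the GGUF quants
# with llama-cpp-python and asking a small reasoning question.
from llama_cpp import Llama

llm = Llama(
    model_path="Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-IQ4_XS.gguf",  # hypothetical filename
    n_ctx=8192,       # assumed context size; adjust to your hardware
    n_gpu_layers=-1,  # offload all layers to GPU if one is available
)

# Reasoning models spend tokens "thinking", so keep max_tokens generous.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "If 3 machines make 3 parts in 3 minutes, how long do 9 machines take to make 9 parts?"}],
    max_tokens=2048,
    temperature=0.6,
)
print(result["choices"][0]["message"]["content"])
```

If the returned dict includes a `usage` field, its completion token count is one way to compare the "up to 50% fewer reasoning tokens" claim across quant levels.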