DavidAU committed
Commit e0c4b4d · verified · 1 Parent(s): bbba23a

Update README.md

Files changed (1)
  1. README.md +16 -3
README.md CHANGED
@@ -22,11 +22,22 @@ base_model:
 - RekaAI/reka-flash-3
 pipeline_tag: text-generation
 ---
+
+(Quants uploading, examples/repo card updates pending...)
+
 <h2>Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF</h2>
 
 UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW...
 
-Mixture is strong enough that lower quants can now solve/reasoning and come up with a solution whereas NON-optimized
+This reasoning model seems to solve problems faster and more directly than other reasoning models tested.
+
+It also rarely gets stuck in a loop or "lost in the woods."
+
+This model is also unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.
+
+<B>Augmented Quants:</B>
+
+The augmented quant mixture is strong enough that lower quants can now solve/reason and come up with a solution, whereas NON-optimized quants
 may not be able to solve or take a lot longer (a lot more tokens!).
 
 Quick testing shows optimized quants can:
@@ -35,6 +46,10 @@ Quick testing shows optimized quants can:
 - Use less tokens to "reason" ... up to 50% less.
 - Faster and smaller quant size (VS "MAX" with output tensor and embed at BF16).
 
+Cost of the Augment:
+- Quants are slightly larger.
+- Very small "hit" in T/S (tokens per second).
+
 <B>Quants - "EDGE of REASON":</B>
 
 Generally higher quants will solve problems faster with less tokens, and be able to solve tougher problems.
@@ -77,6 +92,4 @@ Reka's excellent reasoning model with MAX (level 1) quants, and NEO Imatrix data
 
 Does support other languages besides English.
 
-Quants uploading, examples/repo card updates pending...
-
 ---
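Not part of the commit above, but for orientation: a minimal sketch of how one of these GGUF quants might be run locally with llama-cpp-python. The quant filename, context size, and sampler settings are illustrative assumptions, not values taken from the repo card.

```python
# Minimal usage sketch (illustrative only): loading one of the GGUF quants
# with llama-cpp-python and asking a small reasoning question.
from llama_cpp import Llama

llm = Llama(
    model_path="Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-IQ4_XS.gguf",  # hypothetical filename
    n_ctx=8192,       # assumed context size; adjust to your hardware
    n_gpu_layers=-1,  # offload all layers to GPU if one is available
)

# Reasoning models spend tokens "thinking", so keep max_tokens generous.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "If 3 machines make 3 parts in 3 minutes, how long do 9 machines take to make 9 parts?"}],
    max_tokens=2048,
    temperature=0.6,
)
print(result["choices"][0]["message"]["content"])
```

If the returned dict includes a `usage` field, its completion token count is one way to compare the "up to 50% fewer reasoning tokens" claim across quant levels.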