Text Generation
GGUF
English
NEO Imatrix
MAX Quants
uncensored
reasoning
thinking
r1
cot
reka-flash
deepseek
Qwen2.5
Hermes
DeepHermes
DeepSeek
DeepSeek-R1-Distill
128k context
instruct
all use cases
maxed quants
Neo Imatrix
finetune
chatml
gpt4
synthetic data
distillation
function calling
roleplaying
chat
Uncensored
creative
general usage
problem solving
brainstorming
solve riddles
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
story
writing
fiction
swearing
horror
imatrix
conversational
Update README.md
README.md CHANGED
@@ -22,11 +22,22 @@ base_model:
 - RekaAI/reka-flash-3
 pipeline_tag: text-generation
 ---
+
+(Quants uploading, examples/repo card updates pending...)
+
 <h2>Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF</h2>
 
 UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW...
 
-
+This reasoning model solves problems faster and more directly than other reasoning models tested.
+
+It also rarely gets stuck in a loop or "lost in the woods."
+
+The model is unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.
+
+<B>Augmented Quants:</B>
+
+The augmented quant mixture is strong enough that lower quants can now reason their way to a solution where NON-optimized quants
 may not be able to solve the problem, or take a lot longer (a lot more tokens!).
 
 Quick testing shows optimized quants can:
@@ -35,6 +46,10 @@ Quick testing shows optimized quants can:
 - Use fewer tokens to "reason" ... up to 50% fewer.
 - Run faster at a smaller quant size (vs "MAX" quants with the output tensor and embeddings at BF16).
 
+Cost of the augment:
+- Quants are slightly larger.
+- A very small "hit" in T/S (tokens per second).
+
 <B>Quants - "EDGE of REASON":</B>
 
 Generally, higher quants will solve problems faster, with fewer tokens, and can solve tougher problems.
@@ -77,6 +92,4 @@ Reka's excellent reasoning model with MAX (level 1) quants, and NEO Imatrix data
 
 Supports other languages besides English.
 
-Quants uploading, examples/repo card updates pending...
-
 ---
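For anyone wiring these quants up once they land, here is a minimal sketch of loading one of them with the llama-cpp-python bindings. The file name, context size, and sampling settings below are hypothetical placeholders, not values taken from this repo:

```python
# Minimal sketch: run a GGUF quant of the model via llama-cpp-python.
# The file name is a hypothetical placeholder -- substitute whichever
# quant you actually downloaded from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix.Q4_K_M.gguf",
    n_ctx=16384,      # reasoning output is long; the model supports up to 128k context
    n_gpu_layers=-1,  # offload all layers to GPU when one is available
)

# Reasoning/"thinking" models emit their chain of thought before the answer,
# so leave generous headroom in max_tokens.
out = llm(
    "Solve step by step: a farmer has 17 sheep and all but 9 run away. "
    "How many sheep are left?",
    max_tokens=2048,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

Lower quants of the augmented mixture load the same way; they just carry the size/speed trade-offs described above.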