---
license: apache-2.0
language:
- en
tags:
- NEO Imatrix
- MAX Quants
- GGUF
- reasoning
- thinking
- r1
- cot
- reka-flash
- deepseek
- Qwen2.5
- Hermes
- DeepHermes
- DeepSeek
- DeepSeek-R1-Distill
- 128k context
base_model:
- RekaAI/reka-flash-3
pipeline_tag: text-generation
---
(Quants uploading, examples/repo card updates pending...)
<h2>Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF</h2>
UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW...
This reasoning model seems to solve problems faster and more directly than other reasoning models tested.
It also rarely gets stuck in a loop or "lost in the woods."
This model is unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.
<B>Augmented Quants:</b>
The augmented quant mixture is strong enough that lower quants can now reason through a problem and arrive at a solution, whereas
non-optimized quants may fail to solve it, or take a lot longer (a lot more tokens!).
Quick testing shows optimized quants can:
- Answer/solve at a lower quant level // solve problems the "regular quant" could not.
- Come up with a better answer / stronger reasoning.
- Use fewer tokens to "reason" ... up to 50% fewer.
- Run faster at a smaller quant size (vs "MAX" quants with the output tensor and embeddings at BF16).
- Produce higher-quality solutions.
Cost of the Augment:
- Quants are slightly larger.
- A very small "hit" in T/S (tokens per second).
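To put rough numbers on the size side of this tradeoff, here is a minimal sketch estimating file sizes for a ~21B-parameter model at common quant levels. The bits-per-weight figures are approximate llama.cpp averages assumed for illustration, not measurements from this repo:

```python
# Illustrative sketch: estimate GGUF file size from parameter count and
# average bits per weight (bpw). The bpw values below are rough assumptions.
PARAMS = 21e9  # ~21B parameters

BPW = {
    "IQ1_M": 1.75,
    "IQ2_S": 2.5,
    "IQ4_XS": 4.25,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}

def est_size_gb(params: float, bpw: float) -> float:
    """Estimated file size in GB: params * bits-per-weight / 8 bits-per-byte."""
    return params * bpw / 8 / 1e9

for name, bpw in BPW.items():
    print(f"{name:7s} ~{est_size_gb(PARAMS, bpw):5.1f} GB")
```

A BF16 output tensor adds a fixed chunk on top of these figures (hence "quants are slightly larger"), but the reasoning gains described above come at that modest cost.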
<B>Quants - "EDGE of REASON":</B>
Generally, higher quants will solve problems faster, with fewer tokens, and can solve tougher problems.
Likewise, "solutions" will be more detailed too.
...
IQ1_M - Works, but reasoning is limited (reasoning operates, but it has a tough time, if it succeeds at all, coming up with
the right answer for some problems). This is not the fault of the model; hardly any model operates at IQ1_M unless it is
35B+ in size, with very few exceptions.
...
IQ2_S - Moderate reasoning; impressive performance for both the reasoning AND this quant level.
...
For best performance, use IQ3_M, IQ4_XS/IQ4_NL, or the Q4 quants.
...
For TOP performance, use Q6/Q8.
...
All quants (IQ1 through Q6) have been optimized with:
- The NEO Imatrix dataset.
- A BF16 output tensor (full precision).
Q8 (imatrix has no effect on Q8):
- BF16 Output tensor (full precision)
I found this configuration worked best with this specific model, and with "reasoning" in general.
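For readers who want to reproduce a similar setup on another model, a sketch of the llama.cpp workflow (file names here are assumptions, and this is not the exact command set used for this repo):

```
# 1) Compute an importance matrix from a calibration dataset.
./llama-imatrix -m model-f16.gguf -f calibration_dataset.txt -o model.imatrix

# 2) Quantize with the imatrix, keeping the output tensor at BF16.
./llama-quantize \
  --imatrix model.imatrix \
  --output-tensor-type bf16 \
  model-f16.gguf model-IQ4_XS.gguf IQ4_XS
```

The `--output-tensor-type bf16` flag is what keeps the output tensor at full precision, as described above; the imatrix step guides which weights get the most precision during quantization.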
---
Reka's excellent reasoning model with MAX (level 1) quants, and NEO Imatrix dataset.
128k context.
Also supports languages other than English.
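A minimal way to try one of these quants locally with llama.cpp's CLI (the file name is an assumption; pick whichever quant you downloaded, and reduce `-c` if you lack the RAM/VRAM for the full 128k context):

```
./llama-cli \
  -m Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-IQ4_XS.gguf \
  -c 131072 \
  -p "Solve this riddle: ..."
```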
---