---
license: apache-2.0
language:
- en
tags:
- NEO Imatrix
- MAX Quants
- GGUF
- reasoning
- thinking
- r1
- cot
- reka-flash
- deepseek
- Qwen2.5
- Hermes
- DeepHermes
- DeepSeek
- DeepSeek-R1-Distill
- 128k context
base_model:
- RekaAI/reka-flash-3
pipeline_tag: text-generation
---
(Quants uploading, examples/repo card updates pending...)
Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF
UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW...
This reasoning model seems able to solve problems faster and more directly than other reasoning models tested.
It also rarely gets stuck in a loop or "lost in the woods."
This model is also unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.
Augmented Quants:
The augmented quant mixture is strong enough that lower quants can now reason through a problem and come up with a solution, whereas the NON-optimized
versions may fail to solve it, or take a lot longer (a lot more tokens!).
Quick testing shows optimized quants can:
- Answer/solve at a lower quant level, where the "reg quant" could not.
- Come up with a better answer/stronger reasoning.
- Use fewer tokens to "reason" ... up to 50% fewer.
- Run faster at a smaller quant size (VS "MAX" quants with both output tensor and embeddings at BF16).
Cost of the Augment:
- Quants are slightly larger.
- Very small "hit" in tokens per second (T/S).
Quants - "EDGE of REASON":
Generally, higher quants will solve problems faster with fewer tokens, and will be able to solve tougher problems.
Likewise, their "solutions" will be more detailed too.
...
IQ1_M - Works, but reasoning is limited: it operates, but has a tough time arriving at the right answer for some problems (if it can at all).
...
IQ2_S - Moderate reasoning; impressive performance for both reasoning AND this quant level.
...
For best performance, use IQ3_M, IQ4_XS/IQ4_NL, or the Q4 quants.
...
For TOP performance, use Q6/Q8.
...
All quants (IQ1 right up to Q6) have been optimized with:
- NEO Imatrix Dataset.
- BF16 Output tensor (full precision)
Q8 (imatrix has no effect on Q8):
- BF16 Output tensor (full precision)
I found this configuration worked best with this specific model, and with "reasoning" models in general.
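For illustration, the recipe above (imatrix quantization with the output tensor held at BF16) could be reproduced with llama.cpp's llama-imatrix and llama-quantize tools. This is a hedged sketch, not the exact commands used for this repo: the file names (including the calibration file standing in for the NEO Imatrix dataset, which is not public here) are placeholders.

```shell
# 1. Compute an importance matrix from a calibration text file.
#    "calibration.txt" is a placeholder for the (private) NEO Imatrix dataset.
./llama-imatrix -m reka-flash-3-bf16.gguf -f calibration.txt -o neo.imatrix

# 2. Quantize with the imatrix, keeping the output tensor at BF16.
#    The same pattern applies to the other quant levels (IQ1_M ... Q6).
./llama-quantize --imatrix neo.imatrix --output-tensor-type bf16 \
    reka-flash-3-bf16.gguf reka-flash-3-IQ4_XS.gguf IQ4_XS
```

For Q8_0 the `--imatrix` flag can be dropped, since (as noted above) the imatrix has no effect at that level; only the BF16 output tensor setting applies.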
---
Reka's excellent reasoning model with MAX (level 1) quants and the NEO Imatrix dataset.
128k context.
It also supports languages other than English.
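A minimal sketch of running one of these GGUF quants locally with llama.cpp's llama-cli; the file name is a placeholder, and 32768 is an assumed context size chosen to stay well within the model's 128k window while fitting modest hardware.

```shell
# Run an example prompt; -c sets the context window in tokens.
./llama-cli -m reka-flash-3-IQ4_XS.gguf -c 32768 \
    -p "Solve step by step: a train leaves at 2pm travelling 60 km/h. When does it cover 150 km?"
```

Raise `-c` (up to 131072) if your RAM/VRAM allows and you need the full 128k context.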
---