---
license: apache-2.0
language:
- en
tags:
- NEO Imatrix
- MAX Quants
- GGUF
- reasoning
- thinking
- r1
- cot
- reka-flash
- deepseek
- Qwen2.5
- Hermes
- DeepHermes
- DeepSeek
- DeepSeek-R1-Distill
- 128k context
base_model:
- RekaAI/reka-flash-3
pipeline_tag: text-generation
---

(Quants uploading, examples/repo card updates pending...)

<h2>Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF</h2>

UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW... 

This reasoning model seems to solve problems faster and more directly than other reasoning models tested.

It also rarely gets stuck in a loop or "lost in the woods."

The model is also unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.

<B>Augmented Quants:</b>

The augmented quants mixture is strong enough that lower quants can now reason through and solve problems that NON-optimized
quants either cannot solve at all, or take a lot longer (a lot more tokens!) to solve.

Quick testing shows optimized quants can:
  - Answer/solve problems at a lower quant level that the "reg quant" could not.
  - Come up with a better answer and stronger reasoning.
  - Use fewer tokens to "reason" ... up to 50% fewer.
  - Run faster at a smaller quant size (vs "MAX" quants with the output tensor and embed at BF16).
  - Produce higher-quality solutions.

Cost of the Augment:
  - Quants are slightly larger.
  - Very small "hit" in tokens per second (T/S).

<B>Quants - "EDGE of REASON":</B>

Generally, higher quants will solve problems faster with fewer tokens, and will be able to solve tougher problems.

Likewise, "solutions" will be more detailed too.

...

IQ1_M - Works, but reasoning is limited (reasoning operates, but has a tough time, if it succeeds at all, coming up with the right answer
for some problems). This is not the fault of the model; hardly any model operates at IQ1_M unless it is 35B+ in size,
with very few exceptions.

...

IQ2_S - Moderate reasoning ; impressive performance for both reasoning AND this quant level.

...

For best performance, use IQ3_M, IQ4_XS/IQ4_NL, or the Q4 quants.

...

For TOP performance, use Q6/Q8.

...

All quants (IQ1_M right up to Q6) have been optimized with:

- NEO Imatrix Dataset. 
- BF16 Output tensor (full precision)

Q8 (imatrix has no effect on Q8):
- BF16 Output tensor (full precision)

I found this config worked best with this specific model and with "reasoning" models in general.
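For anyone reproducing a setup like this with llama.cpp's `llama-quantize` tool, the config above roughly corresponds to the commands below. This is a sketch, not the exact recipe used for these uploads: the file names are placeholders, and "neo.imatrix" stands in for the NEO Imatrix dataset file, which is not part of llama.cpp.

```shell
# Sketch of the quant recipe described above, using llama.cpp's llama-quantize.
# File names are placeholders; "neo.imatrix" stands in for the NEO Imatrix file.

# Quantize to IQ4_XS with an imatrix, keeping the output tensor at BF16 (full precision):
llama-quantize \
  --imatrix neo.imatrix \
  --output-tensor-type bf16 \
  reka-flash-3-f16.gguf \
  reka-flash-3-IQ4_XS.gguf \
  IQ4_XS

# Q8_0 ignores the imatrix, so only the output-tensor override applies:
llama-quantize \
  --output-tensor-type bf16 \
  reka-flash-3-f16.gguf \
  reka-flash-3-Q8_0.gguf \
  Q8_0
```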

---

Reka's excellent reasoning model with MAX (level 1) quants, and NEO Imatrix dataset.

128k context.

Supports other languages besides English.
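As a usage sketch, a quant from this repo can be run with llama.cpp's `llama-cli` at the full 128k context. The file name below is a placeholder for whichever quant you download:

```shell
# Run one of the GGUF quants with llama.cpp (file name is a placeholder).
# -c 131072 requests the full 128k context; lower it if you have less RAM/VRAM.
llama-cli \
  -m Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-IQ4_XS.gguf \
  -c 131072 \
  -p "Your prompt here"
```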

---