---
license: apache-2.0
language:
- en
tags:
- NEO Imatrix
- MAX Quants
- GGUF
- reasoning
- thinking
- r1
- cot
- reka-flash
- deepseek
- Qwen2.5
- Hermes
- DeepHermes
- DeepSeek
- DeepSeek-R1-Distill
- 128k context
base_model:
- RekaAI/reka-flash-3
pipeline_tag: text-generation
---
(Quants uploading, examples/repo card updates pending...)
<h2>Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF</h2>
UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW...
This reasoning model seems to solve problems faster and more directly than other reasoning models tested.
It also rarely gets stuck in a loop or "lost in the woods."
This model is unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.
<B>Augmented Quants:</b>
The augmented quant mixture is strong enough that lower quants can now reason through a problem and arrive at a solution, whereas
non-optimized quants may fail to solve it, or take a lot longer (a lot more tokens!).
Quick testing shows optimized quants can:
- Answer/solve at a lower quant level // solve problems the "regular quant" could not.
- Come up with a better answer / stronger reasoning.
- Use fewer tokens to "reason" ... up to 50% fewer.
- Run faster at a smaller quant size (vs "MAX" quants with the output tensor and embeddings at BF16).
- Produce higher-quality solutions.
Cost of the Augment:
- Quants are slightly larger.
- A very small "hit" in T/S (tokens per second).
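To put rough numbers on the size side of this tradeoff, here is a minimal sketch estimating file sizes for a ~21B-parameter model at common quant levels. The bits-per-weight figures are approximate llama.cpp averages assumed for illustration, not measurements from this repo:

```python
# Illustrative sketch: estimate GGUF file size from parameter count and
# average bits per weight (bpw). The bpw values below are rough assumptions.
PARAMS = 21e9  # ~21B parameters

BPW = {
    "IQ1_M": 1.75,
    "IQ2_S": 2.5,
    "IQ4_XS": 4.25,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}

def est_size_gb(params: float, bpw: float) -> float:
    """Estimated file size in GB: params * bits-per-weight / 8 bits-per-byte."""
    return params * bpw / 8 / 1e9

for name, bpw in BPW.items():
    print(f"{name:7s} ~{est_size_gb(PARAMS, bpw):5.1f} GB")
```

A BF16 output tensor adds a fixed chunk on top of these figures (hence "quants are slightly larger"), but the reasoning gains described above come at that modest cost.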
<B>Quants - "EDGE of REASON":</B>
Generally, higher quants will solve problems faster, with fewer tokens, and can solve tougher problems.
Likewise, "solutions" will be more detailed too.
...
IQ1_M - Works, but reasoning is limited (reasoning operates, but it has a tough time, if it succeeds at all, coming up with
the right answer for some problems). This is not the fault of the model; hardly any model operates at IQ1_M unless it is
35B+ in size, with very few exceptions.
...
IQ2_S - Moderate reasoning; impressive performance for both the reasoning AND this quant level.
...
For best performance, use IQ3_M, IQ4_XS/IQ4_NL, or the Q4 quants.
...
For TOP performance, use Q6/Q8.
...
All quants (IQ1 through Q6) have been optimized with:
- The NEO Imatrix dataset.
- A BF16 output tensor (full precision).
Q8 (imatrix has no effect on Q8):
- BF16 Output tensor (full precision)
I found this configuration worked best with this specific model, and with "reasoning" in general.
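For readers who want to reproduce a similar setup on another model, a sketch of the llama.cpp workflow (file names here are assumptions, and this is not the exact command set used for this repo):

```
# 1) Compute an importance matrix from a calibration dataset.
./llama-imatrix -m model-f16.gguf -f calibration_dataset.txt -o model.imatrix

# 2) Quantize with the imatrix, keeping the output tensor at BF16.
./llama-quantize \
  --imatrix model.imatrix \
  --output-tensor-type bf16 \
  model-f16.gguf model-IQ4_XS.gguf IQ4_XS
```

The `--output-tensor-type bf16` flag is what keeps the output tensor at full precision, as described above; the imatrix step guides which weights get the most precision during quantization.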
---
Reka's excellent reasoning model with MAX (level 1) quants, and NEO Imatrix dataset.
128k context.
Also supports languages other than English.
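A minimal way to try one of these quants locally with llama.cpp's CLI (the file name is an assumption; pick whichever quant you downloaded, and reduce `-c` if you lack the RAM/VRAM for the full 128k context):

```
./llama-cli \
  -m Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-IQ4_XS.gguf \
  -c 131072 \
  -p "Solve this riddle: ..."
```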
---