metadata
license: apache-2.0
language:
  - en
tags:
  - NEO Imatrix
  - MAX Quants
  - GGUF
  - reasoning
  - thinking
  - r1
  - cot
  - reka-flash
  - deepseek
  - Qwen2.5
  - Hermes
  - DeepHermes
  - DeepSeek
  - DeepSeek-R1-Distill
  - 128k context
base_model:
  - RekaAI/reka-flash-3
pipeline_tag: text-generation

(Quants uploading, examples/repo card updates pending...)

Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF

UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW...

This reasoning model seems to solve problems faster and more directly than the other reasoning models tested.

It also rarely gets stuck in a loop or "lost in the woods."

This model is also unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.

Augmented Quants:

The augmented quant mixture is strong enough that lower quants can now reason through a problem and come up with a solution, whereas NON-optimized quants may not be able to solve it, or may take a lot longer (a lot more tokens!).

Quick testing shows optimized quants can:

  • Answer/solve at a lower quant level where the "reg quant" could not.
  • Come up with a better answer/stronger reasoning.
  • Use fewer tokens to "reason" ... up to 50% fewer.
  • Run faster at a smaller quant size (VS "MAX" with output tensor and embed at BF16).
  • Produce higher quality solutions.

Cost of the Augment:

  • Quants are slightly larger.
  • Very small "hit" in T/S.
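To check the token savings and the T/S "hit" on your own prompts, the two runs can be compared with a few lines of arithmetic. This is only a sketch: the `RunStats` and `compare_runs` names are hypothetical, and the numbers at the bottom are placeholders, NOT measurements from this model.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Measurements from one generation run (all values supplied by the reader)."""
    reasoning_tokens: int  # tokens spent in the "thinking" phase
    total_tokens: int      # all generated tokens
    seconds: float         # wall-clock generation time

    @property
    def tokens_per_second(self) -> float:
        return self.total_tokens / self.seconds


def compare_runs(regular: RunStats, augmented: RunStats) -> dict:
    """Return percentage changes of the augmented run relative to the regular run."""
    token_change = (
        augmented.reasoning_tokens - regular.reasoning_tokens
    ) / regular.reasoning_tokens
    speed_change = (
        augmented.tokens_per_second - regular.tokens_per_second
    ) / regular.tokens_per_second
    return {
        "reasoning_token_change_pct": round(100 * token_change, 1),
        "tokens_per_second_change_pct": round(100 * speed_change, 1),
    }


# Placeholder numbers for illustration only, not results from this repo.
regular = RunStats(reasoning_tokens=2000, total_tokens=2500, seconds=100.0)
augmented = RunStats(reasoning_tokens=1000, total_tokens=1500, seconds=62.5)
print(compare_runs(regular, augmented))
```

Negative values mean the augmented run used fewer reasoning tokens or ran at a lower T/S than the regular run.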

Quants - "EDGE of REASON":

Generally, higher quants will solve problems faster with fewer tokens, and will be able to solve tougher problems.

Likewise, "solutions" will be more detailed too.

...

IQ1_M - Works, but with limited reasoning: reasoning operates, but has a tough time (if at all) coming up with the right answer for some problems. This is not the fault of the model; hardly any model operates at IQ1_M unless it is 35B+ in size, with very few exceptions.

...

IQ2_S - Moderate reasoning; impressive performance for both reasoning AND this quant level.

...

For best performance, use IQ3_M, IQ4_XS/NL, or the Q4s.

...

For TOP performance, use Q6/Q8.

...
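The quant guidance above can be kept as a small lookup table when scripting downloads or tests. This is a minimal sketch: the tier labels paraphrase the notes in this card, the quant labels are copied as written here (they are not exact file names), and the `QUANT_NOTES` and `quants_for` names are hypothetical.

```python
# Tier notes paraphrased from this card. Labels such as "Q4s" and "Q6" are
# kept as written in the card and are not exact GGUF file or type names.
QUANT_NOTES = {
    "IQ1_M": "limited",    # works, but limited reasoning
    "IQ2_S": "moderate",   # moderate reasoning, impressive for the size
    "IQ3_M": "best",
    "IQ4_XS": "best",
    "IQ4_NL": "best",
    "Q4s": "best",
    "Q6": "top",
    "Q8": "top",
}


def quants_for(tier: str) -> list[str]:
    """Return the quant labels listed under the given tier, in card order."""
    return [name for name, note in QUANT_NOTES.items() if note == tier]


print(quants_for("best"))
print(quants_for("top"))
```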

All quants (IQ1 right up to Q6) have been optimized with:

  • NEO Imatrix Dataset.
  • BF16 Output tensor (full precision)

Q8 (imatrix has no effect on Q8):

  • BF16 Output tensor (full precision)

I found this config worked best with this specific model and "reasoning" in general.
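For readers who want to try a similar configuration on their own files, here is a rough sketch that assembles a quantize command. It assumes llama.cpp's `llama-quantize` tool and its `--imatrix` and `--output-tensor-type` options; this card does not state the exact tool or command used, so treat it as an illustration rather than the recipe behind these quants. The `build_quantize_cmd` helper and all file names are hypothetical.

```python
import shlex


def build_quantize_cmd(src_gguf, dst_gguf, quant_type,
                       imatrix_path=None, output_tensor_type="bf16"):
    """Build an argument list for llama-quantize (assumed tool, see note above)."""
    cmd = ["llama-quantize"]
    # Per the card, the imatrix has no effect on Q8, so it is skipped there.
    if imatrix_path is not None and not quant_type.upper().startswith("Q8"):
        cmd += ["--imatrix", imatrix_path]
    # Keep the output tensor at BF16 instead of quantizing it.
    cmd += ["--output-tensor-type", output_tensor_type]
    cmd += [src_gguf, dst_gguf, quant_type]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_quantize_cmd(
    "model-bf16.gguf", "model-IQ3_M.gguf", "IQ3_M",
    imatrix_path="neo-imatrix.dat",
)
print(shlex.join(cmd))
```

The function only builds the command; pass the list to `subprocess.run` once the paths point at real files.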


Reka's excellent reasoning model with MAX (level 1) quants, and NEO Imatrix dataset.

128k context.

Also supports languages other than English.