---
license: apache-2.0
language:
- en
tags:
- NEO Imatrix
- MAX Quants
- GGUF
- reasoning
- thinking
- r1
- cot
- reka-flash
- deepseek
- Qwen2.5
- Hermes
- DeepHermes
- DeepSeek
- DeepSeek-R1-Distill
- 128k context
base_model:
- RekaAI/reka-flash-3
pipeline_tag: text-generation
---

(Quants uploading, examples/repo card updates pending...)

Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF

UPDATE: Re-optimizing quants; found a better mixture. Uploading NOW...

This reasoning model seems able to solve problems faster and more directly than other reasoning models tested. It also rarely gets stuck in a loop or "lost in the woods." This model is unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.

Augmented Quants:

The augmented quant mixture is strong enough that lower quants can now reason through a problem and come up with a solution, whereas NON-optimized quants may not be able to solve it, or take a lot longer (a lot more tokens!).

Quick testing shows optimized quants can:
- Answer/solve at a lower quant level // solve where the "reg quant" could not.
- Come up with a better answer / stronger reasoning.
- Use fewer tokens to "reason" ... up to 50% less.
- Run faster at a smaller quant size (VS "MAX" with output tensor and embed at BF16).

Cost of the Augment:
- Quants are slightly larger.
- Very small "hit" in T/S.

Quants - "EDGE of REASON":

Generally, higher quants will solve problems faster with fewer tokens, and can solve tougher problems. Likewise, "solutions" will be of higher detail too.

... IQ1_M - Works, but limited reasoning (reasoning operates, but has a tough time (if at all) coming up with the right answer for some problems).
... IQ2_S - Moderate reasoning; impressive performance for both reasoning AND this quant level.
... For best performance: IQ3_M or IQ4_XS/NL or Q4s.
... For TOP performance: Q6/Q8.
... All quants (IQ1 right up to Q6) have been optimized with:
- NEO Imatrix Dataset.
- BF16 output tensor (full precision).

Q8 (imatrix has no effect on Q8):
- BF16 output tensor (full precision).

I found this config worked best with this specific model and "reasoning" in general.

---

Reka's excellent reasoning model with MAX (level 1) quants and the NEO Imatrix dataset. 128k context. Supports other languages besides English.

---
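To make the size/speed tradeoff between quant levels concrete, here is a minimal sketch that estimates on-disk GGUF sizes from bits-per-weight. The bpw figures are approximate community values for llama.cpp quant types, and the 21B parameter count is taken from the model name; both are assumptions for illustration, not measured sizes of these specific files (the BF16 output tensor in these quants adds a little on top).

```python
# Rough GGUF file-size estimator for the quant levels discussed above.
# The bits-per-weight (bpw) values are APPROXIMATE llama.cpp figures, and
# the 21e9 parameter count comes from the model name -- both are assumptions.

APPROX_BPW = {
    "IQ1_M": 1.75,
    "IQ2_S": 2.5,
    "IQ3_M": 3.66,
    "IQ4_XS": 4.25,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
}

def approx_size_gb(params: float, bpw: float) -> float:
    """Approximate on-disk size in GB: params * bpw bits, divided by 8 bits/byte."""
    return params * bpw / 8 / 1e9

PARAMS = 21e9  # 21B, per the model name

for quant, bpw in APPROX_BPW.items():
    print(f"{quant:>7}: ~{approx_size_gb(PARAMS, bpw):.1f} GB")
```

This is handy for picking the largest quant that fits your VRAM/RAM budget before downloading; for example, IQ4_XS at ~4.25 bpw lands around 11 GB for a 21B model.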