Text Generation
GGUF
English
NEO Imatrix
MAX Quants
uncensored
reasoning
thinking
r1
cot
reka-flash
deepseek
Qwen2.5
Hermes
DeepHermes
DeepSeek
DeepSeek-R1-Distill
128k context
instruct
all use cases
maxed quants
Neo Imatrix
finetune
chatml
gpt4
synthetic data
distillation
function calling
roleplaying
chat
Uncensored
creative
general usage
problem solving
brainstorming
solve riddles
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
story
writing
fiction
swearing
horror
imatrix
conversational
Update README.md

README.md CHANGED

@@ -26,18 +26,18 @@ pipeline_tag: text-generation
 
 UPDATE:
 
-Re-optimizing quants, found a better mixture. Uploading
+Re-optimizing quants, found a better mixture. Uploading NOW...
 
 Mixture is strong enough that lower quants can now solve/reason and come up with a solution, whereas NON-optimized quants
 may not be able to solve at all, or take a lot longer (a lot more tokens!).
 
-Quick testing shows optimized can:
-- Answer/solve at lower quant.
-- Come up with a better answer/stronger reasoning
+Quick testing shows optimized quants can:
+- Answer/solve at a lower quant // solve where the "reg quant" could not.
+- Come up with a better answer/stronger reasoning.
 - Use fewer tokens to "reason" ... up to 50% fewer.
 - Faster and smaller quant sizes (VS "MAX" quants with output tensor and embed at BF16).
 
-Quants - "EDGE of REASON"
+<B>Quants - "EDGE of REASON":</B>
 
 IQ1_M - Works, but limited reasoning (reasoning operates, but has a tough time (if at all) coming up with the right answer for some problems).
 
@@ -55,9 +55,12 @@ For TOP performance, Q6/Q8.
 
 ...
 
-All quants have been optimized with:
+All quants (IQ1 right up to Q6) have been optimized with:
 
-- NEO Imatrix Dataset
+- NEO Imatrix Dataset.
+- BF16 output tensor (full precision).
+
+Q8 (the imatrix has no effect on Q8):
 - BF16 output tensor (full precision).
 
 I found this config worked best with this specific model and with "reasoning" in general.
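For readers who want to experiment with this kind of recipe themselves, here is a minimal sketch driving llama.cpp's stock `llama-imatrix` and `llama-quantize` tools from Python. Only the ideas come from the README above (apply an imatrix to the IQ1..Q6 quants, force the output tensor to BF16, skip the imatrix for Q8); the file names, the calibration text, and the exact quant list are placeholder assumptions — the actual NEO Imatrix dataset is not published in this card.

```python
"""Hypothetical sketch of the quant recipe described above, using
llama.cpp's llama-imatrix / llama-quantize CLIs (assumed to be on PATH).
All file names below are placeholders, not the author's actual files."""
import subprocess

MODEL_F16 = "model-f16.gguf"         # full-precision source GGUF (placeholder)
IMATRIX = "neo-imatrix.dat"          # importance-matrix output file
CALIBRATION = "calibration.txt"      # stand-in for the NEO Imatrix dataset

# Step 1: compute an importance matrix from the calibration text.
subprocess.run(
    ["llama-imatrix", "-m", MODEL_F16, "-f", CALIBRATION, "-o", IMATRIX],
    check=True,
)

# Step 2: quantize IQ1 .. Q6 with the imatrix, keeping the output
# tensor at BF16 (full precision), per the README notes.
for qtype in ["IQ1_M", "IQ2_M", "IQ3_M", "IQ4_XS", "Q4_K_M", "Q5_K_M", "Q6_K"]:
    subprocess.run(
        [
            "llama-quantize",
            "--imatrix", IMATRIX,
            "--output-tensor-type", "bf16",
            MODEL_F16,
            f"model-{qtype}.gguf",
            qtype,
        ],
        check=True,
    )

# Step 3: Q8_0 gets no imatrix (it has no effect at that size), so only
# the BF16 output-tensor override is applied.
subprocess.run(
    ["llama-quantize", "--output-tensor-type", "bf16",
     MODEL_F16, "model-Q8_0.gguf", "Q8_0"],
    check=True,
)
```

The `--output-tensor-type bf16` override is what keeps the output tensor at full precision while the remaining tensors drop to the target quant type.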