DavidAU committed on
Commit
3b94a48
·
verified ·
1 Parent(s): 3a5d8e0

Update README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -26,18 +26,18 @@ pipeline_tag: text-generation
 
 UPDATE:
 
-Re-optimizing quants, found a better mixture. Uploading shortly.
 
 Mixture is strong enough that lower quants can now solve/reason and come up with a solution, whereas NON-optimized
 quants may not be able to solve, or may take a lot longer (a lot more tokens!).
 
-Quick testing shows optimized can:
-- Answer/solve at lower quant.
-- Come up with a better answer/stronger reasoning
 - Use fewer tokens to "reason" ... up to 50% less.
 - Faster and smaller quant size (VS "MAX" with output tensor and embed at BF16).
 
-Quants - "EDGE of REASON":
 
 IQ1_M - Works, but limited reasoning (reasoning operates, but has a tough time (if at all) coming up with the right answer for some problems).
 
@@ -55,9 +55,12 @@ For TOP performance, Q6/Q8.
 
 ...
 
-All quants have been optimized with:
 
-- NEO Imatrix Dataset
 - BF16 Output tensor (full precision)
 
 I found this config worked best with this specific model and "reasoning" in general.
 
 
 UPDATE:
 
+Re-optimizing quants, found a better mixture. Uploading NOW...
 
 Mixture is strong enough that lower quants can now solve/reason and come up with a solution, whereas NON-optimized
 quants may not be able to solve, or may take a lot longer (a lot more tokens!).
 
+Quick testing shows optimized quants can:
+- Answer/solve at a lower quant // solve whereas the "reg quant" could not.
+- Come up with a better answer/stronger reasoning.
 - Use fewer tokens to "reason" ... up to 50% less.
 - Faster and smaller quant size (VS "MAX" with output tensor and embed at BF16).
 
+<B>Quants - "EDGE of REASON":</B>
 
 IQ1_M - Works, but limited reasoning (reasoning operates, but has a tough time (if at all) coming up with the right answer for some problems).
 
55
 
56
  ...
57
 
58
+ All quants (IQ1 right to Q6) have be optimized with:
59
 
60
+ - NEO Imatrix Dataset.
61
+ - BF16 Output tensor (full precision)
62
+
63
+ Q8 (imatrix has no effect on Q8):
64
  - BF16 Output tensor (full precision)
65
 
66
  I found this config worked best with this specific model and "reasoning" in general.
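The recipe above (imatrix calibration plus a full-precision BF16 output tensor) can be reproduced with llama.cpp's stock tools. This is a minimal sketch, not the author's exact pipeline: the file names are hypothetical, and the NEO Imatrix dataset is DavidAU's own calibration text, so a generic calibration file stands in for it here.

```shell
#!/bin/sh
# Sketch of an imatrix-calibrated quant with BF16 output tensor,
# using llama.cpp's llama-imatrix and llama-quantize tools.
# "model-f16.gguf" and "calibration.txt" are placeholder names.

# 1) Build an importance matrix from a calibration text file.
#    (Stands in for the NEO Imatrix Dataset used in the card above.)
llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2) Quantize with the imatrix, keeping the output tensor at BF16.
#    Per the card, the imatrix applies to IQ1..Q6 targets.
llama-quantize --imatrix imatrix.dat \
  --output-tensor-type bf16 \
  model-f16.gguf model-IQ4_XS.gguf IQ4_XS

# 3) For Q8_0 the imatrix has no effect, so only the BF16
#    output tensor override is applied.
llama-quantize --output-tensor-type bf16 \
  model-f16.gguf model-Q8_0.gguf Q8_0
```

The "MAX" variants mentioned above additionally keep the embedding at BF16, which in llama-quantize would be the `--token-embedding-type bf16` flag; the card notes this trades speed and size for little reasoning gain.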