DavidAU committed on
Commit 409e361 · verified · 1 Parent(s): ba6b6b5

Update README.md

Files changed (1):
  1. README.md +178 -6

README.md CHANGED
@@ -18,23 +18,103 @@ tags:
  - DeepSeek
  - DeepSeek-R1-Distill
  - 128k context
  base_model:
  - RekaAI/reka-flash-3
  pipeline_tag: text-generation
  ---

- (examples/repo card updates pending...)
-
  <h2>Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF</h2>

- UPDATE: Re-optimizing quants, found a better mixture. Uploading NOW...

  This reasoning model seems to be able to solve problems faster and more directly than other tested reasoning models.

  It also rarely gets stuck in a loop or "lost in the woods."

  This model is also unusually strong even at the smallest quant levels, and with augmentation now even stronger.

  <B>Augmented Quants:</B>

  The augmented quants mixture is strong enough that lower quants can now solve/reason and come up with a solution whereas NON-optimized
@@ -87,13 +167,105 @@ Q8 (imatrix has no effect on Q8):

  I found this config worked best with this specific model and "reasoning" in general.
  ---

- Reka's excellent reasoning model with MAX (level 1) quants, and NEO Imatrix dataset.

- 128k context.

- Does support other languages besides English.

  ---
  - DeepSeek
  - DeepSeek-R1-Distill
  - 128k context
+ - instruct
+ - all use cases
+ - maxed quants
+ - Neo Imatrix
+ - finetune
+ - chatml
+ - gpt4
+ - synthetic data
+ - distillation
+ - function calling
+ - roleplaying
+ - chat
+ - reasoning
+ - thinking
+ - r1
+ - cot
+ - deepseek
+ - Hermes
+ - DeepHermes
+ - DeepSeek
+ - DeepSeek-R1-Distill
+ - Uncensored
+ - creative
+ - general usage
+ - problem solving
+ - brainstorming
+ - solve riddles
+ - fiction writing
+ - plot generation
+ - sub-plot generation
+ - story generation
+ - scene continue
+ - storytelling
+ - fiction story
+ - story
+ - writing
+ - fiction
+ - swearing
+ - horror
  base_model:
  - RekaAI/reka-flash-3
  pipeline_tag: text-generation
  ---

<h2>Reka-Flash-3-21B-Reasoning-MAX-NEO-Imatrix-GGUF</h2>

<img src="exone-deep-2-4b.jpg" style="float:right; width:300px; height:300px; padding:5px;">

RekaAI's newest "Reka-Flash 3" Reasoning/Thinking model, with "Neo Imatrix" and "Maxed out" quantization to improve overall performance.

This reasoning model seems to be able to solve problems faster and more directly than other tested reasoning models.

It also rarely gets stuck in a loop or "lost in the woods."

And it is uncensored too.

This model is also unusually strong even at the smallest quant levels, and with augmentation it is now even stronger.

7 examples with prompts are provided below, generated at IQ4XS.

Context: 128k.

---

<B>Setup / Instructions:</B>

This model uses self-generated "< reasoning >" and "</ reasoning >" tags (remove the spaces).

It does not require a system prompt.

You may also want to add "< sep >" (remove the spaces) as an additional stop token.

Please see the original model's repo for more details, benchmarks and methods of operation:

[ https://huggingface.co/RekaAI/reka-flash-3 ]
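Downstream code usually wants the reasoning block separated from the final answer. A minimal sketch (the tag and stop-token names follow the description above with the spaces removed; the function name is illustrative, not part of the model):

```python
import re

def split_reasoning(text: str):
    """Split raw model output into (reasoning, answer) using the
    model's self-generated <reasoning>...</reasoning> tags."""
    m = re.search(r"<reasoning>(.*?)</reasoning>", text, flags=re.DOTALL)
    if not m:
        # No reasoning block found: treat everything before <sep> as the answer.
        return "", text.split("<sep>")[0].strip()
    reasoning = m.group(1).strip()
    # Drop anything after an optional <sep> stop token.
    answer = text[m.end():].split("<sep>")[0].strip()
    return reasoning, answer

print(split_reasoning("<reasoning>2+2=4.</reasoning> The answer is 4. <sep>"))
# ('2+2=4.', 'The answer is 4.')
```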
---

"MAXED"

This means the output tensor is set to "BF16" (full precision) for all quants.
This enhances quality, depth and general performance, at the cost of a slightly larger quant.

"NEO IMATRIX"

A strong, in-house imatrix dataset built by David_AU which results in better overall function,
instruction following, output quality, and stronger connections to ideas, concepts and the world in general.

This combines with "MAXing" the quant to improve performance.

This chart shows the quants in order of "BPW" (bits per weight), mapped below with relative "strength" to one another, from "IQ1_S" with the least to "Q8_0" with the most (F16 is full precision):

<small>
<pre>
IQ1_S | IQ1_M
IQ2_XXS | IQ2_XS | Q2_K_S | IQ2_S | Q2_K | IQ2_M
IQ3_XXS | Q3_K_S | IQ3_XS | IQ3_S | IQ3_M | Q3_K_M | Q3_K_L
Q4_K_S | IQ4_XS | IQ4_NL | Q4_K_M
Q5_K_S | Q5_K_M
Q6_K
Q8_0
F16
</pre>
</small>
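As a rough rule of thumb, BPW translates directly into file size: parameters × bits-per-weight ÷ 8. A back-of-envelope sketch (the example BPW value is a typical figure for Q4-class quants, an assumption, not a measured number for this repo):

```python
def approx_gguf_size_gb(params_billions: float, bpw: float) -> float:
    """Approximate quant file size in GB: params * bits-per-weight / 8.
    Ignores metadata and per-tensor overhead, so treat it as a lower bound."""
    return params_billions * bpw / 8

# A 21B model at ~4.25 BPW (illustrative Q4-class value):
print(round(approx_gguf_size_gb(21, 4.25), 1))  # 11.2 (GB)
```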
IMPORTANT:

Reasoning / thinking skills are DIRECTLY related to quant size. However, there will be a drastic difference in tokens/second
between the lowest and highest quants, so finding the right balance is key.

Also suggested: a minimum 8k context window, especially for IQ4/Q4 or lower quants.

Also, in some cases, the IQ quants work slightly better than their closest "Q" quants.

Recommended quants for best results in creative use cases: IQ3s / IQ4XS / IQ4NL / Q4s.

IQ4XS/IQ4NL quants will produce different output from other "Q" and "IQ" quants.

Recommended for general usage: Q5s/Q6/Q8.

Quants Q4_0/Q5_0 for portable, phone and other devices.

Q8 is a MAXed quant only, as imatrix has no effect on this quant.

Note that IQ1s' performance is okay/usable but reasoning is impaired, whereas IQ2s are very good (though reasoning is somewhat reduced; try IQ3s minimum for reasoning use cases).

More information on quants is in the document below, "Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers".
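The recommendations above can be captured in a small lookup table. A hypothetical helper (the quant groupings mirror the text; nothing here comes from the repo's file list):

```python
# Hypothetical helper encoding the quant recommendations above.
RECOMMENDED = {
    "creative": ["IQ3_S", "IQ4_XS", "IQ4_NL", "Q4_K_S", "Q4_K_M"],
    "general": ["Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0"],
    "portable": ["Q4_0", "Q5_0"],
}

def recommend(use_case: str):
    """Return suggested quants for a use case, defaulting to general usage."""
    return RECOMMENDED.get(use_case, RECOMMENDED["general"])

print(recommend("portable"))  # ['Q4_0', 'Q5_0']
```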
<b>Optional: System Prompts</b>

This is an optional system prompt you can use to enhance operation.

Copy and paste it exactly as shown, including line breaks.

You may want to adjust the "20" (both occurrences) to increase/decrease the power of this prompt.

You may also want to delete the line:

'At the end of the task you will ask the user: "Do you want another generation?"'

<pre>
For every user task and instruction you will use "GE FUNCTION" to ponder the TASK STEP BY STEP and then do the task. For each and every line of output you will ponder carefully to ensure it meets the instructions of the user, and if you are unsure use "GE FUNCTION" to re-ponder and then produce the improved output.

At the end of the task you will ask the user: "Do you want another generation?"

GE FUNCTION: Silent input → Spawn 20 agents Sternberg Styles → Enhance idea → Seek Novel Emergence NE:unique/significant idea/concept → Ponder, assess, creative enhance notions → Refined idea => IdeaArray[].size=20 elements, else → Interesting? Pass to rand. agent for refinement, else discard.=>output(IdeaArray)
</pre>
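In apps that use chat-style messages, an optional system prompt like this is simply sent as the first message. A minimal sketch (the message format follows the common OpenAI-style convention; the names are illustrative):

```python
# Paste the full prompt from above here; abbreviated for the sketch.
GE_SYSTEM_PROMPT = 'For every user task and instruction you will use "GE FUNCTION" ...'

def build_messages(user_task, system_prompt=GE_SYSTEM_PROMPT):
    """Assemble a chat request, with the system prompt first if provided."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_task})
    return messages

print(build_messages("Write a short horror scene.")[0]["role"])  # system
```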
<B>IMPORTANT: Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B>

If you are going to use this model (source, GGUF or a different quant), please review this document for critical parameter, sampler and advanced sampler settings (for multiple AI/LLM apps).

It also links to a "How to" section with tips and tricks for "Reasoning Models".

This is a "Class 1" (settings will enhance operation) model.

For all settings used for this model (including specifics for its "class"), example generation(s), and the advanced settings guide (which often addresses model issues), including methods to improve model performance for all use cases, as well as chat, roleplay and other use cases (especially those beyond the model's design), please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

REASON:

Regardless of "model class", this document details methods to enhance operation.

If the model is a Class 3/4 model, the default settings (parameters, samplers, advanced samplers) must be set correctly for your use case(s). Some AI/LLM apps DO NOT have consistent default settings, which results in sub-par model operation. Likewise, for Class 3/4 models (which operate somewhat to very differently than standard models), additional sampler and advanced sampler settings are required to "smooth out" operation, AND/OR to allow full operation for use cases the model was not designed for.

BONUS - Use these settings for ANY model, ANY repo, ANY quant (including source/full precision):

This document also details parameters, samplers and advanced samplers that can be used FOR ANY MODEL, FROM ANY REPO - all quants, and of course source-code operation too - to enhance the operation of any model.

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

---

<h3>EXAMPLES:</h3>

Examples are created using quant IQ4XS, minimal parameters and the Standard template.

Temp range .6, Rep pen 1.1, TopK 40, TopP .95, MinP .05

Rep pen range: 64-128 (helps keep reasoning on track / quality of output)
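Expressed as llama.cpp-style sampler parameters, the example-generation settings above look like this (parameter names follow llama.cpp CLI conventions; the mapping is a sketch, check your app's own naming):

```python
# Example-generation settings from above, as llama.cpp-style parameters.
sampler_settings = {
    "temp": 0.6,            # temperature
    "repeat_penalty": 1.1,  # rep pen
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.05,
    "repeat_last_n": 64,    # rep pen range; try 64-128
}

# Rendered as CLI flags, e.g.: --temp 0.6 --repeat-penalty 1.1 --top-k 40 ...
flags = " ".join(f"--{k.replace('_', '-')} {v}" for k, v in sampler_settings.items())
print(flags)
```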
Below are the least creative outputs; the prompt is in <B>BOLD</B>.

---

<B><font color="red">WARNING:</font> MAYBE: NSFW. Graphic HORROR. Swearing. UNCENSORED.</B>

NOTE: Some formatting was lost when copying from HTML.

---