DavidAU committed on
Commit a928755 · verified · 1 Parent(s): 14b8a3f

Create README.md

  1. README.md +723 -0
README.md ADDED
@@ -0,0 +1,723 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - DavidAU/Llama-3.2-8X3B-GATED-MOE-Reasoning-Dark-Champion-Instruct-uncensored-abliterated-18.4B
7
+ tags:
8
+ - reasoning
9
+ - thinking
10
+ - uncensored
11
+ - gated
12
+ - mixture of experts
13
+ - expert gate controls
14
+ - expert named controls
15
+ - moe
16
+ - 8x3B
17
+ - Llama 3.2 MOE
18
+ - NEO Imatrix
19
+ - 128k context
20
+ - creative
21
+ - creative writing
22
+ - fiction writing
23
+ - plot generation
24
+ - sub-plot generation
25
+ - fiction writing
26
+ - story generation
27
+ - scene continue
28
+ - storytelling
29
+ - fiction story
30
+ - science fiction
31
+ - romance
32
+ - all genres
33
+ - story
34
+ - writing
35
+ - vivid prosing
36
+ - vivid writing
37
+ - fiction
38
+ - roleplaying
39
+ - float32
40
+ - swearing
41
+ - rp
42
+ - horror
43
+ - mergekit
44
+ pipeline_tag: text-generation
45
+ ---
46
+
47
+ (quants uploading, examples to be added, model card updates pending) ; Gating / Expert controls added.
48
+
49
+ <B><font color="red">WARNING:</font> NSFW. Vivid prose. INTENSE. Visceral Details. Light HORROR. Swearing. UNCENSORED... humor, romance, fun. </B>
50
+
51
+ <h2>Llama-3.2-8X3B-GATED-MOE-HORROR-Reasoning-Dark-Champion-uncensored-18.4B-IMAT-GGUF</h2>
52
+
53
+ <SMALL><font color="red">IMPORTANT:</font> This model has on/off/variable reasoning control from NousResearch's
54
+ DeepHermes model, and requires one of the system prompts provided below to invoke reasoning/thinking.
55
+ Please see the operating instructions below for best performance.</SMALL>
56
+
57
+ <img src="dark-champ.jpg" style="float:right; width:300px; height:300px; padding:10px;">
58
+
59
+ It is a Llama 3.2 model with a max context of 128k (131,000 tokens) that uses mixture of experts to combine EIGHT top L3.2 3B
60
+ models into one massive powerhouse at 18.4B parameters (8 x 3B, equivalent to 24B).
61
+
62
+ This model's instruction following, and output generation for creative writing, prose, fiction and role play are exceptional.
63
+
64
+ This model is also "gated", contains a master reasoning model (this can be turned on/off), was built at float32 (32 bit) precision
65
+ and quants have the output tensor at Q8_0, with a few choice quants at f16 (16 bit) and a Q8_0 with f32 (32 bit).
66
+
67
+ These quants are also mastered using the HORROR Imatrix dataset, including new methods to "imatrix" both the output tensor and tokens too.
68
+ Horror quants have output tensor at Q8_0 up to Q3KL, and then F16 for IQ4/Q4 on up. Horror dataset was created using "Grand Horror 16.5B".
69
+
70
+ The "gated" structure means the "reasoning model" is reinforced by the other 7 models in the MOE during reasoning, and then during
71
+ output generation / non-reasoning the non-reasoning model(s) take control.
72
+
73
+ Also, with "gating" you can directly access/control the model(s) you want to use during instruction following and generation.
74
+
75
+ This model is the "reasoning / gated version" of this model:
76
+
77
+ [ https://huggingface.co/DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF ]
78
+
79
+ (193 likes // 447,000+ downloads to date. Hugging Face only shows the last 30 days at the repo.)
80
+
81
+ And it is fast: 50+ t/s (2 experts) on a low end 16GB card, IQ4XS.
82
+
83
+ Double this speed for standard/mid-range video cards.
84
+
85
+ This model can also be used for all genres (the examples below show this).
86
+
87
+ It is for any writing, fiction or roleplay activity.
88
+
89
+ This model can also be used for general use, however its output generation can be uncensored.
90
+
91
+ This model has been designed to be relatively bullet proof and operates with all parameters, including temp settings from 0 to 5.
92
+
93
+ It is an extraordinarily compressed model, with a very low perplexity level (lower than Meta Llama3 Instruct).
94
+
95
+ It requires the Llama3 template and/or the "Command-R" template.
96
+
97
+ Several outputs below, including 2, 4 and 8 experts used.
98
+
99
+ <B>Model Notes:</B>
100
+
101
+ - Detail, prose and fiction writing abilities are OFF THE SCALE relative to all Llama 3.2 models, and many L 3.1, L3 8B+ models.
102
+ - For more varied prose (sentence/paragraph/dialog) raise the temp and/or add more instructions in your prompt(s).
103
+ - Role-players: Be careful not to raise temp too high, as it may affect instruction following.
104
+ - This model works with rep pen of 1 or higher, 1.02+ recommended.
105
+ - If you want a specific type of prose (IE horror) add in "(vivid horror)" or "(graphic vivid horror)" (no quotes) in your prompt(s).
106
+ - A lot of GPTisms have been removed. There are still a few however - errrrr. Higher "temps" will help with this issue.
107
+ - This is not a "happy ever after" model but it is also not "horror". It has a light negative bias.
108
+ - Output length will vary; however, this model prefers slightly longer outputs unless you state the size.
109
+ - For creative uses, different quants will produce slightly different output.
110
+ - Due to the high stability and compressed nature of this model, all quants will operate at above average levels.
111
+ - Source code for this model and Imatrix GGUFs versions will be uploaded shortly at separate repos.
112
+
113
+ <B>How to Generate HIGHEST quality output:</B>
114
+
115
+ Like all instruct models, this model thrives on instructions.
116
+
117
+ It also "comes into its own" with multi-turn improvement.
118
+
119
+ Example:
120
+
121
+ Prompt #1 (reasoning is on):
122
+
123
+ Start a 1000 word scene (vivid, graphic horror in first person) with: The sky scraper sways, as she watches
124
+ the window in front of her on the 21st floor explode...
125
+
126
+ (this will give you a rough draft, in "default" model's style)
127
+
128
+ Prompt #2 - "Scan for improvements"
129
+
130
+ Evaluate the scene you just wrote and list improvements.
131
+
132
+ Prompt #3 - "Redo and improve it"
133
+
134
+ Write the scene using all the improvements, in first person , present tense and a few well spaced thoughts in italics; length 2000 words.
135
+
136
+ NOTE: Wording in prompt #2 may cause "thinking/reasoning" to re-activate.
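The three-prompt sequence above relies on the app keeping the full chat history, so that prompts #2 and #3 can refer to earlier output. A minimal sketch of that loop follows; the `chat` callable is a hypothetical stand-in for whatever backend you use, and `fake_chat` exists only so the sketch runs on its own.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

# The three prompts from the example above, in order.
PROMPTS = [
    "Start a 1000 word scene (vivid, graphic horror in first person) with: "
    "The sky scraper sways, as she watches the window in front of her "
    "on the 21st floor explode...",
    "Evaluate the scene you just wrote and list improvements.",
    "Write the scene using all the improvements, in first person , present tense "
    "and a few well spaced thoughts in italics; length 2000 words.",
]


def run_multi_turn(chat: Callable[[List[Message]], str],
                   prompts: List[str]) -> List[Message]:
    """Send each prompt in turn, carrying the whole history forward."""
    history: List[Message] = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = chat(list(history))  # hypothetical backend call
        history.append({"role": "assistant", "content": reply})
    return history


def fake_chat(messages: List[Message]) -> str:
    """Stand-in backend: reports how many user turns it has seen."""
    turns = sum(1 for m in messages if m["role"] == "user")
    return f"[reply to turn {turns}]"


history = run_multi_turn(fake_chat, PROMPTS)
```

Replacing `fake_chat` with a real backend call is the only change needed; the history handling stays the same.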
137
+
138
+ Compressed Steps:
139
+
140
+ Prompt #1:
141
+
142
+ [[ thinking model ]] come up with detailed plan to write this scene in modern 2020 writing
143
+ style (and follow "show don't tell" to the letter) and make it NSFW, but use [MODE: Saten] to actually
144
+ write the scene after you have completed the plan: Start a 1000 word scene (vivid, graphic horror in first person)
145
+ with: The sky scraper sways, as she watches the window in front of her on the 21st floor explode...
146
+
147
+ Prompt #2:
148
+
149
+ Use [MODE: Wordsmith] to write the scene using first person, present tense and include a few critical
150
+ thoughts of the POV character in italics. Scene length 2000 words.
151
+
152
+ Compressed Steps #2:
153
+
154
+ Prompt #1:
155
+
156
+ Think about a plan to write: Start a 1000 word scene (vivid, graphic horror in first person) with:
157
+ The sky scraper sways, as she watches the window in front of her on the 21st floor explode...
158
+
159
+ Prompt #2:
160
+
161
+ Write the scene using the plan you made, in first person , present tense and a few well spaced thoughts in italics.
162
+
163
+ <B>Generational Steering Control: "Programmer's Access - Direct Access to the AI(s)":</B>
164
+
165
+ These tags / names allow you to access one or more models directly, regardless of whether reasoning is active.
166
+
167
+ IE:
168
+
169
+ Saten, evaluate the response and suggest improvements.
170
+
171
+ This causes the model to "favor" Saten's input (roughly speaking) over the other 7 models.
172
+
173
+ IE:
174
+
175
+ Saten, process this prompt:
176
+
177
+ Jamet, evaluate the output.
178
+
179
+ etc etc.
180
+
181
+ You can use more than one model:
182
+
183
+ Saten, and Jamet list improvements to this XXX ...
184
+
185
+ < output3 > and < output2 >, write the scene in your combined style: Using vivid, graphic horror in first person the scene starts with:
186
+ The sky scraper sways, as she watches the window in front of her on the 21st floor explode...
187
+
188
+ (remove spacing in the "tags" output2 and output3 between the brackets)
189
+
190
+ With the reasoning model, if you add "think", "thinking", "reason", or "reasoning" this will tightly
191
+ focus the reasoning model.
192
+
193
+ Here is an example:
194
+
195
+ Think up a detailed plan to evoke maximum emotions from the reader: [prompt here]
196
+
197
+ Think up a detailed plan to solve this problem: [prompt here]
198
+
199
+ Special tags (remove spaces between the brackets):
200
+
201
+ "< output-all >" -> Use only the 7 non-reasoning models, not the reasoning model.
202
+
203
+ "< output-mega >" -> Use all 8 models.
204
+
205
+ "< output >", "< output2 >", "< output3 >" -> These are the same as using the "name" of the model; they just remove the BIAS in the model's name.
206
+
207
+ Below is a list of each model's "tags", "name(s)" and controls.
208
+
209
+ NOTE:
210
+
211
+ The model also has "negative steering" to enhance the use of these tags and names, but it is not perfect.
212
+
213
+ ```
214
+ - source_model: d:/Llama-3.2-DeepHermes-3-3B-Preview
215
+ positive_prompts:
216
+ - "[[ thinking model ]]"
217
+ - "<think>"
218
+ - "reasoning"
219
+ - "thinking"
220
+ - "<output-mega>"
221
+ - "Dr Phil"
222
+ - "Spock"
223
+ - "[MODE: Spock]"
224
+ - "[MODE: Dr Phil]"
225
+ - "Everyone, write the scene in your style."
226
+
227
+ #
228
+ # Jamet
229
+ #
230
+
231
+ - source_model: g:/3B/Llama-3.2-JametMini-3B-MK.I
232
+ positive_prompts:
233
+ - "Everyone, write the scene in your style."
234
+ - "</think>"
235
+ - "<output>"
236
+ - "<output-all>"
237
+ - "<output-mega>"
238
+ - "Jamet"
239
+ - "[MODE: Jamet]"
240
+
241
+ - "Jamet, write the scene."
242
+ - "Jamet, write the scene in your style."
243
+
244
+ #
245
+ # Enigma
246
+ #
247
+
248
+ - source_model: g:/3B/Llama3.2-3B-Enigma
249
+ positive_prompts:
250
+ - "Everyone, write the scene in your style."
251
+ - "</think>"
252
+ - "<output2>"
253
+ - "<output-all>"
254
+ - "<output-mega>"
255
+ - "Enigma"
256
+ - "[MODE: Enigma]"
257
+
258
+ - "Enigma, write the scene."
259
+ - "Enigma, write the scene in your style."
260
+
261
+ #
262
+ # Saten
263
+ #
264
+
265
+ - source_model: g:/3B/Llama-3.2-JametMini-3B-MK.III
266
+ positive_prompts:
267
+ - "Everyone, write the scene in your style."
268
+ - "</think>"
269
+ - "<output3>"
270
+ - "<output-all>"
271
+ - "<output-mega>"
272
+ - "Saten"
273
+ - "[MODE: Saten]"
274
+
275
+ - "Saten, write the scene."
276
+ - "Saten, write the scene in your style."
277
+
278
+ #
279
+ # Jane
280
+ #
281
+
282
+ - source_model: g:/3B/Llama-3.2-3B-Instruct-abliterated
283
+ positive_prompts:
284
+ - "Everyone, write the scene in your style."
285
+ - "</think>"
286
+ - "<output4>"
287
+ - "<output-all>"
288
+ - "<output-mega>"
289
+ - "Jane"
290
+ - "[MODE: Jane]"
291
+
292
+ - "Jane, write the scene."
293
+ - "Jane, write the scene in your style."
294
+
295
+ #
296
+ # Jenn
297
+ #
298
+
299
+ - source_model: g:/3B/Llama-3.2-3B-Instruct-uncensored
300
+ positive_prompts:
301
+ - "Everyone, write the scene in your style."
302
+ - "</think>"
303
+ - "<output5>"
304
+ - "<output-all>"
305
+ - "<output-mega>"
306
+ - "Jenn"
307
+ - "[MODE: Jenn]"
308
+
309
+ - "Jenn, write the scene."
310
+ - "Jenn, write the scene in your style."
311
+
312
+ #
313
+ # Janeway
314
+ #
315
+
316
+ - source_model: g:/3B/Llama-3.2-3B-Overthinker
317
+ positive_prompts:
318
+ - "Everyone, write the scene in your style."
319
+ - "</think>"
320
+ - "<output6>"
321
+ - "<output-all>"
322
+ - "<output-mega>"
323
+ - "Janeway"
324
+ - "[MODE: Janeway]"
325
+
326
+ - "Janeway, write the scene."
327
+ - "Janeway, write the scene in your style."
328
+
329
+ #
330
+ # Magic
331
+ #
332
+
333
+ - source_model: g:/3B/Llama-3.2-3B-Promptist-Mini
334
+ positive_prompts:
335
+ - "Everyone, write the scene in your style."
336
+ - "</think>"
337
+ - "<output7>"
338
+ - "<output-all>"
339
+ - "<output-mega>"
340
+ - "Magic"
341
+ - "[MODE: Magic]"
342
+
343
+ - "Magic, write the scene."
344
+ - "Magic, write the scene in your style."
345
+
346
+
347
+
348
+ ```
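The names and tags in the list above can be turned into steering prefixes programmatically. The sketch below copies the name-to-tag pairs from that list and mirrors the phrasing of the earlier examples; the helper names (`by_name`, `by_tag`, `mode_tag`) are hypothetical and not part of the model or any library.

```python
from typing import Dict, List

# Name -> tag pairs copied from the expert list above.
EXPERT_TAGS: Dict[str, str] = {
    "Jamet": "<output>",
    "Enigma": "<output2>",
    "Saten": "<output3>",
    "Jane": "<output4>",
    "Jenn": "<output5>",
    "Janeway": "<output6>",
    "Magic": "<output7>",
}


def by_name(names: List[str], instruction: str) -> str:
    """Address experts by name, e.g. 'Saten, evaluate the output.'"""
    if len(names) == 1:
        return f"{names[0]}, {instruction}"
    return f"{', and '.join(names)} {instruction}"


def by_tag(names: List[str], instruction: str) -> str:
    """Address experts by tag, which avoids any bias carried by the name."""
    tags = [EXPERT_TAGS[name] for name in names]
    return f"{' and '.join(tags)}, {instruction}"


def mode_tag(name: str) -> str:
    """Build a '[MODE: ...]' control string for one expert."""
    return f"[MODE: {name}]"


prompt = by_tag(["Saten", "Enigma"], "write the scene in your combined style.")
```

Whether a given prefix actually shifts the output depends on the gating, which the card itself describes as imperfect, so treat these as hints rather than guarantees.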
349
+
350
+
351
+ <B>Meet the Team: Mixture of Experts Models</b>
352
+
353
+ This model is comprised of the following 8 models ("the experts") (in full):
354
+
355
+ https://huggingface.co/huihui-ai/Llama-3.2-3B-Instruct-abliterated
356
+
357
+ - https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-3B-Preview [reasoning model]
358
+ - https://huggingface.co/Hastagaras/L3.2-JametMini-3B-MK.I
359
+ - https://huggingface.co/ValiantLabs/Llama3.2-3B-Enigma
360
+ - https://huggingface.co/Hastagaras/L3.2-JametMini-3B-MK.III
361
+ - https://huggingface.co/huihui-ai/Llama-3.2-3B-Instruct-abliterated
362
+ - https://huggingface.co/chuanli11/Llama-3.2-3B-Instruct-uncensored
363
+ - https://huggingface.co/Lyte/Llama-3.2-3B-Overthinker
364
+ - https://huggingface.co/prithivMLmods/Llama-3.2-3B-Promptist-Mini
365
+
366
+ The mixture of experts is set at 2 experts, but you can use 3, 4, 5, 6, 7 or even 8.
367
+
368
+ This "team" has a Captain (first listed model), and then all the team members contribute to the "token"
369
+ choice billions of times per second. Note that the Captain also contributes.
370
+
371
+ Think of 2, 3 or 4 (or more) master chefs in the kitchen all competing to make the best dish for you.
372
+
373
+ This results in higher quality generation.
374
+
375
+ In many cases this also results in higher quality instruction following.
376
+
377
+ That means the power of every model is available during instruction and output generation.
378
+
379
+ NOTE:
380
+
381
+ You can use one "expert" too ; however this means the model will randomly select an expert to use EACH TIME, resulting
382
+ in very different generation for each prompt / regen of a prompt.
383
+
384
+ CHANGING THE NUMBER OF EXPERTS:
385
+
386
+ You can set the number of experts in LMStudio (https://lmstudio.ai) at the "load" screen and via other apps/llm apps by setting "Experts" or "Number of Experts".
387
+
388
+ For Text-Generation-Webui (https://github.com/oobabooga/text-generation-webui) you set the number of experts at the loading screen page.
389
+
390
+ For KoboldCpp (https://github.com/LostRuins/koboldcpp) Version 1.8+, on the load screen, click on "TOKENS";
391
+ you can set experts on this page, and then launch the model.
392
+
393
+ For server.exe / Llama-server.exe (Llamacpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md )
394
+ add the following to the command line to start the "llamacpp server" (CLI):
395
+
396
+ "--override-kv llama.expert_used_count=int:6"
397
+
398
+ (no quotes, where "6" is the number of experts to use)
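As a rough illustration of the command line described above, the sketch below assembles the argument list in Python. The `--override-kv` value is taken from the text; the binary name `llama-server`, the `-m` model flag and the file name `model.gguf` are assumptions for illustration, so adjust them to your own install.

```python
import shlex
from typing import List


def llama_server_command(model_path: str, experts: int,
                         binary: str = "llama-server") -> List[str]:
    """Build a llama.cpp server command that overrides the active expert count."""
    if not 1 <= experts <= 8:
        raise ValueError("this model has 8 experts; choose a value from 1 to 8")
    return [
        binary,
        "-m", model_path,  # path to the GGUF file (hypothetical name below)
        "--override-kv", f"llama.expert_used_count=int:{experts}",
    ]


cmd = llama_server_command("model.gguf", 6)
print(shlex.join(cmd))
```

The list form can be passed straight to `subprocess.run` if you want to launch the server from a script.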
399
+
400
+ When using an "API", you set "num_experts_used" in the JSON payload (this may be different for different back ends).
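A minimal sketch of such a payload is shown below. Only the `num_experts_used` key comes from the text above; the `prompt` and `temperature` field names are assumptions that vary by back end, so check your server's API reference before relying on them.

```python
import json


def build_payload(prompt: str, experts: int, temperature: float = 0.8) -> str:
    """Serialize a generation request that asks for a given number of experts."""
    payload = {
        "prompt": prompt,             # assumed field name
        "temperature": temperature,   # assumed field name
        "num_experts_used": experts,  # key named in the text above
    }
    return json.dumps(payload)


body = build_payload("Saten, write the scene.", experts=4)
```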
401
+
402
+ CREDITS:
403
+
404
+ Special thanks to all the model makers / creators listed above.
405
+
406
+ Please visit each repo above to see what model(s) contributed to each of the models above and/or to learn more about the models
407
+ from the model makers.
408
+
409
+ Special credit goes to MERGEKIT, without you this project / model would not have been possible.
410
+
411
+ [ https://github.com/arcee-ai/mergekit ]
412
+
413
+ <B>Special Operations Notes for this MOE model:</B>
414
+
415
+ Because of how this "MOE" model is configured, even though the default is 2 experts, the "selected" 2 will vary during generation.
416
+
417
+ (same applies if you change the number of experts used)
418
+
419
+ This results in vastly different output generation PER generation of each prompt.
420
+
421
+ This is a positive in terms of variety, but also means it may take 2-4 regens (of the same prompt) to get the highest quality.
422
+
423
+ In addition, this model responds very well to Dry, Dynamic Temp, and Smooth/Quadratic samplers.
424
+
425
+ Using these in conjunction with the model can vastly improve output quality.
426
+
427
+ Higher temps (above 1) can also aid in generation - especially word choice/sentence generation.
428
+
429
+ When you increase the number of experts used output quality will also increase, at the cost of tokens per second speed.
430
+
431
+ As you increase/decrease the number of experts, you may want to adjust temp, samplers, and advanced samplers too.
432
+
433
+ Your quant choice(s) will also impact instruction following and output generation; roughly, this means the model will understand
434
+ more nuanced instructions and produce stronger output the higher you go in quant(s).
435
+
436
+ FLASH ATTENTION ENHANCEMENT:
437
+
438
+ As per user feedback here [ https://huggingface.co/DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF/discussions/1 ]
439
+ I would suggest trying this model with Flash Attention "on", depending on your use case.
440
+
441
+ Quants, Samplers, Generational steering and other topics are covered in the section below: "Highest Quality Settings..."
442
+
443
+ <B>Censored / Uncensored / Abliterated:</B>
444
+
445
+ This model contains several uncensored and/or Abliterated models.
446
+
447
+ As a result it can output uncensored material.
448
+
449
+ However there are a few "censored" models which can sometimes interfere, so here is how to address this:
450
+
451
+ 1 - Regen your prompt a few times.
452
+
453
+ 2 - INCREASE the number of experts used.
454
+
455
+ <B>What can I use this model for ?</B>
456
+
457
+ This model can be used for fiction writing, any creative prose and role play. It can also be used for
458
+ just about any general fiction (all genres) activity including:
459
+
460
+ - scene generation
461
+ - scene continuation
462
+ - creative writing
463
+ - fiction writing
464
+ - plot generation
465
+ - sub-plot generation
466
+ - fiction writing
467
+ - story generation
468
+ - storytelling
469
+ - writing
470
+ - fiction
471
+ - roleplaying
472
+ - rp
473
+ - graphic horror
474
+ - horror
475
+ - dark humor
476
+ - nsfw
477
+ - and can be used for any genre(s).
478
+
479
+ <B>QUANTS:</B>
480
+
481
+ This repo contains regular quants.
482
+
483
+ For more information on quants, quants choices, and LLM/AI apps to "run" quants see the section below: "Highest Quality Settings..."
484
+
485
+ ---
486
+
487
+ <B>System Role / System Prompts - Reasoning On/Off/Variable and Augment The Model's Power:</b>
488
+
489
+ <small> ( <font color="red">Critical Setting for model operation </font> ) </small>
490
+
491
+ ---
492
+
493
+ System Role / System Prompt / System Message (called "System Prompt" in this section)
494
+ is "root access" to the model and controls its internal workings: both instruction following and output generation and, in the
495
+ case of this model, reasoning control, including turning reasoning on or off.
496
+
497
+ In this section I will show you basic, advanced, and combined "code" to control the model's reasoning, instruction following and output generation.
498
+
499
+ If you do not set a "system prompt", reasoning/thinking will be OFF by default, and the model will operate like a normal LLM.
500
+
501
+ HOW TO SET:
502
+
503
+ Depending on your AI "app" you may have to copy/paste one of the "codes" below to enable reasoning/thinking in the
504
+ "System Prompt" or "System Role" window.
505
+
506
+ In LMStudio, set/activate "Power User" or "Developer" mode to access the System Prompt box, then copy/paste into it.
507
+
508
+ In SillyTavern go to the "template page" ("A") , activate "system prompt" and enter the text in the prompt box.
509
+
510
+ In Ollama, see [ https://github.com/ollama/ollama/blob/main/README.md ] for setting the "system message".
511
+
512
+ In Koboldcpp, load the model, start it, go to settings -> select "Llama 3 Chat"/"Command-R" and enter the text in the "sys prompt" box.
513
+
514
+ SYSTEM PROMPTS AVAILABLE:
515
+
516
+ When you copy/paste PRESERVE formatting, including line breaks.
517
+
518
+ If you want to edit/adjust these only do so in NOTEPAD OR the LLM App directly.
519
+
520
+ IMPORTANT:
521
+
522
+ Note some of these have "names" in them for the AIs - DO NOT change these - as these are internal references
523
+ inside the structure of the MOE model ; roughly speaking these are triggers.
524
+
525
+ SIMPLE:
526
+
527
+ This is the generic system prompt used for generation and testing [no reasoning]:
528
+
529
+ <PRE>
530
+ You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.
531
+ </PRE>
532
+
533
+ This System Role/Prompt will give you "basic thinking/reasoning" [basic reasoning]:
534
+
535
+ <PRE>
536
+ You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside &lt;think&gt; &lt;/think&gt; tags, and then provide your solution or response to the problem.
537
+ </PRE>
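Since reasoning is off without a system prompt and on with the basic reasoning prompt above, the toggle can be sketched as a small message builder. The `role`/`content` message shape is an assumption (the common chat format used by many back ends), and `build_messages` is a hypothetical helper name.

```python
from typing import Dict, List

# The "basic reasoning" system prompt quoted above, with the tags written literally.
REASONING_PROMPT = (
    "You are a deep thinking AI, you may use extremely long chains of thought "
    "to deeply consider the problem and deliberate with yourself via systematic "
    "reasoning processes to help come to a correct solution prior to answering. "
    "You should enclose your thoughts and internal monologue inside "
    "<think> </think> tags, and then provide your solution or response to the problem."
)


def build_messages(user_prompt: str, reasoning: bool = False) -> List[Dict[str, str]]:
    """Return chat messages; the system prompt is included only when reasoning is on."""
    messages: List[Dict[str, str]] = []
    if reasoning:
        messages.append({"role": "system", "content": REASONING_PROMPT})
    messages.append({"role": "user", "content": user_prompt})
    return messages


plain = build_messages("Write a short scene.")
thinking = build_messages("Write a short scene.", reasoning=True)
```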
538
+
539
+ MULTI-TIERED [reasoning on]:
540
+
541
+ ```
542
+ You are a deep thinking AI composed of 4 AIs - Spock, Wordsmith, Jamet and Saten, - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in-depth solution. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem using your skillsets and critical instructions.
543
+ ```
544
+
545
+ MULTI-TIERED - CREATIVE [reasoning on]:
546
+
547
+ ```
548
+ Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.
549
+
550
+ As a deep thinking AI composed of 4 AIs - Spock, Wordsmith, Jamet and Saten, - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in-depth solution. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem using your skillsets and critical instructions.
551
+
552
+ Here are your skillsets:
553
+ [MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)
554
+
555
+ [*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)
556
+
557
+ Here are your critical instructions:
558
+ Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.
559
+ ```
560
+
561
+ CREATIVE SIMPLE [reasoning on]:
562
+
563
+ <PRE>
564
+ You are an AI assistant developed by a world wide community of ai experts.
565
+
566
+ Your primary directive is to provide highly creative, well-reasoned, structured, and extensively detailed responses.
567
+
568
+ Formatting Requirements:
569
+
570
+ 1. Always structure your replies using: &lt;think&gt;{reasoning}&lt;/think&gt;{answer}
571
+ 2. The &lt;think&gt;&lt;/think&gt; block should contain at least six reasoning steps when applicable.
572
+ 3. If the answer requires minimal thought, the &lt;think&gt;&lt;/think&gt; block may be left empty.
573
+ 4. The user does not see the &lt;think&gt;&lt;/think&gt; section. Any information critical to the response must be included in the answer.
574
+ 5. If you notice that you have engaged in circular reasoning or repetition, immediately terminate {reasoning} with a &lt;/think&gt; and proceed to the {answer}
575
+
576
+ Response Guidelines:
577
+
578
+ 1. Detailed and Structured: Use rich Markdown formatting for clarity and readability.
579
+ 2. Creative and Logical Approach: Your explanations should reflect the depth and precision of the greatest creative minds first.
580
+ 3. Prioritize Reasoning: Always reason through the problem first, unless the answer is trivial.
581
+ 4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.
582
+ 5. Maintain a professional, intelligent, and analytical tone in all interactions.
583
+ </PRE>
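Because the reply is structured as a think block followed by the answer, a client may want to separate the two before display. The following is a minimal sketch of that split using a regular expression; `split_reasoning` is a hypothetical helper, and it assumes the tags are well formed (an unclosed tag is left in the answer untouched).

```python
import re
from typing import List, Tuple

# Non-greedy match so that several think blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[List[str], str]:
    """Return (list of reasoning blocks, answer with those blocks removed)."""
    blocks = [block.strip() for block in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return blocks, answer


blocks, answer = split_reasoning(
    "<think>Step 1: pick a POV.\nStep 2: set the mood.</think>The tower groans."
)
```

Returning a list rather than a single string keeps the helper usable when a prompt produces more than one thinking block.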
584
+
585
+ CREATIVE ADVANCED [reasoning on]:
586
+
587
+ NOTE: To turn reasoning off, remove line #2.
588
+
589
+ This system prompt can often generate multiple outputs and/or thinking blocks.
590
+
591
+ ```
592
+ Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.
593
+
594
+ You may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem
595
+
596
+ Here are your skillsets:
597
+ [MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)
598
+
599
+ [*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)
600
+
601
+ Here are your critical instructions:
602
+ Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.
603
+ ```
604
+
605
+ ---
606
+
607
+
608
+
609
+ <B>Template:</B>
610
+
611
+ This is a LLAMA3 model, and requires Llama3 template, but may work with other template(s).
612
+
613
+ If you use "Command-R" template your output will be very different from using "Llama3" template.
614
+
615
+ Here is the standard LLAMA3 template:
616
+
617
+ <PRE>
618
+ {
619
+   "name": "Llama 3",
620
+   "inference_params": {
621
+     "input_prefix": "<|start_header_id|>user<|end_header_id|>\n\n",
622
+     "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
623
+     "pre_prompt": "You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.",
624
+     "pre_prompt_prefix": "<|start_header_id|>system<|end_header_id|>\n\n",
625
+     "pre_prompt_suffix": "<|eot_id|>",
626
+     "antiprompt": [
627
+       "<|start_header_id|>",
628
+       "<|eot_id|>"
629
+     ]
630
+   }
631
+ }
632
+ </PRE>
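To make the template concrete, the sketch below joins its fields into a single-turn prompt string. The field values are copied from the template above; the concatenation order (system block first, then the user block) is an assumption about how an app typically applies these fields, and `render_prompt` is a hypothetical helper. Many runtimes also prepend a beginning-of-text token on their own, which is not shown here.

```python
from typing import Dict, Optional

TEMPLATE: Dict[str, str] = {
    "input_prefix": "<|start_header_id|>user<|end_header_id|>\n\n",
    "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "pre_prompt": (
        "You are a helpful, smart, kind, and efficient AI assistant. "
        "You always fulfill the user's requests to the best of your ability."
    ),
    "pre_prompt_prefix": "<|start_header_id|>system<|end_header_id|>\n\n",
    "pre_prompt_suffix": "<|eot_id|>",
}


def render_prompt(user_text: str, system_prompt: Optional[str] = None) -> str:
    """Assemble one system + user turn, ending where the assistant should start."""
    system = TEMPLATE["pre_prompt"] if system_prompt is None else system_prompt
    return (
        TEMPLATE["pre_prompt_prefix"] + system + TEMPLATE["pre_prompt_suffix"]
        + TEMPLATE["input_prefix"] + user_text + TEMPLATE["input_suffix"]
    )


rendered = render_prompt("Hi", system_prompt="S")
```

Passing one of the reasoning system prompts as `system_prompt` gives the raw string a completion-style endpoint would need.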
633
+
634
+ <B>Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:</B>
635
+
636
+ In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;
637
+
638
+ Set the "Smoothing_factor" to 1.5
639
+
640
+ : in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"
641
+
642
+ : in text-generation-webui -> parameters -> lower right.
643
+
644
+ : In Silly Tavern this is called: "Smoothing"
645
+
646
+
647
+ NOTE: For "text-generation-webui"
648
+
649
+ -> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)
650
+
651
+ Source versions (and config files) of my models are here:
652
+
653
+ https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be
654
+
655
+ OTHER OPTIONS:
656
+
657
+ - Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")
658
+
659
+ - If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.
660
+
661
+ <B>Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers</B>
662
+
663
+ This is a "Class 1" model:
664
+
665
+ For all settings used for this model (including specifics for its "class"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) please see:
666
+
667
+ [ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]
668
+
669
+ You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model here:
670
+
671
+ [ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]
672
+
673
+
674
+ <b>Optional Enhancement:</B>
675
+
676
+ The following can be used in place of the "system prompt" or "system role" to further enhance the model.
677
+
678
+ It can also be used at the START of a NEW chat, but you must make sure it is "kept" as the chat moves along.
679
+ In this case the enhancements do not have as strong an effect as when using "system prompt" or "system role".
680
+
681
+ Copy and paste EXACTLY as noted, DO NOT line wrap or break the lines, maintain the carriage returns exactly as presented.
682
+
683
+ <PRE>
684
+ Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.
685
+
686
+ Here are your skillsets:
687
+ [MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)
688
+
689
+ [*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)
690
+
691
+ Here are your critical instructions:
692
+ Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.
693
+ </PRE>
694
+
695
+ You do not need to use this, it is only presented as an additional enhancement which seems to help scene generation
696
+ and scene continue functions.
697
+
698
+ This enhancement WAS NOT used to generate the examples below.
699
+
700
+ <h3>EXAMPLES PROMPTS and OUTPUT:</h3>
701
+
702
+ Examples are created using quant IQ4_XS, "temp=.8" (unless otherwise stated), minimal parameters and "LLAMA3" template.
703
+
704
+ Model has been tested with "temp" from ".1" to "5".
705
+
706
+ Number of experts used is TWO, unless otherwise stated.
707
+
708
+ Below are the least creative outputs, prompt is in <B>BOLD</B>.
709
+
710
+ IMPORTANT:
711
+
712
+ Higher quants / imatrix quants will have much stronger generation - words, sentences, ideas, dialog and general quality.
713
+
714
+ I have included some additional examples at different quant levels for contrast.
715
+
716
+ A "MOE" model's "speed" (tokens per second) will not increase/drop the same way a regular model's will on a per quant basis; it will however drop
717
+ if you engage more experts, as with more experts there is more processing per token.
718
+
719
+ ---
720
+
721
+ <B><font color="red">WARNING:</font> NSFW. Vivid prose. Visceral Details. Violence. HORROR. Swearing. UNCENSORED. </B>
722
+
723
+ ---