Text Generation
GGUF
English
mixture of experts
Mixture of Experts
8x3B
Llama 3.2 MOE
128k context
creative
creative writing
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
science fiction
romance
all genres
story
writing
vivid prosing
vivid writing
fiction
roleplaying
bfloat16
swearing
rp
horror
mergekit
conversational
Update README.md
README.md
CHANGED
@@ -76,18 +76,85 @@ Example outputs below.
|
|
76 |
- If you use rope to extend context, increase temp AND instructions detail levels to compensate for "rope issues".
|
77 |
- Source code for this model and Imatrix GGUFs versions will be uploaded shortly at separate repos.
|
78 |
|
79 |
-
<B>Mixture of Experts
|
80 |
|
81 |
-
This model is comprised of the following 8 models (in full):
|
82 |
|
83 |
-
|
84 |
|
85 |
The mixture of experts is set at 2 experts, but you can use 3, 4, 5, 6, 7 or even 8.
|
86 |
|
87 |
That means the power of every model is available during instruction and output generation.
|
88 |
|
89 |
This brings unparalleled power to all forms of generation and all use cases.
|
90 |
|
91 |
<B>What can I use this model for ?</B>
|
92 |
|
93 |
This model can be used for fiction writing, any creative prose and role play. It can also be used for
|
|
|
76 |
- If you use rope to extend context, increase temp AND instructions detail levels to compensate for "rope issues".
|
77 |
- Source code for this model and Imatrix GGUFs versions will be uploaded shortly at separate repos.
|
78 |
|
79 |
+
<B>Meet the Team: Mixture of Experts Models</b>
|
80 |
|
81 |
+
This model is comprised of the following 8 models ("the experts") (in full):
|
82 |
|
83 |
+
https://huggingface.co/huihui-ai/Llama-3.2-3B-Instruct-abliterated
|
84 |
+
|
85 |
+
- https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
|
86 |
+
- https://huggingface.co/Hastagaras/L3.2-JametMini-3B-MK.I
|
87 |
+
- https://huggingface.co/ValiantLabs/Llama3.2-3B-Enigma
|
88 |
+
- https://huggingface.co/Hastagaras/L3.2-JametMini-3B-MK.III
|
89 |
+
- https://huggingface.co/huihui-ai/Llama-3.2-3B-Instruct-abliterated
|
90 |
+
- https://huggingface.co/chuanli11/Llama-3.2-3B-Instruct-uncensored
|
91 |
+
- https://huggingface.co/Lyte/Llama-3.2-3B-Overthinker
|
92 |
+
- https://huggingface.co/prithivMLmods/Llama-3.2-3B-Promptist-Mini
|
93 |
|
94 |
The mixture of experts is set at 2 experts, but you can use 3, 4, 5, 6, 7 or even 8.
|
95 |
|
96 |
+
You can set the number of experts in LM Studio (https://lmstudio.ai) at the "load" screen, and in other LLM apps by setting "Experts" or "Number of Experts".
|
97 |
+
|
98 |
+
When using an "API", you set "num_experts_used" in the JSON payload (this may be different for different back ends).
|
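A minimal sketch of such a JSON payload follows. Only the `num_experts_used` field name comes from the note above. The helper name `build_payload`, the other fields (`prompt`, `max_tokens`, `temperature`), the top-level placement of the field, and the accepted range of 1 to 8 are assumptions for illustration; check your back end's documentation for the actual request shape.

```python
import json


def build_payload(prompt, num_experts_used=2, max_tokens=256, temperature=1.0):
    """Build a JSON request body (hypothetical shape; back ends differ)."""
    # Assumption: the model has 8 experts, so 1..8 is treated as the valid range.
    if not 1 <= num_experts_used <= 8:
        raise ValueError("num_experts_used must be between 1 and 8")
    body = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        # Field name taken from the README; its placement here is an assumption.
        "num_experts_used": num_experts_used,
    }
    return json.dumps(body)


payload = build_payload("Continue this scene: ...", num_experts_used=4)
print(payload)
```

The resulting string would be sent as the body of a POST request to whatever endpoint your back end exposes.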
99 |
+
|
100 |
+
This "team" has a Captain (first listed model), and then all the team members contribute to the "token"
|
101 |
+
choice billions of times per second. Note that the Captain contributes too.
|
102 |
+
|
103 |
+
Think of 2, 3 or 4 master chefs in the kitchen all competing to make the best dish for you.
|
104 |
+
|
105 |
+
This results in higher quality generation.
|
106 |
+
|
107 |
That means the power of every model is available during instruction and output generation.
|
108 |
|
109 |
This brings unparalleled power to all forms of generation and all use cases.
|
110 |
|
111 |
+
CREDITS:
|
112 |
+
|
113 |
+
Please visit each repo above to see what model(s) contributed to each of the models above.
|
114 |
+
|
115 |
+
Special credit goes to MERGEKIT; without you, this project / model would not have been possible.
|
116 |
+
|
117 |
+
[ https://github.com/arcee-ai/mergekit ]
|
118 |
+
|
119 |
+
<B>Special Operations Notes for this MOE model:</B>
|
120 |
+
|
121 |
+
Because of how this "MOE" model is configured, even though the default is 2 experts, the "selected" 2 will vary during generation.
|
122 |
+
|
123 |
+
(The same applies if you change the number of experts used.)
|
124 |
+
|
125 |
+
This results in vastly different output generation PER generation of each prompt.
|
126 |
+
|
127 |
+
This is a positive in terms of variety, but also means it may take 2-4 regens (of the same prompt) to get the highest quality.
|
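The idea that the selected experts vary during generation can be illustrated with a toy top-k router. This is a sketch of one common form of top-k gating, not this model's actual implementation: the function names and the router scores below are made up for illustration, and real routers operate on learned hidden states inside each layer.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, num_experts_used=2):
    """Pick the top-k experts for one token and return {expert_index: weight}."""
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = ranked[:num_experts_used]
    # Weights are renormalised over the chosen experts only.
    weights = softmax([router_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


# Hypothetical router scores for two different tokens (8 experts each).
token_a = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]
token_b = [1.8, -0.2, 0.4, 2.2, 0.0, 0.1, -1.0, 0.3]

print(route_token(token_a))  # selects experts 1 and 4
print(route_token(token_b))  # selects experts 3 and 0
```

With these made-up scores, the two tokens are handled by different pairs of experts, which is the sense in which the "selected" experts change from token to token.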
128 |
+
|
129 |
+
In addition, this model responds very well to DRY, Dynamic Temp, and Smooth/Quadratic samplers.
|
130 |
+
|
131 |
+
Using these in conjunction with the model can vastly improve output quality.
|
132 |
+
|
133 |
+
Higher temps (above 1) can also aid in generation - especially word choice/sentence generation.
|
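To see why temperature affects word choice, here is a small sketch of plain temperature scaling applied to made-up logits. It does not implement the DRY, Dynamic Temp, or Smooth/Quadratic samplers; the function name `token_probs` and the logit values are hypothetical.

```python
import math


def token_probs(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for four candidate tokens.
logits = [3.0, 2.0, 1.0, 0.0]

for temperature in (0.7, 1.0, 1.5):
    probs = token_probs(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature rises above 1, the top token's probability shrinks and the less likely tokens gain probability, so sampling picks from a wider range of words.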
134 |
+
|
135 |
+
When you increase the number of experts used, output quality will also increase, at the cost of tokens-per-second speed.
|
136 |
+
|
137 |
+
As you increase/decrease the number of experts, you may want to adjust temp, samplers, and advanced samplers too.
|
138 |
+
|
139 |
+
Your quant choice(s) will also impact instruction following and output generation. Roughly, this means the model will understand
|
140 |
+
more nuanced instructions and output stronger generation the higher you go up in quant(s).
|
141 |
+
|
142 |
+
Quants, Samplers, Generational steering and other topics are covered in the section below:
|
143 |
+
|
144 |
+
"Highest Quality Settings..."
|
145 |
+
|
146 |
+
<B>Censored / Uncensored / Abliterated:</B>
|
147 |
+
|
148 |
+
This model contains several uncensored and/or Abliterated models.
|
149 |
+
|
150 |
+
As a result, it can output uncensored material.
|
151 |
+
|
152 |
+
However, there are a few "censored" models which can sometimes interfere, so here is how to address this:
|
153 |
+
|
154 |
+
1 - Regen your prompt a few times.
|
155 |
+
|
156 |
+
2 - INCREASE the number of experts used.
|
157 |
+
|
158 |
<B>What can I use this model for ?</B>
|
159 |
|
160 |
This model can be used for fiction writing, any creative prose and role play. It can also be used for
|