Great job, really nice!

#1 opened by SubtleOne

So first of all, big congrats. The creative and text output is quite different from the typical Llamas I get, and excellent. Love the rich variety in vocabulary and descriptions. I can only fit the IQ4_XS model on my laptop, but 'tis plenty fine. What is MOE, and what are 'power levels'?

I actually have a standard creative query for the models, which I use to gauge their linguistic skills and general creativity. This model passed with flying colors.

Excellent. Thank you!

A MOE (Mixture of Experts) is, roughly speaking, a collection of models that may (or may not) all work together. The config I used for this model sets 4 of the Dark Planet models active. "Power levels" refers to raising or lowering the number of models contributing to generation. For this model specifically, that means bringing more 8B Dark Planet models online (or taking them offline), up to a maximum of 8... equal to 64B parameters.
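If you want to experiment with the "power level" from a script, here is a rough sketch assuming llama-cpp-python ; the file name and the "llama.expert_used_count" metadata key are assumptions on my part, so check your quant's GGUF metadata before relying on them:

```python
# Minimal sketch (llama-cpp-python): raise or lower the "power level" by
# overriding how many experts are active per token.
# The model_path and the "llama.expert_used_count" key are assumptions --
# inspect your GGUF's metadata to confirm the exact key name.
from llama_cpp import Llama

llm = Llama(
    model_path="Dark-Planet-Mirrored-8x8B.IQ4_XS.gguf",  # hypothetical file name
    n_ctx=8192,
    n_gpu_layers=-1,                                # offload everything that fits in VRAM
    kv_overrides={"llama.expert_used_count": 6},    # 2..8 experts; this model defaults to 4
)

out = llm("Write the opening paragraph of a storm scene.", max_tokens=200)
print(out["choices"][0]["text"])
```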

If I use fewer than the full 8 models, will this reduce the memory requirements in practice?

This will increase tokens/second speed ; there is literally less processing happening per token.

Roughly: if you are getting 40 tokens per second @ 4 experts, 8 experts will be around 10-15 t/s.

However, the entire model is still loaded into VRAM regardless of how many experts are used/activated, so memory requirements do not change.
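As a rough way to see the speed difference on your own hardware, here is a small sketch (same assumptions as the snippet above: llama-cpp-python, a hypothetical file name, and the "llama.expert_used_count" key) ; the numbers will vary entirely with your setup:

```python
# Rough benchmark sketch: compare generation speed at different expert counts.
import time
from llama_cpp import Llama

MODEL = "Dark-Planet-Mirrored-8x8B.IQ4_XS.gguf"  # hypothetical file name

for experts in (2, 4, 8):
    llm = Llama(
        model_path=MODEL,
        n_ctx=8192,
        n_gpu_layers=-1,
        kv_overrides={"llama.expert_used_count": experts},
        verbose=False,
    )
    start = time.time()
    out = llm("Describe a lighthouse in a storm.", max_tokens=128)
    tokens = out["usage"]["completion_tokens"]
    print(f"{experts} experts: {tokens / (time.time() - start):.1f} t/s")
    del llm  # free the weights before loading the next configuration
```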

Thank you, David. This is amazing, the best model I've used so far (under 50B). It's fast, smart, logical and immersive. It has good prose that is vivid and evocative, good variation and no formulaic responses... the contextual and environmental awareness is also top-notch (which many models lack). The only shortcoming is the 8k context window size (because I'm a junkie for long RP). The RoPE setting in KoboldCpp doesn't seem to work at all: even if I increase the context size just a tiny bit, the output becomes complete gibberish.

So I'm wondering whether you have plans to make a model of comparable quality with a bigger context window? Also, do you have a Patreon or similar? I'd like to buy you a coffee :)

@HuggyTavern

Thank you so much!

RE: 8k ; Currently I have expanded Dark Planet 8B to 128k and 1 million context.
Here:
https://huggingface.co/DavidAU/Llama-3.1-128k-Dark-Planet-Uncensored-8B-GGUF

https://huggingface.co/DavidAU/Llama-3.1-1-million-cxt-Dark-Planet-8B-GGUF

(The one million context version is based on Nvidia's 1 million context instruct model.)
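Rather than RoPE-stretching the 8k model, you can just load one of these at a larger window. A minimal sketch, again assuming llama-cpp-python (the quant file name below is a placeholder ; use whichever quant you download from the repo above):

```python
# Minimal sketch: run the 128k Dark Planet variant at a larger context window.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-128k-Dark-Planet-Uncensored-8B.Q4_K_M.gguf",  # placeholder name
    n_ctx=32768,       # any value up to the model's trained context length
    n_gpu_layers=-1,
)
```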

That being said, all the models in "Mirrored" would have to be converted, and a new "Mirrored" made.
I am still working through the conversion process/issues/tweaking.
The issues multiply in a "MOE", so this will take some time to work through.

Also, some of the next "MOEs" are "gated" (Mirrored's routing is called "random").

The newest gated are here:
https://huggingface.co/collections/DavidAU/d-au-moe-gated-iq-multi-tier-models-67fc6d0884224b6e5a3d6ba0

I have a prototype "Dark Planet" gated (4X8 = 32B), currently under testing/config'ing.
I do not have a Patreon at this time ; thank you for the kind offer.

SIDE NOTE:
In terms of story, you may want to try this one:

https://huggingface.co/DavidAU/Llama-3.1-1-million-ctx-DeepHermes-Deep-Reasoning-8B-GGUF

Don't let the size fool you ; some of the "training" (at 1 million context) transferred over to the new model.

And Dark Planet 8B Reasoning Version:
https://huggingface.co/DavidAU/L3.1-Dark-Reasoning-Dark-Planet-Hermes-R1-Uncensored-Horror-Imatrix-MAX-8B-GGUF

More powerful versions are around the corner ...
