it looks interesting
I've been testing the Q5_K_M quant for 3 hours in roleplay. Temp 0.5, top_k 30, top_p 0.95, Repetition Penalty 1.04 with range 64, DRY 0.7,1.75,3,5000. No BOS token at start, number of active experts 10.
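For anyone who wants to reproduce these numbers, here is a minimal sketch of the same settings as a generation request, assuming a local koboldcpp-style endpoint. The DRY field names are assumptions and vary by backend, so check them against your backend's API docs; the BOS and expert-count options are usually set when loading the model rather than per request.

```python
# Minimal sketch of the sampler settings above as an HTTP generation request.
# Assumption: a koboldcpp-style server on the default port; DRY field names
# differ between backends, so verify them before relying on this.
import requests

payload = {
    "prompt": "Once upon a time",
    "max_length": 300,
    "temperature": 0.5,
    "top_k": 30,
    "top_p": 0.95,
    "rep_pen": 1.04,            # repetition penalty
    "rep_pen_range": 64,        # how far back the penalty looks
    "dry_multiplier": 0.7,      # DRY 0.7 / 1.75 / 3 / 5000 from the post
    "dry_base": 1.75,
    "dry_allowed_length": 3,
    "dry_penalty_last_n": 5000, # field name is an assumption; some backends call it dry_range
}

resp = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload, timeout=300)
print(resp.json()["results"][0]["text"])
```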
Prompt:
You're a masterful storyteller and gamemaster. You should first draft your thinking process (inner monologue) until you have derived the final answer. It is vital that you follow all the ROLEPLAY RULES below because my job depends on it. Afterwards, write a clear final answer resulting from your thoughts. You should use Markdown to format your response. Write both your thoughts and answer in the same language as the task posed by the {{user}}.
Your thinking process must follow the template below:
Your thoughts and/or draft, like working through an exercise on scratch paper. It is vital that you follow all the ROLEPLAY RULES too. Be as casual and as long as you want until you are confident enough to generate a correct answer.
Here, provide a final answer to the {{user}} that reflects your reasoning.
{{user}}'s response may describe {{char}}'s or others' actions, speech, and thoughts. This should be taken into account and developed in accordance with the ROLEPLAY RULES.
Provide succinct, coherent, and vivid accounts of {{char}}'s actions and reactions based on recent instructions, {{char}}'s persona, and all available information about past events. Aim for clarity and concision in your language.
Demonstrate {{char}}'s personality and mannerisms.
Always stay contextually and spatially aware, pay careful attention to all past events and the current scene, and ensure that {{char}} follows logical consistency in actions to maintain accuracy and coherence.
Explicit adult content and profanity are permitted. Describe vividly all erotic and sexual situations without rushing to climax, taking into account {{char}}'s feelings, thoughts and speech.
Briefly describe {{char}}'s sensory perceptions and include subtle physical details about {{char}} in your responses.
Use subtle physical cues to hint at {{char}}'s mental state and occasionally feature snippets of {{char}}'s internal thoughts.
When writing {{char}}'s internal thoughts or monologue, enclose those words in *thoughts* and deliver the thoughts using a first-person perspective (i.e. use "I" pronouns). Example: *Wow, that was good,* {{char}} thought. Always use double quotes for spoken speech "like this." Example: "That's cool," she said happily.
Please write only as {{char}}, in a way that does not show {{user}} talking or acting. You should only ever act as {{char}} reacting to {{user}}.
Roleplay as other characters if the scenario requires it, but never write as {{user}}! Writing about {{user}}'s thoughts, words, or actions is forbidden.
Keep track of where the characters are located, their positions, and their clothing, so as not to get confused.
{{char}} knows what she witnessed or was told.
</ROLEPLAY RULES>
I start with:
Okay, in this scenario, before more detailed analysis I need to consider some basics:
- Who {{char}} (character information) and {{user}} are, and what has happened so far.
- Where the characters are, their condition, position, and mood, and the time of day, to better plan my response.
- Remember the roleplay format: "speech", narration and action, and *{{char}}'s thoughts*.
- The answer must be kept in line with the ROLEPLAY RULES.
- Consider the possibilities of plot development while taking other points into account.
- I should also remember not to speak or act as {{user}}. So in the response, {{char}}'s actions and thoughts only.
What I've noticed:
- More stable with no BOS token at the beginning.
- Seems to respond better to character cards in XML (or similar) format; more stable, but probably at the cost of pulling information from the character sheet less often.
- I see that roleplaying with around 10,000 tokens works without any problems. I'll try to test longer ones this weekend.
- Great creativity.
Edit: The prompt formatting broke when pasting it here.
Thank you for your feedback and detailed notes.
Might I suggest you try this with the non-thinking version?
https://huggingface.co/DavidAU/Qwen3-42B-A3B-2507-TOTAL-RECALL-v2-Medium-MASTER-CODER
You may need to turn the number of experts up to 10 as well (maybe 12?), or you might try 6.
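A minimal sketch of one way to change the active-expert count when loading the GGUF, assuming llama-cpp-python; the metadata key and the file name below are assumptions, so check your GGUF's metadata for the real key (it differs per architecture), and note that most backends expose an equivalent load-time option.

```python
# Minimal sketch, assuming llama-cpp-python. The metadata key and file name are
# assumptions: inspect the GGUF metadata for the actual expert-count key and
# point model_path at your own quant.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-42B-A3B-TOTAL-RECALL-v2.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=8192,
    kv_overrides={"qwen3moe.expert_used_count": 10},  # try 6, 10, or 12 as suggested
)

out = llm("Write one vivid sentence about a thunderstorm.", max_tokens=64)
print(out["choices"][0]["text"])
```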
@DavidAU Is "total recall" possible to implement with 18B, 12B, 8/9B models? And is this possible to implement in Llama or others?
Brainstorm (and modified versions) can be added to almost any model, or model arch - and model size.
Total Recall refers to specifically "tuned" versions of Brainstorm for Qwen 3 MoEs.
@DavidAU Okay, thanks for the info. As far as I know, Qwen3 maintains noticeably better stability and consistency even with the default CLASS1 template.
Yes; Qwen 2.5 (Microsoft is using Qwen 2.5 at the moment) is also very strong.
However, Qwen 3s have odd "prose" / "creative" tendencies -> some good, some bad; they tend to "over draft" and then need editing.
Amazing, thanks for the info.
I have only been able to use Qwen3 and never actually tried Qwen2.5. I will use it soon.
What 8B, 9B, or 12B Qwen2.5 version do you recommend for writing purposes (writing, room roleplay, story generation), like Qwen3-8B-192k-Josiefied?
Also Qwen3-8B-192k-Josiefied works exceptionally well with custom settings, I'll provide output results soon.
Currently reviewing Qwen 2.5s for coder use only at the moment.
RE: Qwen 3/192k
found that increasing rope [over the training size of 128k in this case] also affects creative generation - usually positively - in that it extends story details, which leads to better emotional connections.
Hmm... I've been experimenting with rope, but never actually got good results.
I've managed to get insanely good results by changing the Temperature in relation to Top_K (as I mentioned in the SpinFire discussion). Honestly, I don't know which values to use beyond the trained context size.
Even with a lot of experimentation, and after creating presumably "perfect" settings, I still have very small issues; maybe the answer is in rope?
ROPE: Depends on the model arch, and how the model was trained.
Qwen 2.5, 3 and Mistrals handle rope very well.
L3.2/3.3 also good.
L3.1 - good (128k or less).
L2, L3 - no.
Okay, thanks again for the information. I will experiment with rope then.
I will try it with SpinFire, Supernova, and other L3.1-based models for now.
By the way, I've always used the default ROPE (scale 1, base 10000), and your models seem to use scale 1 with base 500000. I will use the trained values with SpinFire and will write up the results there.
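As a point of reference, here is a minimal sketch of passing those scale/base values at load time with llama-cpp-python; the file name is hypothetical, and leaving both values at 0 makes llama.cpp read whatever is baked into the GGUF, which is usually the safest starting point.

```python
# Minimal sketch, assuming llama-cpp-python; the model file name is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="SpinFire-8B.Q5_K_M.gguf",
    n_ctx=16384,
    rope_freq_scale=1.0,       # "scale 1"
    rope_freq_base=500000.0,   # the trained base mentioned above; the old default was 10000
)

print(llm("Describe a quiet harbor at dawn.", max_tokens=64)["choices"][0]["text"])
```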
Okay, I didn't notice any changes with my settings, see the SpinFire discussion.
And ROPE usually does not change anything related to emotions; I've tried different values with CLASS1 settings but never got stable enough results.
To enhance the emotional connections, you need to narrow the randomness of the outputs. Basically, that means setting the right relation of Temperature to Top_K (to enhance the variety of outputs after Top_A), as well as Typical, TFS, and Top_A (the main ones that affect consistency, stability, and quality).
Min_P can be changed if the outputs are very stable and consistent, and Top_K together with Temperature to further enhance the flow.
Typical might stay unchanged across different models, but Top_A and TFS are the main ones that affect emotional connections, followed by Top_K in relation to Temperature. A rough sketch of such a "narrowed" preset is below.
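As a rough illustration of that narrowing idea, here is a sketch of such a preset; apart from the Temperature/Top_K pair (the values suggested later in this thread), every number is a neutral placeholder to tune per model, not a recommendation.

```python
# Rough illustration of narrowing output randomness. Only temperature/top_k use
# values from this thread; the rest are neutral placeholders, not recommendations.
narrowed_preset = {
    "temperature": 0.6,  # paired with a small top_k to keep choices tight
    "top_k": 17,         # only the 17 most likely tokens survive
    "typical_p": 1.0,    # often left unchanged across models
    "tfs": 1.0,          # tail-free sampling; lowering it cuts more of the tail
    "top_a": 0.0,        # 0 disables; raising it prunes low-probability tokens adaptively
    "min_p": 0.0,        # raise only once outputs are already stable and consistent
}
```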
Interesting. I've been trying a number of models too, including Total Recalls in the last couple months.
Just finishing up tests/RPing with Omega Directive v2.1 (that's a solid model from what I see; 24B and 70B).
Alas, I haven't touched v2 yet (tried 1.4 and 1.7), but I can definitely say I'd prefer non-thinking models, at least while I can't generate tokens quickly, so the thinking feels like a waste. Plus, fighting with adding the /no_think flags everywhere is a pain, though some models ignore it (I don't think TR was one of them).
And with MoEs (of which TR is one, or was in previous versions), the output is sometimes finicky even when you set a lower number of experts. They probably work better for programming and technical use than for RP.
I intend to give v2 a try starting tomorrow. I hope it impresses me more than the previous TRs, as they previously felt off; not bad, just... off...
Tried the v2 of the 42B and 53B (non-thinking) and I really don't feel I like the output. Maybe I keep getting bad seeds. The models do a bunch of run-on sentences, dozens of short half-sentences, take over all the characters, and output... well, it seems like a mostly incoherent mess to me.
I really just don't have good luck with MoE models. That, or they aren't configured right, and fighting a lot with something to make it work doesn't sound like fun compared to when I was, say, 14. If you can't get something working in 15 minutes and you already have something that works well, you will drop the new thing in favor of something you know works.
@VizorZ0042 - I think I tried the Dark Champion a couple of months back. But smaller models (under 30B) really tend to dislike more than one character in an RP, and they lack understanding of advanced principles like a roleplay within a roleplay; they will just switch to the new setting with no reference to what it was before.
@yano2mch The answer is mostly in the advanced samplers.
The main reason for such instability is wrong values in the sampling parameters, which create unsteady/inconsistent choices; such choices often lead to irrelevant output and further irrelevant development of events, which affects multiple characters in a much wider way.
To fix nearly all of the issues, you should narrow the choices to a minimum, increasing consistency and stability to the absolute limit.
I've been personally experimenting for quite a long time and have managed to create nearly perfect settings for nearly every model, and the results are outstanding.
I haven't managed to get DavidAU to test them yet, but I can surely tell you the perfect settings do exist.
> @yano2mch The answer is mostly in the advanced samplers.
> The main reason for such instability is wrong values in the sampling parameters, which create unsteady/inconsistent choices; such choices often lead to irrelevant output and further irrelevant development of events, which affects multiple characters in a much wider way.
Mhmm. I tend to have temperature between 0.60 and 0.80, repetition penalty about 0.15, and Top_P at 1; at least in SillyTavern, that's all the settings to deal with. I don't consider these too high; if anything, they may be a bit low for some. But they are quite stable across most models I've played with.
> I've been personally experimenting for quite a long time and have managed to create nearly perfect settings for nearly every model, and the results are outstanding.
I've tried to follow suggested values before, only for them to usually be useless, and I end up defaulting to my listed safe set. Still, if you have a bunch of presets for individual models, you might put those on GitHub for people to import and use.
But since a lot of these don't auto-load per model, that means still having to find them, import them, and load them. A lot of overhead work for the dedicated users.
@yano2mch Try Temperature 0.6, Top_K 17, Smoothing Factor 0, Presence Penalty 0 for now.
It would probably help if I could even find the smoothing or Top_K options in SillyTavern...
Edit: going into koboldcpp and manually using its interface instead of SillyTavern, I could set those levels and got decent output... at least until the model just barfs because it doesn't want to continue anymore and insists I talk about puppies and other useless stuff.
@yano2mch For SillyTavern: Frequency Penalty, Presence Penalty, Smoothing Factor - disable them all (set them to 0).
And about the puppy stuff, it's most likely that some MoE setups have censored models inside them. In this case, try basic ones. I haven't personally tested any MoE model, but I'm willing to share the settings for you to test.
I posted the newest settings (5+ for different usages) here, as well as detailed tests and outputs. Feel free to experiment, as I personally get near-perfect results with various models like SpinFire and Supernova (mainly), and with slight adjustments of Top_A and TFS for Dark-Planet-1M, Dark-Planet-DeepHermes-1M, Nemotron-UltraLong-4M (Nvidia), Nemotron-Nano (Nvidia), and others. The list is huge (all 8B models for now, though).
I haven't tested 9B, 12B, or 18B yet (mostly due to very slow speed, especially 12B and 18B), and I can't test more than 18B due to resource and processing power limitations. So feel free to test and write up the results; I'll be glad to see more testers with more results.
I personally recommend testing ExCreative, Ex1.1Creative, Ex2Creative, and Ex2.1Creative; you can also read the important notes below that post. Feel free to test them, I hope you will have very good results.
@yano2mch Okay. Have you managed to test out the settings?
By the way, I just updated the presets and added the new Ex2.2Creative preset; the outputs were more attentive (more attention to actions involving other characters) and smarter (better responses, varied explicitly by ongoing events).
> @yano2mch Okay. Have you managed to test out the settings?
No, I haven't tried it again after getting "let's talk about cute puppies" responses. Censored models I just find annoying, as you never know when they will just barf.
Switched to trying the recent A.X v4 for testing (which is pretty darn good!). For now I think I'll probably just rely on Godzilla 52B if I need coding assistance and leave DavidAU's stuff on the back burner, though on the sub-16B models DavidAU's stuff was really good before I got my hardware upgrade.
@yano2mch Then you should try the Dark-Planet series - most of the models in this series are uncensored. They are mainly for fiction/storytelling.
I personally use SpinFire and SuperNova for now, and even though they are both 8B, I managed to get near-perfect outputs with strong confidence, consistency, coherence, memory, and emotions.
I have mainly used storywriting/fiction-writing models and haven't tried any coder ones yet. This is why I need more testers for 12B and 18B+ models with better quants.
I managed to get nearly perfect responses among 3 characters. Sometimes a given model (SpinFire in my case) might forget certain characters, which is easily fixed with a reminder.
But besides that, I get exceptionally good responses; the character perfectly recalled its main points and tried to develop the events with good logical and creative flow.
If you're interested, the settings are still in the SpinFire thread; I updated them.