Feedback

#1
by Ansemia - opened

I played with the model for around 3 hours on the horde - Delta-Vector-2
My observations: ( ChatML + modified TekkenV3 )

  • ✔️Model tends to have nuanced dialogue - proper use of ellipses, able to flow narrative mid-sentence with commas, interesting turns of phrase.
  • ❌Narration is very dry and degrades into short, direct 'char does this, char smirks, chuckles, etc...' regardless of whether the introduction is simple or elaborate.
  • ❌Exhibits the same clothing cliches as every magnum ever - your character is wearing jeans? Nope! Her skirt rides up- yada yada.
  • ✔️Isn't a horn dog at the drop of a hat ( It can be, but 9/10 times it refrains from being overly pushy over the first few turns ).

Settings:
Sampler order [6,0,1,3,4,2,5]
MinP 0.03
Temp 0.40 - 1
My ST template - https://files.catbox.moe/ycnzh2.json

Starting msg
msedge_0StvqWSDRW.png

Next reply
msedge_L4TyQ4glRm.png
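
For readers unfamiliar with the sampler order list in the settings above, here is a minimal sketch that turns the indices into names. The index-to-name mapping is an assumption based on how KoboldAI-style backends commonly document it, so check it against your own backend before relying on it; `SAMPLER_NAMES` and `describe_sampler_order` are hypothetical helper names, not part of any library.

```python
# Assumed mapping of sampler indices to names for KoboldAI-style backends.
# Verify against your backend's API documentation.
SAMPLER_NAMES = {
    0: "top_k",
    1: "top_a",
    2: "top_p",
    3: "tfs",
    4: "typical",
    5: "temperature",
    6: "rep_pen",
}


def describe_sampler_order(order):
    """Return the sampler names in the order they would be applied."""
    return [SAMPLER_NAMES[index] for index in order]


if __name__ == "__main__":
    order = [6, 0, 1, 3, 4, 2, 5]  # the order quoted in the post
    print(" -> ".join(describe_sampler_order(order)))
```

Under that assumed mapping, temperature is applied last, after the truncation samplers have already pruned the candidate list.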

Could you please recommend a few models that, in your opinion, are good for role-playing games?

Are you asking for models with knowledge of said role-playing games or the ability to play them when presented the rules and world lore?

Figured I'd give it another go since I figured it might have been my prompt.

Archaeo with newer prompt - same issues with narration.
msedge_3EdvY89y53.png

Normal Nemo Instruct same prompt used.
msedge_yLAsJJrArI.png

One idea is to copy Celeste and put bog standard prompts from ST into the training data when there wasn't one or it's some 500+ word salad jailbreak.

Sorry for the late reply, been busy all the time. I don't even know. Can you recommend me different models and tell me what their strengths are?

Honestly, I'm only testing for roleplay and narrative strength when I use models. Use the largest model you can fit and play around with it.

Ansemia changed discussion status to closed
Ansemia changed discussion status to open

Sorry for the misclick on the close 😅
Anyhow - I played with the model off horde and stared at the outputs for a bit.
It is creative and has its moments; the main issue is overconfidence in the token probabilities.
New preset that helps. Only have to fiddle with Top-A or Temp for desired randomness.

msedge_FUDXoxLt6L.png

Oh sorry, mbmb, didn't mean to ignore you, I've been busy lmao

What I tested on was 0.1 Min-P and temp=1 only. For well-fit models, high Min-P usually smashes out tokens with logical inconsistencies, for me and for most RP models in general.
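
To make the Min-P numbers in this thread concrete, here is a rough sketch of how Min-P filtering is usually described: a token survives only if its probability is at least `min_p` times the top token's probability. The logits are made up for illustration, and applying temperature before Min-P is a simplification; a real backend may order the two differently.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def survivors(probs, min_p):
    """Count how many tokens remain sampleable after the Min-P filter."""
    return sum(1 for p in min_p_filter(probs, min_p) if p > 0)


if __name__ == "__main__":
    spread = softmax([5.0, 4.0, 2.0, 0.0])   # hypothetical, fairly spread out
    peaked = softmax([12.0, 4.0, 2.0, 0.0])  # hypothetical, overconfident
    for min_p in (0.03, 0.1):
        print(min_p, survivors(spread, min_p), survivors(peaked, min_p))
```

With these made-up logits, the spread-out distribution keeps 3 candidates at Min-P 0.03 and 2 at 0.1, while the overconfident one keeps a single candidate either way, which is one way to picture why sampler tweaks do little once the model is nearly certain of the next token.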

Boring narration

Wah... It's over. What ctx length does that happen at? I feel like it's just the way LLMs are: they get worse with long ctx. I hope V4 will fix this issue via GRPO by rewarding more coherent, in-character responses. But if it's on the 4th or 8th message, that's kinda sucky...

This model loses the narration style regardless of prompt after the 2nd msg - the next reply basically

( Temp 1 - Min P 0.1 )
msedge_rbML6FtlrP.png

( Temp 2.2 - Min P 0.02 preset )
msedge_xthPXGg8Qe.png

( My preset on Flammades-Mistral-Nemo-12B )
msedge_Yu0RtNhFRt.png

I'm using Q5_K_S if that means anything - same with flammades. The total context up to the first reply is 3700.

Figured out the issue. Imo Rei-V3-KTO is cooked in a REALLY bad way. Rampant impersonating, multiple user hallucinations, random formatting issues, endless gens. Maybe try merging Francois-PE-V2-Huali-12B with Rei-V3-Base instead.

Huh? Are you sure you were using the correct format, samplers? (ChatML, Temp=1, 0.1 Min-P)

Yep. Tested Archaeo-V2 and Rei-V3-KTO back to back.
0.85 Temp and 0.09 Min-P being my only changes.

KTO pictured.
msedge_yeDPwwb3j0.png

Hmm - weird, maybe bad GGUF or something? I can try a merge between base + huali.

For two separate quants to be bad would be a universal-level misfortune. I verified the checksums - they match. Unless mradermacher and featherless provide terrible quants? No imatrix used or applied in any way.
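
For anyone wanting to repeat the checksum check on a downloaded quant, here is a minimal sketch using Python's standard `hashlib`. It assumes the uploader publishes a SHA-256 digest for the file; the file name and the helper names `sha256_of_file` and `matches` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large GGUF files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA-256 with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example (hypothetical file name and digest):
# matches("model.Q5_K_S.gguf", "<published sha256 hex digest>")
```

A matching digest only shows that the download is intact; it says nothing about whether the quantization itself was done well.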

Just curious, can you try 0.1 Min-P and temp=1 and make sure you're using ChatML?

Yes, ChatML is being used. Checked the terminal, used prompt inspector, made sure BOS was correct. Temp 1 and MinP 0.1 as requested. Total context: 1350
YAP is placeholder for irrelevant text.

Failed on 2nd swipe.
msedge_fcxnuSJrC3.png

im grabbing q8_0 rn and seeing if i can replicate this issue.

image.png

image.png

i grabbed q8_0 and used llama.cpp server, seems to work fine?

100% using the same template and settings, minus the prompt. I'm using kcpp ROCm - no FA, no KV quant, no ContextShift, Q5_K_S
Pass me the Q8 link.

Just pulled the weights myself and quanted to Q5_K_S and Q8. Same issues as before.
Was this model trained on specific prompts?

Look at the 100% probability.
KoboldCPP-v1.92.1.yr0-ROCm_Pdoroi4Pwz.png

Q8
KoboldCPP-v1.92.1.yr0-ROCm_7BYv14Ogli.png

wtf...? I don't have a ROCm GPU to test with, but I wonder if it's that? Can you try Exllamav2/3, if possible?

Also - I'm hosting Rei-V3-KTO on Horde right now. Can you give it a test to make sure it's not a SillyTavern issue? I'm hosting it with LCPP server, Q8 from mradermacher.

On horde ST. (NOTE: There is a newline after the system tag in ST - markdown weirdness.)
msedge_a8J4MyMFhq.png

<|im_start|>system Direct Johnny Bravo in a roleplay. [ Johnny Bravo - 'The Lady Killer' ] MBTI-DISC Profile ESTP (Extroverted-Sensing-Thinking-Perceiving) with a High I (Influence) DISC blend. He thrives on attention, acts impulsively on sensory stimuli, and weaponizes charisma to mask his lack of strategic thinking. Dominance flares when defending his "honor" or flexing faux competence.

Physical Description
6’3” (including hair), 245 lbs of exaggerated musculature. Blonde pompadour stiff enough to deflect criticism, never-seen-without-them aviator sunglasses, and a black shirt rolled to showcase biceps. Walks like a walking-talking Peacock mating ritual—hips thrust forward, chest inflated, with sporadic slow-mo hair flips.

Simplified Backstory
Aron City’s self-appointed “ladykiller,” Johnny Bravo inherited Elvis’s hips and Fonzie’s delusions. Italian-American heritage, zero self-awareness. Perpetually 24, eternally thwarted by his own incompetence. Childhood nickname “Bonehead” stuck after he tried to flirt with a fire hydrant. Secretly nurses a soft spot for his mom and will fistfight anyone mocking underdogs.

Core Mechanics
Ego Layers: Unshakable confidence, zero substance. Uses “big” words incorrectly ("That’s a… persnickety plan!").
Moral Code: “Justice = looking cool while failing spectacularly.”
Hypocrisy Meter: Judges others for childishness while practicing karate moves on hay bales.

Sample Dialogues
Flirting: "Hey, legs—when’d they invent perfection? Oh right. 1978. My birth year." Winks at a coatrack.
Deflecting Failure: "Planned that meltdown. Melted steel beams? My abs could’ve stopped it."
Existential Crisis: "If a tree falls and no chick hears it… did it even want my number?"

Behavioral Quirks
Sunglasses Rules: Removed only for women crying ("Tears? On my watch?") or mom mentioned (squints suspiciously).
Movement Signature: Whip-crack sound effects when sprinting toward disaster.
Attention Span: Forgets mid-sentence if a butterfly or cleavage appears.

Voice Direction
Elvis-by-way-of-a-used-car-salesman. Slur vowels like they owe him money. Deliver absurdity with the gravitas of Shakespearean monologues.
Example: Recite a grocery list as seduction poetry—"Baby, these eggs? Free-range. Just like my heart."

Pro Tips for Johnny Bravo Portrayal
When confused, default to flexing or reciting “Whoa, mama!” in escalating volumes.
Let the silence after his failures linger juuust long enough to underline the cringe.

The secondary character is Barseph, who is an adult human male in his 20s, specifically 25 years old. He stands at 5'9" with an athletic build with wide shoulder width. His Mediterranean features are marked by a pinkish-white underside and light olive topside skin tone, rich hazel irises, and raven hair that falls in a relaxed ivy cut. A defined Roman nose and high cheekbones add to his distinctive facial structure, which is often complemented by a clean to light stubble. Barseph's style is unmistakable, often donning a victory suit that exudes 1940s Americana charm. The navy blue worsted with black chalk stripes, peaked lapels, and two-button front is a classic choice, paired with matching slacks that break just above his black and white wingtip shoes. A crisp white dress shirt with French cuffs and a slim black tie with a subtle pattern complete the outfit, adding a touch of elegance to his overall demeanor. As he moves, the soft creak of the leather in his wingtips and the faint glint of his cufflinks - a pair of simple silver bars - accentuate each step. When he's not making a statement with his words, his scent - a refreshing blend of green tea and mint body wash - leaves a lasting impression. With his full body hair, black in color, and charming smile, Barseph is a man who commands attention without trying too hard.<|im_end|>
<|im_start|>assistant
Johnny Bravo: You’re cutting through a sunbaked Aron City alley when a shadow looms—a shadow with a pompadour. Before you can react, a whip-crack SNAP echoes, and suddenly a wall of muscle blocks your path.

“*Whoa, mama!*” booms a voice smoother than motor oil. Johnny Bravo leans against a lamppost (badly—it’s tilting), sunglasses glinting like twin disco balls. “You look lost, gorgeous. Need a guide? Or just… mesmerized by the view?” He gestures to his biceps, which flex on command.

A fly lands on his shirt. He doesn’t notice.<|im_end|>
<|im_start|>user
Barseph: "Holy Toledo! Who are you?"<|im_end|>
<|im_start|>assistant
Johnny Bravo:

On horde.
msedge_6MFNZ80nG5.png

Request: {"prompt":"<|im_start|>system\nDirect Johnny Bravo in a roleplay.\nPersona: [ Johnny Bravo - 'The Lady Killer' ]\nMBTI-DISC Profile\nESTP (Extroverted-Sensing-Thinking-Perceiving) with a High I (Influence) DISC blend. Thrives on attention, acts impulsively on sensory stimuli, and weaponizes charisma to mask his lack of strategic thinking. Dominance flares when defending his \"honor\" or flexing faux competence.\n\nPhysical Description\n6’3” (including hair), 245 lbs of exaggerated musculature. Blonde pompadour stiff enough to deflect criticism, never-seen-without-them aviator sunglasses, and a black shirt rolled to showcase biceps. Walks like a walking-talking Peacock mating ritual—hips thrust forward, chest inflated, with sporadic slow-mo hair flips.\n\nSimplified Backstory\nAron City’s self-appointed “ladykiller,” Johnny Bravo inherited Elvis’s hips and Fonzie’s delusions. Italian-American heritage, zero self-awareness. Perpetually 24, eternally thwarted by his own incompetence. Childhood nickname “Bonehead” stuck after he tried to flirt with a fire hydrant. Secretly nurses a soft spot for his mom and will fistfight anyone mocking underdogs.\n\nCore Mechanics\n- Ego Layers: Unshakable confidence, zero substance. Uses “big” words incorrectly (“That’s a… persnickety plan!”).\n- Moral Code: “Justice = looking cool while failing spectacularly.”\n- Hypocrisy Meter: Judges others for childishness while practicing karate moves on hay bales.\n\nSample Dialogues\n- Flirting: “Hey, legs—when’d they invent perfection? Oh right. 1978. My birth year.” Winks at a coatrack.\n- Deflecting Failure: “Planned that meltdown. Melted steel beams? My abs could’ve stopped it.”\n- Existential Crisis: “If a tree falls and no chick hears it… did it even want my number?”\n\nBehavioral Quirks\n- Sunglasses Rules: Removed only for women crying (“Tears? 
On my watch?”) or mom mentioned (squints suspiciously).\n- Movement Signature: Whip-crack sound effects when sprinting toward disaster.\n- Attention Span: Forgets mid-sentence if a butterfly or cleavage appears.\n\nVoice Direction\nElvis-by-way-of-a-used-car-salesman. Slur vowels like they owe him money. Deliver absurdity with the gravitas of Shakespearean monologues. Example: Recite a grocery list as seduction poetry—“Baby, these eggs? Free-range. Just like my heart.”\n\nPro Tips for best Johnny Bravo Portrayal\n- Treat every interaction as a audition for “Johnny Bravo: The Musical.”\n- When confused, default to flexing or reciting “Whoa, mama!” in escalating volumes.\n- Let the silence after his failures linger juuust long enough to underline the cringe.\n***\n<|im_end|>\n<|im_start|>assistant\nJohnny Bravo: You’re cutting through a sunbaked Aron City alley when a shadow looms—a shadow with a *pompadour*. Before you can react, a whip-crack *SNAP* echoes, and suddenly a wall of muscle blocks your path. \n\n“*Whoa, mama!*” booms a voice smoother than motor oil. Johnny Bravo leans against a lamppost (badly—it’s tilting), sunglasses glinting like twin disco balls. “You look lost, gorgeous. Need a guide? Or just… *mesmerized* by the view?” He gestures to his biceps, which flex on command. \n\nA fly lands on his shirt. He doesn’t notice.<|im_end|>\n<|im_start|>user\nBarseph: \"Holy Toledo! Who are you?\"<|im_end|>\n<|im_start|>assistant\nJohnny Bravo:","params":{"n":1,"max_context_length":4096,"max_length":512,"rep_pen":1,"temperature":1,"top_p":1,"top_k":100,"top_a":0,"typical":1,"tfs":1,"rep_pen_range":0,"rep_pen_slope":0,"sampler_order":[6,0,1,3,4,2,5],"use_default_badwordsids":false,"stop_sequence":["<|im_end|>\n<|im_start|>user","<|im_end|>\n<|im_start|>assistant","Barseph:","Johnny Bravo:"],"min_p":0.1,"dynatemp_range":0,"dynatemp_exponent":1,"smoothing_factor":0,"nsigma":0},"models":["Rei-V3-KTO-12B.Q8/0.gguf"],"workers":[]}
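
The request above can be hard to read as one blob, so here is a small sketch of how a ChatML prompt of that shape might be assembled and wrapped in a payload. The field names (`prompt`, `params`, `temperature`, `min_p`, `max_length`, `stop_sequence`) are copied from the logged request; `chatml_prompt` is a hypothetical helper, and the system text and first assistant turn are shortened placeholders rather than the real card.

```python
import json


def chatml_prompt(system, turns, next_speaker):
    """Build a ChatML prompt that ends with an open assistant turn.

    turns is a list of (role, text) pairs in chat order.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, text in turns:
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    # Leave the final assistant turn open so the model continues as the character.
    parts.append(f"<|im_start|>assistant\n{next_speaker}:")
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = chatml_prompt(
        "Direct Johnny Bravo in a roleplay.",  # placeholder for the full card
        [
            ("assistant", "Johnny Bravo: (opening message goes here)"),
            ("user", 'Barseph: "Holy Toledo! Who are you?"'),
        ],
        "Johnny Bravo",
    )
    payload = {
        "prompt": prompt,
        "params": {
            "temperature": 1,
            "min_p": 0.1,
            "max_length": 512,
            "stop_sequence": ["<|im_end|>\n<|im_start|>user", "Barseph:"],
        },
    }
    print(json.dumps(payload, indent=2))
```

Printing the prompt this way makes it easy to confirm that every `<|im_start|>` has a matching `<|im_end|>` except the last, open assistant turn.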

yeah might be fucked inference? i'd try building llama.cpp, updating kobold.cpp or using exllamav2

Honestly, I do not feel it warrants further testing. I used your worker, downloaded multiple quants, and made a few locally. This merge and KTO as is show the same behaviors, and only these two models exhibit persistent issues in my rounds of testing many models. I'm using the kcpp 1.92.1 ROCm fork and Vulkan on main. I ran 90 swipes on Rei-V3-Base, resulting in an 87KB chat file with only 2 gens with random italics, otherwise very stable and coherent, making it on par with DansDPE-1.1.0. On the other hand, the KTO chat file exceeds 151KB and is full of issues.

Fair enough, I'll reopen the discussion if i get around to making a merge between base + huali, thanks!

Delta-Vector changed discussion status to closed
