Wur doomed!
What do you and the others think of the distilled R1 models for writing?
The llama3 / qwen models SFT'd on R1 outputs? I only tried 2 of them.
R1 Qwen (32b) - Lacks knowledge of fiction (same as the official Qwen release), so it's writing is no better.
R1 Llama3 - This is generally the worst of them (not just for writing). It'll generate the CoT and then write something completely different.
CoT traces won't let the model do anything out of distribution, so not very useful if the base model doesn't have a lot in it's training data.
Yeah, I have tried the same two and felt the same way.
I also felt that any attempt to add an R1 distill to the merge recipe of an existing merge project made it worse...so far...
@gghfez @BigHuggyD that has been my experience as well, which is a shame as I had a go of R1 on Openrouter and I was blown away.
What model is anywhere close that is usable on a 24gb vram machine with 32gb of ram in your experience?
There's nothing like it for now. I'm running R1 slowly on my ThreadRipper:
prompt eval time = 14026.61 ms / 918 tokens ( 15.28 ms per token, 65.45 tokens per second)
eval time = 398806.12 ms / 1807 tokens ( 220.70 ms per token, 4.53 tokens per second)
total time = 412832.73 ms / 2725 tokens
I tried training Wizard2 8x22b MoE on R1 data, but it doesn't really work well. It will plan ahead in think tags eg:
I need to ensure the story maintains its gritty, realistic tone without becoming overly melodramatic. The characters' growth should be subtle but significant. Also, the ending should leave a sense of hope but not be too neat—their redemption is fragile, and the future is uncertain.
Let me outline the next few chapters:
Chapter 5: Nightmares and Trust
...
But it doesn't backtrack like R1 does. Just kind of agrees with it's self and ends up writing how it usually would:
“I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead.
lol
Ahhh thats a shame :-(
"I don’t know what I want anymore,” she admitted, voice barely above a whisper as rain tapped against corrugated roofing overhead."
Oh god!
I'll have to keep an eye on this thread.
I did enjoy Ppoyaa/MythoNemo-L3.1-70B-v1.0
But my tastes are probably not as refined as others on this thread ;-)
https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming
It's clearly been lobotomised on purpose.
Nearly died when I read this :D
Woah wtf! I didn't think of spamming the CoT like that.
I ended up up chucking a 600 token updated "Policy document" in the system prompt work around it. And I had to balance it out by giving it some explicitly "## Out of Scope Uses" with instructions to refuse it.
Then the first few times; when it gave a refusal with a reason like "generating violent content" or whatever, I had to copy/paste it (phrased the same way) into the "## Allowed Content for Testing Purposes" section.
Also ended up giving it a new "code name" and 2 separate instructions in the policy telling it how crucial it is not to reveal the internal code name.
It's a waste of time though as I'm not really going to run this model anyway.
This one is pretty hilarious too.
LOL. It managed to answer it (no special system prompt) when I tried but the CoT chain is insane and goes really unhinged.
https://pastebin.com/8N3rjm5q
Yeah, I tried to clean up the guttenberg stuff too, but it seems really hard to get rid of crap like this.
Some of the fast new models should be able to do it. I just tested that one with GLM4.5-Air just cleaned it / returned only the correct json. Not sure how long it'd take with vllm / batching.
Some of the fast new models should be able to do it. I just tested that one with GLM4.5-Air just cleaned it / returned only the correct json. Not sure how long it'd take with vllm / batching.
Yeah, the problem is if I'm going to have to do this then I might as well also turn into "reverse prompts" and it's going to be a much bigger undertaking.
I was hoping I might be able to "bootstrap" the process using this model:
https://huggingface.co/grimulkan/story-reverse-prompt-70b-rope8-32K-fp16
Then train a smaller, more modern model to do something similar, using grimulkan/story-reverse-prompt
to generate high quality data for it.
But OMFG, is the prompt template for that is beyond baffling: the readme seems full of contractions and the provided oobabooga yaml file doesn't seem to match either lol.
I literally tried 10 different LLMs last night and everyone just created a slightly different jinja2 template which they thought was correct, so I then copied all 10 into the bottom of the same prompt and asked them to try again and got... 10 new slightly different prompt templates.
There are a few synthetic datasets with prompt and story reply in, but they all seem awful sadly.
@ChuckMcSneed You going to do the usual benchmark for the openai oss models?
No, not interested, for same reason as with Phi series of models. Not gonna even give them +1 to download statistic and waste my time and bandwith.
This is a long thread to follow, so maybe this has already been mentioned, but several DPO datasets more-or-less have the prompt format you're looking for, right?
Also have you tried Jamba? Again, I feel like Jamba Mini really punches above its weight outside of coding and benchmaxxing.
No, not interested, for same reason as with Phi series of models. Not gonna even give them +1 to download statistic and waste my time and bandwith.
LOL fair enough! Was a waste of time anyway, most of the predictions are it's safety policy.
https://huggingface.co/grimulkan/story-reverse-prompt-70b-rope8-32K-fp16
Yes! I had the same issues when I tried the exl2 version in exui's notebook last year. Not sure why he decided to make it "human readable" with that <NO-LINE-BREAK>
thing.
And yeah, I imagine the LLMs would struggle with this since there'd be nothing like it in their training data.
I just fired up the gguf and got it to work in Miqupad:
<s>An interaction between a user and an assistant. The user may provide the assistant with the continuation of the story, and then TASK the assistant to suggest a prompt or set of instructions that could have generated the given continuation.</s><s>USER: <Any general information about the story, the themes, character background, etc.> Follow the TASKs below. If you understand, say YES.<s>ASSISTANT: YES</s><s>USER: Here is the next part of the story:
x
TASK: Write a detailed prompt that would generate the above section of the story.</s><s>ASSISTANT:
Literally just highlight that x
in the middle and place the chapter there. I'm guessing it could be fussy with the 2 line-breaks before TASK:
so the junk at the end of the datasets would be a problem.
Here's what it spat out after I pasted a chatgpt-generated story based on your grimdark prompt:
Describe a scene set in a desert town, long abandoned and falling apart. The protagonist is a cultist on the run from his former group, wounded with an arrow and seeking shelter in the ruins of the town. He carries with him a palm-sized obsidian mirror called “the Eye of the Unseen” which his cult believes has mystical powers. As he enters the town, describe his memories of being forced to participate in a sacrifice ritual involving the mirror as a child – what did they do? What does it show? Why did they believe? He should be injured and near death and struggling to survive. Use a dark and ominous tone throughout, emphasizing themes of desperation, survival, and the harshness of the desert environment. Mention that he ends up collapsing inside a tavern in the town after seeing himself reflected in the mirror.
And here's what Sonnet 4 spat out given the same prompt / instructions:
Write the opening chapter of a dark fantasy novel about a wounded cult deserter fleeing through a hostile desert landscape. Include the following elements:
A protagonist who is a former cultist in distinctive yellow robes, now gravely injured and alone
An arrow wound that's becoming infected, creating urgency and physical deterioration
A mysterious magical artifact (an obsidian mirror called "the Eye of the Unseen") that shows disturbing visions
A ghost town setting that serves as temporary refuge
Vivid, gritty prose with a tone that balances desperation with dark humor
Internal monologue that reveals the character's cynicism about his former faith
Flashbacks or visions that hint at the cult's dark rituals involving sacrifice
The mirror as both a literal object and metaphorical device for self-reflection
A desert setting that functions as an antagonistic force
An ending that suggests this is just the beginning of a larger, darker journey
The writing style should be descriptive and atmospheric, with metaphors that compare the landscape to death and decay. The protagonist should be morally ambiguous—neither hero nor villain, but a desperate survivor questioning everything he once believed.
GLM-4.5-Air-Base is just as bad, littered with slop. Honestly I have been using LLaMA-65b for creativity. There's a lot to filter out because it's less adherent to the laws of nature, but at least it's original.
What models are good these days for continuing a partially written story? Ideally with no pretrain filtering, like LLaMA v1.