Reasoning maybe slightly undercooked?
Not to complain, because the model is awesome and helpful. But it reasons considerably less reliably and follows reasoning instructions significantly worse than DeepHermes 8b. Its reasoning sections also seem quite short by comparison, not very deep. If you ask it to think about something, or even edit the reasoning section, it just sort of winds it up and moves on. Great model, and nice to have some reasoning at this size for sure (AFAIK it's the only one outside of DavidAU's loras, which I struggled to merge myself).
Just in case the feedback is useful!
The reasoning length was partially a training style choice—I don’t enjoy five paragraphs of reasoning followed by a sudden reveal like, "It was Carl all along." The other factor was limited compute access during the GRPO phase, which forced me to reduce the maximum completion length significantly.
If you want consistent reasoning, modify the first bot message to specify the reasoning style you prefer. Additionally, I recommend reviewing the character card provided in the repository, as it helps set up the response flow. Examples are particularly useful for guiding the model, and prefixing can also improve consistency.
For instance, try forcing messages to start with <reasoning>. I’ve found this makes reasoning more consistent on my end.
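To make the prefix trick concrete, here's a rough sketch using transformers. The model id, system prompt, and exact tag formatting are placeholders rather than my actual training setup, so adjust them to match the character card and chat template in the repo:

```python
# Sketch of prefilling the assistant turn so generation starts inside <reasoning>.
# Model id and prompts below are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your/model-here"  # placeholder, swap in the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "Think step by step inside <reasoning> tags before answering."},
    {"role": "user", "content": "Plan the next scene: the heist goes wrong at the vault door."},
]

# Build the prompt up to the start of the assistant turn, then append the opening tag
# so the model is forced to continue inside a reasoning block.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += "<reasoning>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Most frontends (SillyTavern, text-generation-webui, etc.) expose the same idea as a "start reply with" or prefill field, so you don't need to script it yourself.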
Regarding reasoning instructions, I only provided three varied prompts and otherwise trained without system messages, so it makes sense that the model isn’t as finely tuned for different reasoning system instructions.
Ah, well that makes sense. Yeah, I started the chat with a decently long reasoning section from DeepSeek, and it was still really short. Triggering it is no issue at all, though; after I prompted it once, it just kept doing it on its own, again and again.
I use long reasoning sections to plot out complex action scenes etc., so longer is needed in my use case. But if it's a style choice, no biggie :)