MistralSmallV3R

Introduction

This model is my feeble attempt at reproducing the R1-at-home experience, following the (un)official DeepSeek formula:
DeepSeek V3 + Unhinged Reasoning = R1
Substituting out some variables, we get:
Arcee Blitz V3 Distill + Unhinged R1 Reasoning Traces = MistralSmallV3R

I'm quite happy with how this model turned out; it's definitely weaker than QwQ for coding and long reasoning problems, but a lot more personable and well-rounded overall. The faster speed and VRAM savings are quite nice as well, given how long these models can reason and how much context they can burn through. Stability is also satisfactory: much like the base instruct finetune, the model functions well even without aggressive sampling.

Use Cases

Just like the base model, MistralSmallV3R is a solid all-rounder and should be able to effectively generalize its reasoning capabilities, as care was taken to train this model on a wide variety of reasoning tasks, including math, coding, roleplay, etc. In particular, MistralSmallV3R appears very good at contextual and emotional reasoning compared to other reasoning models that I have tested so far. MistralSmallV3R also tends to spend a good portion of its thinking considering what the user wants, hopefully giving this model higher resistance to poor prompting. Compared to other mid-range reasoning models such as QwQ, it also has markedly better prose quality, without sounding like too much of a try-hard the way some of the Gutenberg models do. Note that this model has inherited a notable negative bias from the R1 synthetic data it was trained on.
Supports up to 32k context.
Should remain usable with only 12GB of VRAM when quanted to IQ3_M or IQ3_S.

Recommended Settings:

  • Template: Mistral Tekken V7
  • Temp: 0.1-0.7, depending on use case.
  • TopP: 0.95
  • MinP: 0.05
  • Rep Pen: Not needed, but if you do use it keep the range low!
  • Do not use DRY!
  • System prompt: Trained on SillyTavern defaults. Add "Reason out how you will respond between the <think> and </think> tags." to the end of your system prompt. Optionally, you may wish to prompt the model toward a positive bias.
  • Add the <think> tag to the beginning of the response.
  • Enable reasoning auto-parse, and remove the newlines from the reasoning formatting prefix and suffix.
  • Low temps allow for longer reasoning.
  • Q4_K_M is recommended as the smallest effectively lossless quant.
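The <think> prefill plus reasoning auto-parse steps above amount to splitting each completion on the think tags. A minimal sketch of that parsing (this helper is an illustration based on the settings above, not code shipped with the model):

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, reply) using the <think> tags.

    Because <think> is prefilled at the start of the response, the
    completion may or may not contain the opening tag itself.
    """
    match = re.search(r"(?:<think>)?(.*?)</think>", completion, re.DOTALL)
    if match is None:
        # No closing tag: treat the whole completion as the reply.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    reply = completion[match.end():].strip()
    return reasoning, reply

raw = "<think>The user wants a short greeting.</think>Hello!"
reasoning, reply = split_reasoning(raw)
print(reasoning)  # The user wants a short greeting.
print(reply)      # Hello!
```

This mirrors what frontends like SillyTavern do when reasoning auto-parse is enabled with empty prefix/suffix newlines.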

Benchmarks

TODO

Dataset

The model was trained on an interleaved dataset of 40% LimaRP-R1 and 60% OpenThoughts.
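For illustration, a 40/60 interleave means each training example is drawn from one of the two sources with fixed probabilities until one runs out. The Hugging Face datasets library provides this directly via interleave_datasets(..., probabilities=[0.4, 0.6]); the stand-in below is a hypothetical sketch of the same idea:

```python
import random

def interleave(limarp, openthoughts, p_first=0.4, seed=0):
    """Mix two datasets by sampling each next example with fixed
    probabilities (40% / 60%), stopping when either source runs out."""
    rng = random.Random(seed)
    a, b = iter(limarp), iter(openthoughts)
    mixed = []
    try:
        while True:
            mixed.append(next(a if rng.random() < p_first else b))
    except StopIteration:
        return mixed

mix = interleave(["limarp"] * 400, ["openthoughts"] * 600)
frac = mix.count("limarp") / len(mix)  # roughly 0.4
```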

Thanks

Major thanks to Arcee AI for their excellent Arcee Blitz finetune and general mergekit nonsense.
Undi95, for proving the viability of a stable reasoning finetune of MS3.
OpenThoughts, for their dataset of the same name.
Mistral, for their continuing dedication to the open source community.

Safetensors · 23.6B params · BF16

Model: ConicCat/MistralSmallV3R