Codex-24B-Small-3.2
Note: This model does not include vision. It is text-only.
Not counting my AI Dungeon collaboration, it's been a while since I did a personal release that wasn't Pantheon, but here we are! You can consider Codex a research-oriented roleplay experiment in which I've tried to induce as much synthetic diversity as possible. Gone are the typical "Charname/he/she does this" responses; in their place is, well, anything else! You have to try it to understand, really.
The datasets themselves contain countless other improvements, but I'd say the most important one is embracing the full spectrum of human storytelling. Whether it's wholesome or dark, this model will not judge, and it intends to deliver. (Or tries to, anyway!)
GGUF quants are available here, and EXL3 quants can be found here.
Your feedback is critical to me, so don't hesitate to tell me whether my model is 1. terrible, 2. awesome, or 3. somewhere in between.
Model details
Considering Small 3.2 boasts reduced repetition, I figured this was the time to train it on the very thing I've been focusing on - systematic pattern diversity!
This finetune combines approximately 39 million tokens of carefully curated data:
- GPT 4.1 Instruct core for clean instruction following
- DeepSeek V3/R1 roleplay data
- Curated "best of" Pantheon interactions
- Diverse text adventure compilations
Each dataset component was specifically validated for structural variance: responses rarely start the same way, sentence patterns vary, and conversations run 10-40 turns. This builds on months of diversity optimization research aimed at breaking common AI response patterns. It's been... quite a journey.
About half of the roleplay data uses the Markdown asterisk format; most of the remaining data is written in a narrative (book-style), present-tense, second-person format.
Inference
Mistral really loves recommending unusual inference settings, but I've been getting decent results with the settings below:
"temperature": 0.5,
"repetition_penalty": 1.05, # Or use the DRY sampler!
"min_p": 0.05
Yes, the temperature is correct. This model creates diversity at the training level, so raising it further will simply cost you coherence.
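If you're running the model behind an OpenAI-compatible server (llama.cpp's server, TabbyAPI, text-generation-webui, and similar), the settings above map onto the request body roughly as in the sketch below. The endpoint URL, the model identifier, and the exact names of the extra sampler fields (min_p, repetition_penalty vs. repeat_penalty) are assumptions; check your backend's documentation.

```python
# Sketch: passing the recommended samplers to an OpenAI-compatible endpoint.
# The URL, model name and extra sampler field names below are assumptions;
# adjust them for the backend you actually run.
import requests

payload = {
    "model": "Codex-24B-Small-3.2",   # placeholder model identifier
    "messages": [
        {"role": "system", "content": "SYSTEM MESSAGE GOES HERE"},
        {"role": "user", "content": "USER MESSAGE GOES HERE"},
    ],
    "temperature": 0.5,
    "min_p": 0.05,                    # non-standard OpenAI field; backend-specific
    "repetition_penalty": 1.05,       # some backends name this repeat_penalty
    "max_tokens": 512,
}

resp = requests.post("http://localhost:5000/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```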
Having character names in front of messages is not a requirement but remains a personal recommendation of mine - it seems to help the model focus more on the character(s) in question. World-focused text adventures do fine without it.
Prompt Format
The model was trained using ChatML.
<|im_start|>system
SYSTEM MESSAGE GOES HERE<|im_end|>
<|im_start|>user
USER MESSAGE GOES HERE<|im_end|>
<|im_start|>assistant
Character:
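If your frontend already speaks ChatML you don't need to do anything special, but for raw completion backends the template can be assembled by hand. The sketch below is a minimal, hand-rolled formatter (not an official helper); the optional name prefill on the assistant turn matches the character-name recommendation above.

```python
# Sketch: building the ChatML prompt by hand (a minimal helper, not an official utility).
# The final assistant turn is left open, optionally prefilled with a character name
# so the model continues in that character's voice.
def build_chatml_prompt(system: str, turns: list[tuple[str, str]], char_name: str | None = None) -> str:
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, content in turns:  # role is "user" or "assistant"
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    prefill = f"{char_name}:" if char_name else ""
    parts.append(f"<|im_start|>assistant\n{prefill}")
    return "\n".join(parts)


prompt = build_chatml_prompt(
    "SYSTEM MESSAGE GOES HERE",
    [("user", "USER MESSAGE GOES HERE")],
    char_name="Character",
)
print(prompt)
```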
Credits
- Everyone from Anthracite! Hi, guys!
- Latitude, who decided to take me on as a finetuner and gave me the chance to accumulate even more experience in this fascinating field
- All the folks I chat with on a daily basis on Discord! You know who you are.
- Anyone I forgot to mention, just in case!
Model tree for Gryphe/Codex-24B-Small-3.2
- Base model: mistralai/Mistral-Small-3.1-24B-Base-2503