## 🧪 Experimental Model
This is one of many experimental iterations I'm sharing publicly while I mess around with training parameters and ideas. It's not a "real" release - just me being transparent about my learning process. Feel free to look under the hood, but don't expect anything production-ready!
# Denker-mistral-nemo-12B
Denker is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of mistral-nemo-kartoffel-12B.
This run experiments with the Qwen-style chat template and a `<think>...</think>` reasoning structure, without modifying the base vocabulary. All tuning was done via LoRA.
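To make the layout concrete, here is a minimal sketch of what a Qwen-style turn with a `<think>` block looks like. This is built from plain strings for illustration, not taken from the model's actual template files; the `<|im_start|>`/`<|im_end|>` markers are the standard Qwen convention and are an assumption here.

```python
def format_chat(system: str, user: str, reasoning: str, answer: str) -> str:
    """Render one turn in a Qwen-style <|im_start|>/<|im_end|> layout,
    with the assistant's reasoning wrapped in <think>...</think>."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        f"<think>\n{reasoning}\n</think>\n"
        f"{answer}<|im_end|>\n"
    )

example = format_chat(
    system="You are a helpful assistant.",
    user="What is 2 + 2?",
    reasoning="Two plus two is four.",
    answer="4",
)
print(example)
```

Because the tags are plain text rather than new special tokens, the base vocabulary stays untouched; the model simply learns the pattern during finetuning.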
## Finetuning Details
- Method: ORPO
- Epochs: 0.25
- Learning Rate: 8e-6, cosine decay w/ 5% warmup
- Batch Size: 1 x 64 (64 effective)
- Max Grad Norm: 0.5
- LoRA Rank: 128
- Hardware: 1x NVIDIA RTX A6000
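For readers unfamiliar with ORPO, here is a minimal numeric sketch of its per-pair objective as described in the ORPO paper (Hong et al., 2024): a standard NLL term on the chosen response plus a weighted odds-ratio term. The weight `lam` below is illustrative, not a value taken from this run.

```python
import math

def odds(p: float) -> float:
    """Odds of probability p, as used in ORPO: p / (1 - p)."""
    return p / (1.0 - p)

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    """Sketch of the ORPO objective for one preference pair.
    p_chosen / p_rejected are (length-normalized) sequence probabilities;
    lam weights the odds-ratio term (hypothetical value here)."""
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # -log sigmoid(log_or): small when the chosen odds dominate
    ratio_term = -math.log(1.0 / (1.0 + math.exp(-log_or)))
    return nll_chosen + lam * ratio_term
```

The ratio term shrinks as the model assigns higher odds to the chosen response than the rejected one, so ORPO folds preference alignment into ordinary supervised finetuning without a separate reference model.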
## Dataset Composition

**Thinking disabled:**
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- Atsunori/HelpSteer2-DPO
**Thinking enabled (chain of thought):**
- 30,000 samples from each dataset, with thinking enabled.
## Results

### Observations
The model will sometimes decide not to think, emitting an answer with no `<think>` block at all.
### Evals
TBD