---
library_name: transformers
license: apache-2.0
base_model:
- nbeerbower/mistral-nemo-kartoffel-12B
datasets:
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- Atsunori/HelpSteer2-DPO
- GeneralReasoning/GeneralThought-430K
- nvidia/OpenMathReasoning
- nvidia/OpenCodeReasoning
tags:
- orpo
- uncensored
- reasoning
- chain-of-thought
- qlora
- experimental
---
## 🧪 Experimental Model
This is one of many experimental iterations I'm sharing publicly while I mess around with training parameters and ideas. It's not a "real" release - just me being transparent about my learning process. Feel free to look under the hood, but don't expect anything production-ready!
# Denker-mistral-nemo-12B

Denker is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of [mistral-nemo-kartoffel-12B](https://huggingface.co/nbeerbower/mistral-nemo-kartoffel-12B).

This run experiments with the Qwen-style chat template and `<think>...</think>`-style reasoning structure, without modifying the base vocab. All tuning was done via LoRA.
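A minimal usage sketch with the standard `transformers` API. The repo id below is assumed from the model name, and the prompt is just an illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/Denker-mistral-nemo-12B"  # assumed repo id, from the title above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 23? Think it through."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning appears inside <think>...</think> before the final answer.
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```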
## Finetuning Details
- Method: ORPO
- Epochs: 0.25
- Learning Rate: 8e-6, cosine decay w/ 5% warmup
- Batch Size: 1 x 64 (64 effective)
- Max Grad Norm: 0.5
- LoRA Rank: 128
- Hardware: 1x NVIDIA RTX A6000
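For reference, here is a rough reconstruction of this configuration using `trl`'s `ORPOTrainer` and `peft`, plugging in the hyperparameters listed above. The actual training script isn't published, so the LoRA alpha, target modules, and quantization settings are assumptions:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

# 4-bit quantized base for QLoRA; pass as quantization_config to
# AutoModelForCausalLM.from_pretrained(). Quant settings are assumed defaults.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Rank 128 from the list above; alpha and target modules are assumptions.
peft_config = LoraConfig(
    r=128,
    lora_alpha=128,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = ORPOConfig(
    num_train_epochs=0.25,
    learning_rate=8e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,  # 1 x 64 = 64 effective
    max_grad_norm=0.5,
    output_dir="denker-orpo",
)

# trainer = ORPOTrainer(model=model, args=training_args,
#                       train_dataset=dataset, peft_config=peft_config)
```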
## Dataset Composition

### Thinking disabled
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- Atsunori/HelpSteer2-DPO
### Chain of Thought

30,000 samples were drawn from each of the following datasets, with thinking enabled:

- GeneralReasoning/GeneralThought-430K
- nvidia/OpenMathReasoning
- nvidia/OpenCodeReasoning
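A hedged sketch of how those subsets might be sampled; the split names and any chosen/rejected pair construction needed for ORPO are assumptions, not the published preprocessing:

```python
from datasets import load_dataset

COT_DATASETS = [
    "GeneralReasoning/GeneralThought-430K",
    "nvidia/OpenMathReasoning",
    "nvidia/OpenCodeReasoning",
]

# 30,000 samples from each CoT dataset; the "train" split name is an assumption
# (some of these repos ship multiple configs/splits).
cot_subsets = [
    load_dataset(name, split="train").shuffle(seed=42).select(range(30_000))
    for name in COT_DATASETS
]
```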
## Results

### Observations
The model will sometimes decide not to think, answering directly without emitting a `<think>` block.
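One possible workaround (an assumption on my part, not something this training run guarantees): prefill the assistant turn with an opening `<think>` tag so generation starts inside the reasoning block. Continuing from the usage example above:

```python
# Prefill an opening <think> tag to nudge the model into reasoning first.
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
) + "<think>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
```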
### Evals
TBD