## 🧪 Experimental Model
This is one of many experimental iterations I'm sharing publicly while I mess around with training parameters and ideas. It's not a "real" release - just me being transparent about my learning process. Feel free to look under the hood, but don't expect anything production-ready!
# Denker-mistral-nemo-12B
Denker is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of mistral-nemo-kartoffel-12B.
This run experiments with the Qwen-style chat template and a `<think>...</think>` reasoning structure, without modifying the base vocabulary. All tuning was done via LoRA.
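To make the layout concrete, here is a minimal sketch of what a Qwen-style turn with a `<think>` block looks like. This is built from plain strings for illustration, not taken from the model's actual template files; the `<|im_start|>`/`<|im_end|>` markers are the standard Qwen convention and are an assumption here.

```python
def format_chat(system: str, user: str, reasoning: str, answer: str) -> str:
    """Render one turn in a Qwen-style <|im_start|>/<|im_end|> layout,
    with the assistant's reasoning wrapped in <think>...</think>."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        f"<think>\n{reasoning}\n</think>\n"
        f"{answer}<|im_end|>\n"
    )

example = format_chat(
    system="You are a helpful assistant.",
    user="What is 2 + 2?",
    reasoning="Two plus two is four.",
    answer="4",
)
print(example)
```

Because the tags are plain text rather than new special tokens, the base vocabulary stays untouched; the model simply learns the pattern during finetuning.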
## Finetuning Details
- Method: ORPO
- Epochs: 0.25
- Learning Rate: 8e-6, cosine decay w/ 5% warmup
- Batch Size: 1 x 64 (64 effective)
- Max Grad Norm: 0.5
- LoRA Rank: 128
- Hardware: 1x NVIDIA RTX A6000
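For readers unfamiliar with ORPO, here is a minimal numeric sketch of its per-pair objective as described in the ORPO paper (Hong et al., 2024): a standard NLL term on the chosen response plus a weighted odds-ratio term. The weight `lam` below is illustrative, not a value taken from this run.

```python
import math

def odds(p: float) -> float:
    """Odds of probability p, as used in ORPO: p / (1 - p)."""
    return p / (1.0 - p)

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    """Sketch of the ORPO objective for one preference pair.
    p_chosen / p_rejected are (length-normalized) sequence probabilities;
    lam weights the odds-ratio term (hypothetical value here)."""
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # -log sigmoid(log_or): small when the chosen odds dominate
    ratio_term = -math.log(1.0 / (1.0 + math.exp(-log_or)))
    return nll_chosen + lam * ratio_term
```

The ratio term shrinks as the model assigns higher odds to the chosen response than the rejected one, so ORPO folds preference alignment into ordinary supervised finetuning without a separate reference model.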
## Dataset Composition

**Thinking disabled:**
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- Atsunori/HelpSteer2-DPO
**Thinking enabled (chain of thought):**
- 30,000 samples from each dataset, with thinking enabled.
## Results

### Observations
The model will sometimes decide not to think, emitting an answer with no `<think>` block at all.
### Evals
TBD