🧪 Experimental Model

This is one of many experimental iterations I'm sharing publicly while I mess around with training parameters and ideas. It's not a "real" release - just me being transparent about my learning process. Feel free to look under the hood, but don't expect anything production-ready!


Denker-mistral-nemo-12B

Denker is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of mistral-nemo-kartoffel-12B.

This run experiments with a Qwen-style chat template and a <think>...</think> reasoning structure, without modifying the base vocabulary. All tuning was done via LoRA.
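
For reference, here is a minimal sketch of what prompting with this structure might look like, assuming the tokenizer ships the Qwen-style template described above; the question and the sample completion are purely illustrative.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("nbeerbower/Denker-mistral-nemo-12B")

# Render a single-turn prompt with the model's own chat template.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)

# A completion is expected to look roughly like this (illustrative only):
# <think>
# 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408
# </think>
# 17 * 24 = 408.
```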

Finetuning Details

  • Method: ORPO with QLoRA adapters (a configuration sketch follows this list)
  • Epochs: 0.25
  • Learning Rate: 8e-6, cosine decay w/ 5% warmup
  • Batch Size: 1 per device x 64 gradient-accumulation steps (64 effective)
  • Max Grad Norm: 0.5
  • LoRA Rank: 128
  • Hardware: 1x NVIDIA RTX A6000
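
For anyone curious about reproducing a similar run, below is a minimal sketch of this setup using TRL's ORPOTrainer on a 4-bit QLoRA base. This is not the actual training script: the dataset name, LoRA alpha/dropout, and target modules are placeholders, and only the hyperparameters listed above come from this card.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base = "nbeerbower/mistral-nemo-kartoffel-12B"

# QLoRA: load the base model quantized to 4-bit NF4 and train LoRA adapters on top.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA rank 128 as listed above; alpha, dropout, and target modules are assumptions.
peft_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters from the list above: 0.25 epochs, 8e-6 LR with cosine decay and
# 5% warmup, batch size 1 x 64 accumulation (64 effective), max grad norm 0.5.
args = ORPOConfig(
    output_dir="denker-orpo",
    num_train_epochs=0.25,
    learning_rate=8e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,
    max_grad_norm=0.5,
    bf16=True,
)

# ORPO expects a preference dataset with prompt/chosen/rejected columns (placeholder name).
train_dataset = load_dataset("your/preference-dataset", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```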

Dataset Composition

Thinking disabled:

Chain of Thought

30,000 samples were drawn from each dataset with thinking enabled.

Results

Observations

The model will sometimes decide not to think.
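
If a run comes back without a reasoning block, one common workaround is to prefill the assistant turn with an opening <think> tag so generation continues inside the reasoning section. The sketch below assumes a standard transformers generate loop; whether prefilling reliably triggers thinking with this checkpoint is untested.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/Denker-mistral-nemo-12B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Explain why the sky is blue."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += "<think>\n"  # prefill so the response starts inside the reasoning block

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```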

Evals

TBD
