---
library_name: transformers
license: apache-2.0
base_model:
  - nbeerbower/mistral-nemo-kartoffel-12B
datasets:
  - nbeerbower/Schule-DPO
  - nbeerbower/Purpura-DPO
  - nbeerbower/Arkhaios-DPO
  - jondurbin/truthy-dpo-v0.1
  - antiven0m/physical-reasoning-dpo
  - Atsunori/HelpSteer2-DPO
  - GeneralReasoning/GeneralThought-430K
  - nvidia/OpenMathReasoning
  - nvidia/OpenCodeReasoning
tags:
  - orpo
  - uncensored
  - reasoning
  - chain-of-thought
  - qlora
  - experimental
---

## 🧪 Experimental Model

This is one of many experimental iterations I'm sharing publicly while I mess around with training parameters and ideas. It's not a "real" release - just me being transparent about my learning process. Feel free to look under the hood, but don't expect anything production-ready!


# Denker-mistral-nemo-12B

Denker is a small, uncensored, reasoning-focused model finetuned using ORPO and QLoRA on top of mistral-nemo-kartoffel-12B.

This run experiments with the Qwen-style chat template and `<think>...</think>`-style reasoning structure, without modifying the base vocabulary. All tuning was done via LoRA.
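For concreteness, a minimal sketch of what a Qwen-style (ChatML) prompt with an open assistant turn looks like. The exact special-token strings are an assumption here, not read from this model's tokenizer config:

```python
def build_prompt(user_message: str) -> str:
    """Assemble a ChatML-style prompt (assumed Qwen format).

    The assistant turn is left open so the model can generate its
    <think>...</think> block followed by the final answer.
    """
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("What is 7 * 8?")
# The model is then expected to continue with something like:
#   <think>...reasoning...</think>56<|im_end|>
```

In practice you would call `tokenizer.apply_chat_template(...)` instead, so the template stored with the model stays the single source of truth.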

## Finetuning Details

- Method: ORPO
- Epochs: 0.25
- Learning Rate: 8e-6, cosine decay with 5% warmup
- Batch Size: 1 x 64 (64 effective)
- Max Grad Norm: 0.5
- LoRA Rank: 128
- Hardware: 1x NVIDIA RTX A6000
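The hyperparameters above map fairly directly onto a TRL + PEFT setup. A hedged sketch of that configuration follows; the listed values come from this card, while `lora_alpha`, the output directory, and the dataset wiring are assumptions, not details of the actual run:

```python
# Sketch of an ORPO + QLoRA configuration with TRL/PEFT (not the exact
# training script used for this model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import ORPOConfig, ORPOTrainer

base = "nbeerbower/mistral-nemo-kartoffel-12B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(  # QLoRA: 4-bit base weights
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

peft_config = LoraConfig(  # rank from the card; alpha is an assumption
    r=128,
    lora_alpha=128,
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)

args = ORPOConfig(
    output_dir="denker-orpo",           # assumed name
    num_train_epochs=0.25,
    learning_rate=8e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,     # 1 x 64 = 64 effective
    max_grad_norm=0.5,
    bf16=True,
)

# train_dataset must provide "prompt"/"chosen"/"rejected" columns:
# trainer = ORPOTrainer(model=model, args=args, train_dataset=train_dataset,
#                       processing_class=tokenizer, peft_config=peft_config)
# trainer.train()
```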

## Dataset Composition

Thinking disabled:

### Chain of Thought

30,000 samples of each dataset with thinking enabled.
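The card only states the per-dataset cap, not the mixing code. As a hedged illustration, capping each thinking-enabled dataset at 30,000 examples could look like this (the dataset names and mixing strategy below are placeholders):

```python
import random

def subsample(examples, cap=30_000, seed=42):
    """Shuffle one dataset split and keep at most `cap` examples."""
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)
    return pool[:cap]

# Stand-in lists instead of the real reasoning datasets.
thinking_sets = {"math": ["m"] * 5, "code": ["c"] * 3}
mixed = []
for name, examples in thinking_sets.items():
    mixed.extend(subsample(examples, cap=4))  # tiny cap for illustration
```

With the real datasets you would do the same with `datasets.load_dataset(...)` and `Dataset.shuffle(...).select(range(30_000))`.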

## Results

### Observations

The model will sometimes decide not to think.
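Since the reasoning block is optional at inference time, downstream code should not assume a `<think>` section is present. A minimal parser sketch (helper name is mine, not part of the model's tooling):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str):
    """Return (reasoning, answer); reasoning is None if the model skipped it."""
    m = THINK_RE.search(text)
    if not m:
        return None, text.strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return m.group(1).strip(), answer

# split_reasoning("<think>7*8=56</think>56") -> ("7*8=56", "56")
# split_reasoning("Just 56.")                -> (None, "Just 56.")
```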

### Evals

TBD