Shekswess committed
Commit 7bbf7f8 · verified · 1 Parent(s): 369877b

Upload 12 files

README.md CHANGED
@@ -1,3 +1,115 @@
- ---
- license: apache-2.0
- ---
---
library_name: transformers
license: apache-2.0
base_model: Shekswess/trlm-stage-2-sft-final-2
tags:
- trl
- dpo
- preference-alignment
- reasoning
- generated_from_trainer
model-index:
- name: trlm-stage-3-dpo-final-2
  results: []
---

<p align="center">
  <img src="https://sdmntprnortheu.oaiusercontent.com/files/00000000-f580-61f4-9d8f-e2ad1ad30cb1/raw?se=2025-09-28T13%3A44%3A27Z&sp=r&sv=2024-08-04&sr=b&scid=d18de0ac-b41e-5d89-82aa-2a8c74df25d6&skoid=f28c0102-4d9d-4950-baf0-4a8e5f6cf9d4&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2025-09-27T15%3A59%3A48Z&ske=2025-09-28T15%3A59%3A48Z&sks=b&skv=2024-08-04&sig=CSrmTwUK5za43FjSFhOlkzGlLkqG2CDPpKYkYtSdV6g%3D" alt="TRLm Stage 3 Banner" width="800"/>
</p>

# 🧠 trlm-stage-3-dpo-final-2

`trlm-stage-3-dpo-final-2` is the **Stage 3** post-training model for the **Tiny Reasoning Language Model (trlm)** project.
This stage focuses on **preference alignment** using **Direct Preference Optimization (DPO)** with 50k preference pairs.

---

## 📖 Model Description

- **Base Model**: [Shekswess/trlm-stage-2-sft-final-2](https://huggingface.co/Shekswess/trlm-stage-2-sft-final-2)
- **Type**: Causal Language Model (decoder-only transformer)
- **Stage**: Post-training **Stage 3 (DPO)**
- **Objective**: Align model outputs with human-preferred reasoning and answers by contrasting **chosen** vs **rejected** completions.

This stage improves the model’s **alignment**, **coherence**, and **reasoning stability**.
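
For context, DPO fine-tunes the policy directly on these preference pairs, without a separate reward model, by contrasting each chosen completion $y_w$ against its rejected counterpart $y_l$ relative to a frozen reference policy (here presumably the Stage 2 SFT checkpoint). The standard objective from Rafailov et al. (2023) is:

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$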

---

## 🎯 Intended Uses & Limitations

### Intended Uses
- Aligned reasoning assistant with structured `<think>` traces
- Multi-turn reasoning with preference-optimized outputs
- Safer, more useful responses for reasoning tasks

### Limitations
- Trained only on preference data → may inherit biases from source datasets
- Limited parameter count (135M) restricts knowledge breadth
- Still prone to hallucinations under complex reasoning chains

---

## 📊 Training Data

This model was trained on the dataset:
👉 [**Shekswess/trlm-dpo-stage-3-final-2**](https://huggingface.co/datasets/Shekswess/trlm-dpo-stage-3-final-2)

**Dataset summary**:
- **Entries**: 50,000 preference pairs
- **Source**: `scottgeng00/olmo-3-preference-mix-deltas_reasoning-yolo_scottmix-DECON-chfiltered`
- **Focus**: Preference alignment with **chosen vs rejected responses**

| Source Dataset | Split | Entries | % |
|----------------|-------|---------|---|
| scottgeng00/olmo-3-preference-mix-deltas_reasoning-yolo_scottmix-DECON-chfiltered | train | 50,000 | 100% |

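
A quick way to inspect the preference pairs is sketched below. It assumes the dataset loads with a plain `train` split (as listed above) and leaves the exact column names to the dataset card rather than guessing them:

```python
from datasets import load_dataset

# Load the 50k-pair preference dataset used for this DPO stage
ds = load_dataset("Shekswess/trlm-dpo-stage-3-final-2", split="train")
print(ds)            # expected: ~50,000 rows
print(ds[0].keys())  # inspect the prompt / chosen / rejected style schema
```
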
---

## ⚙️ Training Procedure

### Training Hyperparameters
- **Learning rate**: 1e-5
- **Train batch size**: 32
- **Eval batch size**: 8
- **Gradient accumulation steps**: 4
- **Total effective batch size**: 128
- **Optimizer**: AdamW (betas=(0.9, 0.999), eps=1e-08)
- **LR Scheduler**: cosine with minimum LR, warmup ratio 0.1
- **Epochs**: 1
- **Seed**: 42

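
A minimal sketch of how these settings might map onto TRL's `DPOConfig`/`DPOTrainer`. This is illustrative only: the DPO `beta`, the minimum-LR fraction, and the exact trainer argument names (which shift between TRL versions) are assumptions, not values reported in this card.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Shekswess/trlm-stage-2-sft-final-2"  # Stage 2 SFT checkpoint (base_model above)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

args = DPOConfig(
    output_dir="trlm-stage-3-dpo-final-2",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 32 x 4 = 128 effective batch size
    num_train_epochs=1,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # assumed; the card only says "minimum LR"
    seed=42,
)

train_ds = load_dataset("Shekswess/trlm-dpo-stage-3-final-2", split="train")
trainer = DPOTrainer(model=model, args=args, train_dataset=train_ds, processing_class=tokenizer)
trainer.train()
```
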
### Framework Versions
- **Transformers**: 4.56.2
- **PyTorch**: 2.7.1+rocm7.0.0.git698b58a9
- **Datasets**: 4.0.0
- **Tokenizers**: 0.22.1

---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Shekswess/trlm-stage-3-dpo-final-2"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example inference with preference-aligned reasoning
messages = [
    {"role": "user", "content": "Explain why the sky is blue in simple terms."}
]

# Apply chat template
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
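
Because the chat template wraps the model's reasoning in `<think>...</think>`, you may want the trace and the final answer separately. A small follow-up sketch (plain string handling, assuming the model emits the tags as intended; note that `skip_special_tokens=True` above strips the tags themselves):

```python
# Continues from the example above: decode only the newly generated tokens,
# keeping special tokens so the <think> markers survive decoding.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
decoded = tokenizer.decode(new_tokens, skip_special_tokens=False)

if "</think>" in decoded:
    trace, answer = decoded.split("</think>", 1)
    trace = trace.replace("<think>", "").strip()
    answer = answer.replace("<|im_end|>", "").strip()
else:
    trace, answer = "", decoded.strip()

print("Reasoning trace:\n", trace)
print("\nFinal answer:\n", answer)
```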

---

Part of the Tiny Reasoning Language Model (trlm) post-training pipeline.
added_tokens.json ADDED
@@ -0,0 +1,4 @@
{
  "</think>": 49153,
  "<think>": 49152
}
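
These two ids extend the base 49,152-token vocabulary (hence `vocab_size: 49154` in `config.json`). A quick check once the tokenizer is loaded:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Shekswess/trlm-stage-3-dpo-final-2")
# Expected from added_tokens.json: <think> -> 49152, </think> -> 49153
print(tok.convert_tokens_to_ids(["<think>", "</think>"]))
```
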
chat_template.jinja ADDED
@@ -0,0 +1,9 @@
{% for message in messages %}
{% if loop.first and messages[0]['role'] != 'system' %}
{{ '<|im_start|>system\nYou are a helpful AI assistant named Tiny Reasoning Language Model, trained by Shekswess. You are an assistant, with the ability to do reasoning. When performing reasoning always perform your full chain of thought inside <think>...</think> before giving a final answer. You are always reasoning so always use <think> </think> tags.<|im_end|>\n' }}
{% endif %}
{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}
{% endfor %}
{% if add_generation_prompt %}
{{ '<|im_start|>assistant\n' }}
{% endif %}
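
This is the ChatML-style template used by `apply_chat_template` in the Usage example above: if no system message is supplied, the Tiny Reasoning Language Model system prompt is injected automatically. A rough sketch of the rendered prompt for a single user turn (exact whitespace is governed by the template above):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Shekswess/trlm-stage-3-dpo-final-2")
messages = [{"role": "user", "content": "What is 2 + 2?"}]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# Roughly:
# <|im_start|>system
# You are a helpful AI assistant named Tiny Reasoning Language Model, ...<|im_end|>
# <|im_start|>user
# What is 2 + 2?<|im_end|>
# <|im_start|>assistant
```
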
config.json ADDED
@@ -0,0 +1,38 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "dtype": "bfloat16",
  "eos_token_id": 2,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 576,
  "initializer_range": 0.041666666666666664,
  "intermediate_size": 1536,
  "is_llama_config": true,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 9,
  "num_hidden_layers": 30,
  "num_key_value_heads": 3,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_interleaved": false,
  "rope_scaling": null,
  "rope_theta": 100000,
  "tie_word_embeddings": true,
  "transformers.js_config": {
    "kv_cache_dtype": {
      "fp16": "float16",
      "q4f16": "float16"
    }
  },
  "transformers_version": "4.56.2",
  "use_cache": true,
  "vocab_size": 49154
}
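
This describes a ~135M-parameter Llama-style architecture (consistent with the 135M figure in the README): 30 layers, 576-dim hidden states, grouped-query attention with 9 query heads sharing 3 KV heads (head_dim 64), tied embeddings, and an 8,192-token context. A quick way to confirm, sketched below:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Shekswess/trlm-stage-3-dpo-final-2")
print(cfg.model_type, cfg.num_hidden_layers, cfg.hidden_size)      # llama 30 576
print(cfg.num_attention_heads, cfg.num_key_value_heads)            # 9 3 (grouped-query attention)
print(cfg.num_attention_heads * cfg.head_dim == cfg.hidden_size)   # True (9 * 64 = 576)
```
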
generation_config.json ADDED
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": [
    2
  ],
  "pad_token_id": 2,
  "transformers_version": "4.56.2"
}
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f72be34d84f9d95d2f1ba9e6aa5d352ec841ce1e7aed81998135ce5bf96e9a09
size 269062856
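
(The ~269 MB weight file is consistent with a ~135M-parameter model stored in bfloat16: 269,062,856 bytes / 2 bytes per parameter ≈ 134.5M parameters.)
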
special_tokens_map.json ADDED
@@ -0,0 +1,34 @@
{
  "additional_special_tokens": [
    "<think>",
    "</think>"
  ],
  "bos_token": {
    "content": "<|im_start|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,170 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<repo_name>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "<reponame>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "5": {
      "content": "<file_sep>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "6": {
      "content": "<filename>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "7": {
      "content": "<gh_stars>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "8": {
      "content": "<issue_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "9": {
      "content": "<issue_comment>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "10": {
      "content": "<issue_closed>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "11": {
      "content": "<jupyter_start>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "12": {
      "content": "<jupyter_text>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "13": {
      "content": "<jupyter_code>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "14": {
      "content": "<jupyter_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "15": {
      "content": "<jupyter_script>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "16": {
      "content": "<empty_output>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "49152": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "49153": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [
    "<think>",
    "</think>"
  ],
  "bos_token": "<|im_start|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "extra_special_tokens": {},
  "model_max_length": 8192,
  "pad_token": "<|im_end|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
}
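
As defined above, the tokenizer reuses the ChatML markers as its special tokens (`<|im_start|>` as BOS, `<|im_end|>` as both EOS and PAD) and caps inputs at 8,192 tokens, matching `max_position_embeddings` in `config.json`. A quick check:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Shekswess/trlm-stage-3-dpo-final-2")
print(tok.bos_token, tok.eos_token, tok.pad_token, tok.unk_token)  # <|im_start|> <|im_end|> <|im_end|> <|endoftext|>
print(tok.model_max_length)                                        # 8192
```
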
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:247b792c868f8245ceddd15c2b2486a99317401202045e562ed34e945a36ed82
size 6865
vocab.json ADDED
The diff for this file is too large to render. See raw diff