Tiny Reasoning Language Model

Collection dedicated to the development of the Tiny Reasoning Language Model (trlm).
`trlm-stage-1-sft-final-2` is the Stage 1 post-training model for the Tiny Reasoning Language Model (trlm) project. This stage focuses on everyday conversations and general instruction following, fine-tuned on a curated dataset of 58,000 entries. It teaches the model to follow instructions, rewrite, summarize, and hold conversations without reasoning tokens.
This model was trained on the dataset Shekswess/trlm-sft-stage-1-final.
Dataset summary:

| Source Dataset | Entries | Percentage (%) |
|---|---|---|
| smoltalk_smollm3_smol_magpie_ultra_no_think | 33,500 | 57.8 |
| smoltalk_smollm3_smol_summarize_no_think | 7,500 | 12.9 |
| smoltalk_smollm3_smol_rewrite_no_think | 7,500 | 12.9 |
| smoltalk_smollm3_systemchats_30k_no_think | 2,500 | 4.3 |
| smoltalk_smollm3_explore_instruct_rewriting_no_think | 2,500 | 4.3 |
| tulu_3_sft_personas_instruction_following_no_think | 2,500 | 4.3 |
| smoltalk_smollm3_everyday_conversations_no_think | 2,000 | 3.4 |
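
The mixture above can be inspected directly with the `datasets` library. The sketch below assumes the dataset exposes a default `train` split; adjust the split name if it differs.

```python
from datasets import load_dataset

# Load the Stage 1 SFT mixture from the Hugging Face Hub
# (split name "train" is an assumption; change it if the dataset uses another split)
ds = load_dataset("Shekswess/trlm-sft-stage-1-final", split="train")

print(ds)      # number of rows and column names
print(ds[0])   # inspect a single entry
```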
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Shekswess/trlm-stage-1-sft-final-2"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example inference
inputs = tokenizer("Write a short daily affirmation:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
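
Since Stage 1 targets instruction following and conversation, prompts will generally behave better when formatted through the model's chat template. This is a minimal sketch, continuing from the snippet above and assuming the tokenizer ships a chat template:

```python
# Chat-style inference sketch (assumes the tokenizer defines a chat template)
messages = [
    {"role": "user", "content": "Summarize the benefits of a short daily walk in two sentences."},
]

# Build the prompt in the model's expected chat format
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
)

chat_outputs = model.generate(chat_inputs, max_new_tokens=100)
print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))
```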
Part of the Tiny Reasoning Language Model (trlm) post-training pipeline.