• Developed by: Madeleine Hueber
  • Language(s) (NLP): English
  • License: For academic use only
  • Finetuned from model: Qwen3-0.6B-Base

This model is a preference-aligned language model fine-tuned for answering STEM-related instruction prompts. It was developed as part of the M2 deliverable for CS-552 (Modern Natural Language Processing).
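A minimal inference sketch using the transformers library is shown below; the prompt and generation settings are illustrative assumptions, not part of this card:

```python
# Minimal inference sketch for madhueb/MNLP_M2_dpo_model.
# The prompt and max_new_tokens value are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "madhueb/MNLP_M2_dpo_model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain why the derivative of sin(x) is cos(x)."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```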

Training Details:

  • Stage 1: Instruction tuning on a 200k-example subset of TIGER-Lab/WebInstructSub (available in the train_instruct split of madhueb/MNLP_M2_dpo_dataset).

  • Stage 2: DPO fine-tuning using the train split of madhueb/MNLP_M2_dpo_dataset (a minimal training sketch follows this list).
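A rough sketch of what Stage 2 could look like with TRL's DPOTrainer, assuming the dataset follows the standard prompt/chosen/rejected schema; the hyperparameters below are assumptions, not the values used for this model:

```python
# Sketch of DPO fine-tuning with TRL (assumed workflow, not the exact training script).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "Qwen/Qwen3-0.6B-Base"  # base model named in this card
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# The card states the train split of this dataset was used for DPO.
train_dataset = load_dataset("madhueb/MNLP_M2_dpo_dataset", split="train")

# output_dir and beta are assumed values for illustration.
config = DPOConfig(output_dir="dpo-model", beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
)
trainer.train()
```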

Model details:

  • Model size: 596M parameters
  • Tensor type: F32 (Safetensors)
