• Developed by: Madeleine Hueber
  • Language(s) (NLP): English
  • License: For academic use only
  • Finetuned from model: Qwen3-0.6B-Base

This model is a preference-aligned language model fine-tuned for answering STEM-related instruction prompts. It was developed as part of the CS-552 course Modern Natural Language Processing at EPFL.
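A minimal sketch of loading the model for inference with `transformers`; the prompt and generation settings below are illustrative, not taken from the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "madhueb/MNLP_M3_dpo_model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative STEM-style prompt; adjust generation settings as needed.
prompt = "Explain why the derivative of x^2 is 2x."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```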

Training Details:

  • Stage 1: Instruction tuning on TIGER-Lab/WebInstruct-verified (200k examples, available as the instruction split of madhueb/MNLP_M3_dpo_dataset).

  • Stage 2: DPO fine-tuning using the default train split of madhueb/MNLP_M3_dpo_dataset. A sketch of the two-stage pipeline follows below.
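The two-stage pipeline can be sketched with Hugging Face `trl` (recent versions). This is a minimal illustration, not the actual training script: the hyperparameters (e.g. `beta`), the expected dataset columns (`prompt`/`chosen`/`rejected` for DPO), and the checkpoint paths are assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base = "Qwen/Qwen3-0.6B-Base"
tokenizer = AutoTokenizer.from_pretrained(base)

# Stage 1: instruction tuning on the instruction split of the dataset.
sft_data = load_dataset("madhueb/MNLP_M3_dpo_dataset", split="instruction")
sft_trainer = SFTTrainer(
    model=AutoModelForCausalLM.from_pretrained(base),
    args=SFTConfig(output_dir="sft-out"),  # output path is an assumption
    train_dataset=sft_data,
    processing_class=tokenizer,
)
sft_trainer.train()

# Stage 2: DPO on the default train split, starting from the stage-1 checkpoint.
dpo_data = load_dataset("madhueb/MNLP_M3_dpo_dataset", split="train")
dpo_trainer = DPOTrainer(
    model=AutoModelForCausalLM.from_pretrained("sft-out"),
    args=DPOConfig(output_dir="dpo-out", beta=0.1),  # beta value is an assumption
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
```

Note that with `ref_model=None`, `DPOTrainer` makes an internal frozen copy of the policy model to serve as the DPO reference, so the stage-1 checkpoint acts as both the policy initialization and the reference.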
