---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- merge
- sft
- dpo
- qwen3
- math
- code
- mcqa
- mnlp-m3
datasets:
- albertfares/MNLP_M3_dpo_dataset
language:
- en
pipeline_tag: text-generation
---

# MNLP M3 Merged Model (SFT + DPO)
This model combines the best of both approaches:

- **SFT Component:** `mgatti/MNLP_M3_mcqa_model` - multiple-choice QA capabilities
- **DPO Component:** `albertfares/MNLP_M3_dpo_model` - preference-aligned responses
## Model Details
- **Base Model:** Qwen/Qwen3-0.6B-Base
- **SFT Model:** multiple-choice QA fine-tuned model
- **DPO Model:** direct-preference-optimized model
- **Merge Strategy:** weight-space merging of the SFT and DPO checkpoints (see the sketch below)
- **Combined Capabilities:** MCQA + preference alignment
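The precise merge recipe isn't documented in this card. As a rough illustration only, one common weight-space merge is simple linear interpolation of the two checkpoints' parameters; the sketch below assumes that approach and an arbitrary blend factor `alpha`, neither of which is confirmed for this model:

```python
# Illustrative sketch of a linear weight-space merge (not necessarily the
# exact recipe used for this model). `alpha` is a hypothetical blend factor.
import torch
from transformers import AutoModelForCausalLM

sft = AutoModelForCausalLM.from_pretrained("mgatti/MNLP_M3_mcqa_model")
dpo = AutoModelForCausalLM.from_pretrained("albertfares/MNLP_M3_dpo_model")

alpha = 0.5  # assumed weight on the SFT parameters
dpo_state = dpo.state_dict()

with torch.no_grad():
    # Interpolate every parameter tensor between the two checkpoints
    merged_state = {
        name: alpha * param + (1.0 - alpha) * dpo_state[name]
        for name, param in sft.state_dict().items()
    }

sft.load_state_dict(merged_state)  # reuse the SFT model as the container
sft.save_pretrained("merged_mnlp_m3_sft_dpo")
```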
## Capabilities
- ✅ Multiple-Choice Question Answering (from the SFT component)
- ✅ Preference-Aligned Generation (from the DPO component)
- ✅ Math and Code Generation (from MNLP M3 training)
- ✅ Reasoning Tasks (combined strengths)
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local path or hub ID of this merged model
model = AutoModelForCausalLM.from_pretrained("merged_mnlp_m3_sft_dpo")
tokenizer = AutoTokenizer.from_pretrained("merged_mnlp_m3_sft_dpo")

# For MCQA
prompt = "Which of the following is correct? A) 2+2=5 B) 2+2=4 C) 2+2=3"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# For general generation (temperature only applies when sampling is enabled)
prompt = "Explain the concept of recursion in programming"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
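For multiple-choice evaluation, a common alternative to free-form generation is to score each candidate answer by its likelihood under the model and pick the best one. The following is a minimal sketch of that pattern, reusing `model` and `tokenizer` from the snippet above; the question and option strings are purely illustrative:

```python
import torch

question = "Which of the following is correct?"
options = ["2+2=5", "2+2=4", "2+2=3"]

scores = []
for option in options:
    text = f"{question} Answer: {option}"
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids returns the mean token cross-entropy,
        # so a lower loss means the model finds the sequence more likely.
        loss = model(**enc, labels=enc["input_ids"]).loss
    scores.append(-loss.item())

print("Predicted answer:", options[scores.index(max(scores))])
```

Note that this scores the whole question-plus-answer sequence; for rigorous evaluation you would typically restrict the loss to the answer tokens and normalize for option length.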
## Training Data
- SFT: Multiple-choice QA dataset
- DPO: MNLP M3 preference dataset with math, code, and reasoning
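To look at the DPO preference data listed in this card's metadata, it can be loaded with the `datasets` library; nothing beyond the dataset ID is assumed here, so inspect the printed schema for the actual splits and column names:

```python
from datasets import load_dataset

# Load the preference dataset referenced above and print its splits,
# row counts, and column names (field names are not assumed here).
ds = load_dataset("albertfares/MNLP_M3_dpo_dataset")
print(ds)
```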
This merged model should excel at both structured QA tasks and open-ended generation with preference alignment.