Qwen3-0.6B-instruction-finetuned-MCQA

This model is a fine-tuned version of andresnowak/Qwen3-0.6B-instruction-finetuned on an unknown dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

This model was trained with the same methodology as https://huggingface.co/andresnowak/MNLP_M2_mcqa_model: we run a single forward pass on the prompt, take the logits of the last token, and compute a cross-entropy loss over that token restricted to the four option letters of the question (the idea is to maximize the likelihood that the model outputs the correct letter for the question).
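
A minimal sketch of this loss, assuming a standard Hugging Face causal LM and that each option letter tokenizes to a single token; the function name and structure are illustrative and not the original training code:

```python
# Sketch of the described MCQA loss: one reading of "cross entropy on the
# last token and the 4 options" is to restrict the logits to the option
# letters. Assumes each of "A".."D" is a single token in the tokenizer.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "andresnowak/Qwen3-0.6B-instruction-finetuned"  # base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def mcqa_loss(prompt: str, correct_letter: str) -> torch.Tensor:
    # Token ids of the four option letters.
    option_ids = torch.tensor(
        [tokenizer.encode(l, add_special_tokens=False)[0] for l in "ABCD"]
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    logits = model(**inputs).logits           # (1, seq_len, vocab)
    last_logits = logits[0, -1, option_ids]   # logits of the 4 letters only
    target = torch.tensor("ABCD".index(correct_letter))
    # Cross-entropy over the four options: maximize P(correct letter).
    return F.cross_entropy(last_logits.unsqueeze(0), target.unsqueeze(0))
```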

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 64
  • optimizer: AdamW 8-bit (OptimizerNames.ADAMW_8BIT) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
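
The same settings expressed as a transformers.TrainingArguments sketch; the original training script is not published, so the output directory and any setting not listed above are assumptions:

```python
# Hedged reconstruction of the run configuration; argument names follow the
# Trainer API in transformers 4.51.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-0.6b-mcqa",        # illustrative path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=32,      # 2 x 32 = total train batch of 64
    optim="adamw_8bit",                  # betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=2,
    bf16=True,                           # the released weights are BF16
)
```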

Training results

The model was evaluated on a suite of Multiple Choice Question Answering (MCQA) benchmarks (on the validation and test sets, respectively, of each one); NLP4Education consists only of the approximately 1,000 questions and answers given to us.

The performance on the MCQA benchmarks is:

| Benchmark | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
|---|---|---|
| ARC Challenge | 61.39% | 59.96% |
| ARC Easy | 79.43% | 76.51% |
| GPQA | 32.59% | 28.57% |
| Math QA | 24.69% | 24.80% |
| MCQA Evals | 41.82% | 39.22% |
| MMLU | 52.11% | 52.11% |
| MMLU Pro | 15.41% | 14.31% |
| MuSR | 51.06% | 48.41% |
| NLP4Education | 44.14% | 42.73% |
| Overall | 44.74% | 42.96% |

The tests were done with the prompt below (only MuSR used a different one, in which "Narrative:" and "Question:" labels are added):

This question assesses challenging STEM problems as found on graduate standardized tests. Carefully evaluate the options and select the correct answer.

---
[Insert Question Here]
---
[Insert Choices Here, e.g.:
A. Option 1
B. Option 2
C. Option 3
D. Option 4]
---

Your response should include the letter and the exact text of the correct choice.
Example: B. Entropy increases.
Answer:

And the testing was done against answers in the format [Letter]. [Text answer]
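
A minimal sketch of how such likelihood-based scoring is commonly done (in the style of lm-evaluation-harness, which is an assumption here): compute the log-likelihood of each "[Letter]. [Text answer]" continuation after the prompt, then take the argmax of the total log-likelihood for Acc, or of the length-normalized log-likelihood for Acc Norm. Helper names are illustrative:

```python
# Score each candidate continuation after the prompt and return the
# predicted choice index for Acc and Acc Norm. Normalization here divides
# by the character length of the continuation, following common practice.
import torch

@torch.no_grad()
def score_choices(model, tokenizer, prompt: str, choices: list[str]):
    scores, norm_scores = [], []
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for choice in choices:                      # e.g. "B. Entropy increases."
        cont_ids = tokenizer(choice, add_special_tokens=False,
                             return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, cont_ids], dim=1)
        logits = model(input_ids=input_ids).logits
        # Log-probs of each continuation token given everything before it.
        logprobs = logits[:, :-1].log_softmax(-1)
        cont_logprob = logprobs[0, prompt_ids.shape[1] - 1 :].gather(
            -1, cont_ids[0].unsqueeze(-1)
        ).sum()
        scores.append(cont_logprob.item())                     # for Acc
        norm_scores.append(cont_logprob.item() / len(choice))  # for Acc Norm
    return (int(torch.tensor(scores).argmax()),
            int(torch.tensor(norm_scores).argmax()))
```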

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.5.1+cu121
  • Datasets 3.6.0
  • Tokenizers 0.21.0