youssefbelghmi committed on
Commit f221c5c · verified · 1 Parent(s): b103539

Update README.md

Files changed (1):
  1. README.md +89 -23

README.md CHANGED
@@ -1,37 +1,100 @@
  ---
- base_model: Qwen/Qwen3-0.6B-Base
- datasets: youssefbelghmi/MNLP_M3_mcqa_dataset_support
  library_name: transformers
- model_name: MNLP_M3_mcqa_model_support
  tags:
- - generated_from_trainer
- - trl
- - sft
- licence: license
  ---

- # Model Card for MNLP_M3_mcqa_model_support

- This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) on the [youssefbelghmi/MNLP_M3_mcqa_dataset_support](https://huggingface.co/datasets/youssefbelghmi/MNLP_M3_mcqa_dataset_support) dataset.
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="youssefbelghmi/MNLP_M3_mcqa_model_support", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
  ```

- ## Training procedure

-
- This model was trained with SFT.

  ### Framework versions

@@ -43,8 +106,6 @@ This model was trained with SFT.

  ## Citations

-
-
  Cite TRL as:

  ```bibtex
@@ -56,4 +117,9 @@ Cite TRL as:
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
  }
- ```

  ---
+ license: mit
+ language: en
+ datasets:
+ - youssefbelghmi/MNLP_M3_mcqa_dataset
  library_name: transformers
+ pipeline_tag: text-classification
  tags:
+ - mcqa
+ - multiple-choice
+ - qwen
+ - qwen3
+ - supervised-fine-tuning
+ - mnlp
+ - epfl
+ - stem
  ---

+ # MNLP M3 MCQA Model (Qwen3-0.6B fine-tuned)

+ This model is a fine-tuned version of **Qwen/Qwen3-0.6B-Base** on the [MNLP M3 MCQA dataset](https://huggingface.co/datasets/youssefbelghmi/MNLP_M3_mcqa_dataset), a large-scale collection of multiple-choice questions designed for evaluating and training models in **STEM** domains (science, math, engineering, medicine, etc.).

+ It was trained as part of the final milestone of the **CS-552: Modern NLP** course at EPFL (Spring 2025).

+ ## Task

+ **Multiple-Choice Question Answering (MCQA):** Given a question and four answer options (A–D), the model must complete the prompt with the correct option letter only (e.g., `A`, `B`, `C`, or `D`). It was trained with rationales during supervision but outputs only the letter during inference, making it compatible with evaluation frameworks such as LightEval.
+
+ ## Training Dataset
+
+ - **Dataset:** [`youssefbelghmi/MNLP_M3_mcqa_dataset`](https://huggingface.co/datasets/youssefbelghmi/MNLP_M3_mcqa_dataset)
+ - ~30,000 questions drawn from SciQ, OpenBookQA, MathQA, ARC, and MedMCQA.
+ - Each sample includes:
+   - the question,
+   - four answer choices (A–D),
+   - the correct answer as a letter,
+   - a short explanation (`support`) to guide learning.
+
+ ## Training Setup
+
+ - **Base model:** `Qwen/Qwen3-0.6B-Base`
+ - **Method:** Supervised Fine-Tuning (SFT) with `trl` and its `SFTTrainer`.
+ - **Tokenizer:** `AutoTokenizer`, with the `eos_token` used as the padding token.
+
+ ## Training Prompt Format
+
+ During fine-tuning, each training example is converted into a prompt-completion pair. The prompt includes both the question and an explanation to guide the model's reasoning:
+
+ ```text
+ The following is a multiple-choice question (with answers) about knowledge and skills in advanced master's-level STEM fields.
+ You will be provided with an explanation to help you understand the correct answer.
+ Select the correct answer by replying with the option letter (A, B, C, or D) only.
+
+ Question: <question_text>
+ A. <option_A>
+ B. <option_B>
+ C. <option_C>
+ D. <option_D>
+ Explanation: <support_text>
+ Answer:
  ```

+ The completion is a single token: `" A"`, `" B"`, `" C"`, or `" D"`, corresponding to the correct answer.
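The template above is mechanical to reproduce. A sketch of how one dataset row might be rendered; the `build_prompt` helper and its argument names are illustrative, not the project's actual preprocessing code:

```python
def build_prompt(question: str, options: list[str], support: str) -> str:
    """Render one MCQA sample into the training prompt format described above."""
    header = (
        "The following is a multiple-choice question (with answers) about "
        "knowledge and skills in advanced master's-level STEM fields.\n"
        "You will be provided with an explanation to help you understand the correct answer.\n"
        "Select the correct answer by replying with the option letter (A, B, C, or D) only.\n"
    )
    # Pair each option with its letter: "A. ...", "B. ...", etc.
    lettered = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", options))
    return f"{header}\nQuestion: {question}\n{lettered}\nExplanation: {support}\nAnswer:"

prompt = build_prompt(
    "What gas do plants absorb during photosynthesis?",
    ["Oxygen", "Carbon dioxide", "Nitrogen", "Helium"],
    "Plants take in CO2 and release O2 during photosynthesis.",
)
print(prompt)
```

The prompt ends at `Answer:` with no trailing space, which is consistent with the completion carrying the leading space (`" B"` rather than `"B"`).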

+ ## Training Hyperparameters
+
+ The following hyperparameters were used during training:
+
+ - learning_rate: 2e-5
+ - num_train_epochs: 1
+ - per_device_train_batch_size: 4
+ - per_device_eval_batch_size: 4
+ - gradient_accumulation_steps: 4
+ - gradient_checkpointing: true
+ - eval_strategy: steps
+ - eval_steps: 100
+ - logging_steps: 100

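Since the model card says training used TRL's `SFTTrainer`, these values would typically be passed through an `SFTConfig`. A configuration sketch under that assumption (the `output_dir` value is illustrative, and `eval_strategy` assumes a recent `transformers`/`trl` release):

```python
from trl import SFTConfig

config = SFTConfig(
    output_dir="mnlp_m3_mcqa_model",   # illustrative path, not from the source
    learning_rate=2e-5,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch per device: 4 * 4 = 16
    gradient_checkpointing=True,
    eval_strategy="steps",
    eval_steps=100,
    logging_steps=100,
)
```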
+ ## Training Results
+
+ | Epoch | Training Loss | Validation Loss |
+ |------:|--------------:|----------------:|
+ | 0.08 | 0.3461 | 0.2748 |
+ | 0.15 | 0.2938 | 0.2661 |
+ | 0.23 | 0.2881 | 0.2600 |
+ | 0.31 | 0.2741 | 0.2666 |
+ | 0.38 | 0.2684 | 0.2570 |
+ | 0.46 | 0.2635 | 0.2539 |
+ | 0.54 | 0.2603 | 0.2457 |
+ | 0.61 | 0.2555 | 0.2441 |
+ | 0.69 | 0.2459 | 0.2414 |
+ | 0.77 | 0.2383 | 0.2353 |
+ | 0.84 | 0.2266 | 0.2338 |
+ | 0.92 | 0.2112 | 0.2337 |
+ | 0.99 | 0.2110 | 0.2335 |
+
+ - **Final training loss:** 0.211
+ - **Final validation accuracy:** ~92.0%

  ### Framework versions

  ## Citations

  Cite TRL as:

  ```bibtex
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
  }
+ ```
+
+ ## Author
+
+ Developed by [**Youssef Belghmi**](https://huggingface.co/youssefbelghmi)
+ CS-552: Modern NLP – EPFL, Spring 2025