cyberbabooshka committed
Commit b8d6361 · verified · 1 Parent(s): 1e0cf33

End of training
Files changed (1): README.md added (+201 lines)

---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen3-0.6B-Base
tags:
- axolotl
- generated_from_trainer
datasets:
- cyberbabooshka/MNLP_M2_mcqa_dataset
model-index:
- name: base_noreasoning
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.10.0.dev0`
```yaml
base_model: Qwen/Qwen3-0.6B-Base
hub_model_id: cyberbabooshka/base_noreasoning
wandb_name: base

tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false

num_processes: 64
dataset_processes: 64
dataset_prepared_path: last_run_prepared

chat_template: jinja
chat_template_jinja: >-
  {%- for message in messages %}
  {{- '<|im_start|>' + message.role + '\n' + message.content.lstrip('\n') + '<|im_end|>' + '\n' }}
  {%- endfor %}
  {%- if add_generation_prompt %}
  {{- '<|im_start|>assistant\n' }}
  {%- endif %}

datasets:
  - path: cyberbabooshka/MNLP_M2_mcqa_dataset
    split: train
    type: chat_template
    field_messages: messages
    train_on_eos: turn
    train_on_eot: turn
    message_property_mappings:
      role: role
      content: content
    roles:
      user:
        - user
      assistant:
        - assistant

test_datasets:
  - path: cyberbabooshka/MNLP_M2_mcqa_dataset
    split: test
    type: chat_template
    field_messages: messages
    train_on_eos: turn
    train_on_eot: turn
    message_property_mappings:
      role: role
      content: content
    roles:
      user:
        - user
      assistant:
        - assistant

output_dir: ./outputs

sequence_len: 2048
batch_flattening: true
sample_packing: false

wandb_project: mnlp
wandb_entity: aleksandr-dremov-epfl
wandb_watch:
wandb_log_model:

gradient_accumulation_steps: 1
eval_batch_size: 16
micro_batch_size: 12

optimizer: ademamix_8bit
weight_decay: 0.01

learning_rate: 0.00001
warmup_steps: 500

wsd_final_lr_factor: 0.0
wsd_init_div_factor: 100
wsd_fract_decay: 0.2
wsd_decay_type: "sqrt"
wsd_sqrt_power: 0.5
wsd_cooldown_start_lr_factor: 1.0

bf16: auto
tf32: false

torch_compile: true
flash_attention: true
gradient_checkpointing: false

resume_from_checkpoint:
auto_resume_from_checkpoints: true

logging_steps: 16
eval_steps: 2000
save_steps: 1000
max_steps: 35000
num_epochs: 20000000
save_total_limit: 2

special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|endoftext|>"

eot_tokens:
  - <|im_end|>

plugins:
  - axolotl_wsd.WSDSchedulerPlugin

```

</details><br>

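To make the prompt format concrete, the snippet below renders the `chat_template_jinja` from the config with `jinja2` on an invented multiple-choice example (the question is illustrative only, not drawn from cyberbabooshka/MNLP_M2_mcqa_dataset). It is a minimal sketch of the template's behaviour, not part of the training pipeline.

```python
# Minimal sketch: render the chat template from the config above with jinja2 so the
# exact training-time prompt layout is visible. The example question is invented.
from jinja2 import Template

CHAT_TEMPLATE = (
    r"{%- for message in messages %}"
    r"{{- '<|im_start|>' + message.role + '\n' + message.content.lstrip('\n') + '<|im_end|>' + '\n' }}"
    r"{%- endfor %}"
    r"{%- if add_generation_prompt %}"
    r"{{- '<|im_start|>assistant\n' }}"
    r"{%- endif %}"
)

messages = [
    {"role": "user", "content": "Which planet is known as the Red Planet?\nA. Venus\nB. Mars\nC. Jupiter\nD. Saturn"},
]

prompt = Template(CHAT_TEMPLATE).render(messages=messages, add_generation_prompt=True)
print(prompt)
# <|im_start|>user
# Which planet is known as the Red Planet?
# A. Venus
# B. Mars
# C. Jupiter
# D. Saturn<|im_end|>
# <|im_start|>assistant
```

Rendered this way, every turn is wrapped in `<|im_start|>role ... <|im_end|>` markers, and `<|im_start|>assistant` plus a newline is appended when a completion is requested.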
# base_noreasoning

This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) on the cyberbabooshka/MNLP_M2_mcqa_dataset dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7964

## Model description

More information needed

## Intended uses & limitations

More information needed

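In lieu of a documented usage example, the sketch below shows one plausible way to query the model with `transformers`. It assumes the chat template shown in the axolotl config was exported with the tokenizer; if it was not, the prompt can be assembled manually in the same `<|im_start|>...<|im_end|>` format. The question is, again, invented for illustration.

```python
# Hedged usage sketch, not an official recipe: load the fine-tuned checkpoint from the
# Hub and generate an answer to one invented multiple-choice question.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cyberbabooshka/base_noreasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

messages = [
    {"role": "user", "content": "What is 2 + 2?\nA. 3\nB. 4\nC. 5\nD. 22"},
]
# Assumes the tokenizer carries the chat template shown in the axolotl config above.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=16)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```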
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 12
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 24
- total_eval_batch_size: 32
- optimizer: ADEMAMIX_8BIT (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- training_steps: 35000

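As a quick sanity check, the derived batch-size figures above follow directly from the per-device settings; the arithmetic below restates them and adds the resulting per-step token budget (an upper bound, since batch flattening is used without sample packing):

```python
# Worked arithmetic from the hyperparameters above; no new training details.
micro_batch_size = 12      # train_batch_size per device
num_devices = 2
grad_accum_steps = 1       # gradient_accumulation_steps

effective_batch = micro_batch_size * num_devices * grad_accum_steps
print(effective_batch)                    # 24 -> matches total_train_batch_size

sequence_len = 2048
print(effective_batch * sequence_len)     # 49152 tokens per optimizer step at most

training_steps = 35_000
print(effective_batch * training_steps)   # 840000 examples processed over the full run
```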
### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0.0000 | 1     | 0.9810          |
| 0.8508        | 0.0556 | 2000  | 0.8516          |
| 0.8877        | 0.1111 | 4000  | 0.8365          |
| 0.8851        | 0.1667 | 6000  | 0.8281          |
| 0.8193        | 0.2223 | 8000  | 0.8222          |
| 0.8298        | 0.2778 | 10000 | 0.8177          |
| 0.8439        | 0.3334 | 12000 | 0.8141          |
| 0.8364        | 0.3890 | 14000 | 0.8111          |
| 0.8015        | 0.4445 | 16000 | 0.8085          |
| 0.8112        | 0.5001 | 18000 | 0.8062          |
| 0.7972        | 0.5556 | 20000 | 0.8042          |
| 0.8264        | 0.6112 | 22000 | 0.8024          |
| 0.7728        | 0.6668 | 24000 | 0.8008          |
| 0.7762        | 0.7223 | 26000 | 0.7992          |
| 0.8185        | 0.7779 | 28000 | 0.7978          |
| 0.8235        | 0.8335 | 30000 | 0.7967          |
| 0.7812        | 0.8890 | 32000 | 0.7964          |
| 0.7872        | 0.9446 | 34000 | 0.7964          |

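For intuition, the final validation loss of 0.7964 corresponds to a perplexity of roughly 2.22, assuming the reported value is the mean token-level cross-entropy (the usual Trainer convention):

```python
import math

final_eval_loss = 0.7964
print(math.exp(final_eval_loss))  # ~2.22 validation perplexity, under the cross-entropy assumption
```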

### Framework versions

- Transformers 4.52.1
- Pytorch 2.7.0+cu126
- Datasets 3.5.0
- Tokenizers 0.21.1