---
library_name: transformers
license: apache-2.0
base_model: mistralai/Mistral-Nemo-Instruct-2407
tags:
- axolotl
- generated_from_trainer
datasets:
- linabot/train_data
model-index:
- name: linabot
  results: []
---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0`
```yaml
base_model: mistralai/Mistral-Nemo-Instruct-2407
model_type: MistralForCausalLM
hub_model_id: Alignment-Lab-AI/linabot
strict: false
chat_template: tokenizer_default
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
datasets:
  - path: linabot/train_data
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles_to_train: ['assistant', 'user']
    train_on_eos: turn

warmup_steps: 450
dataset_prepared_path:
val_set_size: 0.2
output_dir: ./outputs/out

sequence_len: 10400
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: true

wandb_project: linabot
wandb_entity:
wandb_watch: all
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 4
num_epochs: 5
optimizer: adalomo
lr_scheduler: cosine
learning_rate: 0.0002024
weight_decay: 0.03
flash_attention: true
flash_attn_cross_entropy: false
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: true
torch_compile_mode: "max-autotune"
bf16: auto
tf32: false

gradient_checkpointing: true
resume_from_checkpoint:
logging_steps: 1

evals_per_epoch: 8
saves_per_epoch: 1
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  pad_token: "<pad>"
```

</details><br>

# linabot

This model is a fine-tuned version of [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) on the linabot/train_data dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0558

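As a quick start, here is a minimal inference sketch using the `transformers` chat API. It assumes the checkpoint is published under the `hub_model_id` from the config above (`Alignment-Lab-AI/linabot`); the prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Alignment-Lab-AI/linabot"  # hub_model_id from the axolotl config above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)

# The model was trained with the tokenizer's default chat template
# (chat_template: tokenizer_default), so apply_chat_template reproduces
# the training-time formatting.
messages = [{"role": "user", "content": "Hello! What can you do?"}]  # illustrative prompt
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
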
## Model description

linabot is a fine-tune of Mistral-Nemo-Instruct-2407 trained with Axolotl 0.8.0 on the linabot/train_data chat dataset, using the tokenizer's default chat template. Loss was computed on both `user` and `assistant` turns (`roles_to_train` in the config above).

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained on the linabot/train_data dataset, a collection of chat-format conversations (a `messages` list with `role` and `content` fields per turn). 20% of the data was held out for evaluation (`val_set_size: 0.2` in the config above); no separate test set is reported.

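The config's `datasets` entry reads rows in axolotl's `chat_template` format, taking turns from the `messages` field and mapping `role`/`content` directly. A hypothetical row (the content here is invented for illustration) would look like:

```python
# One training example in the chat_template format the config expects.
example_row = {
    "messages": [
        {"role": "user", "content": "Hi, who are you?"},
        {"role": "assistant", "content": "I'm linabot, a fine-tuned assistant."},
    ]
}
```
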
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002024
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: AdaLomo (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 450
- num_epochs: 5.0

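For readers reproducing the run outside axolotl, a rough equivalent of these hyperparameters in the Hugging Face `Trainer` API might look like the sketch below. This is an assumption-laden approximation, not the command actually used; `optim="adalomo"` requires the `lomo-optim` package to be installed.

```python
from transformers import TrainingArguments

# Approximate mirror of the card's hyperparameters (a sketch, not the
# original axolotl invocation).
args = TrainingArguments(
    output_dir="./outputs/out",
    per_device_train_batch_size=4,   # micro_batch_size
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    learning_rate=2.024e-4,
    lr_scheduler_type="cosine",
    warmup_steps=450,
    weight_decay=0.03,
    num_train_epochs=5.0,
    optim="adalomo",                 # needs the lomo-optim package
    bf16=True,                       # "bf16: auto" in the config
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)
```
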
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.526 | 0.0083 | 1 | 1.5474 |
| 1.5934 | 0.125 | 15 | 1.5472 |
| 1.5242 | 0.25 | 30 | 1.5454 |
| 1.5296 | 0.375 | 45 | 1.5408 |
| 1.5087 | 0.5 | 60 | 1.5322 |
| 1.486 | 0.625 | 75 | 1.5188 |
| 1.4314 | 0.75 | 90 | 1.5005 |
| 1.4311 | 0.875 | 105 | 1.4782 |
| 1.4532 | 1.0 | 120 | 1.4513 |
| 1.4215 | 1.125 | 135 | 1.4198 |
| 1.3248 | 1.25 | 150 | 1.3825 |
| 1.2697 | 1.375 | 165 | 1.3386 |
| 1.3281 | 1.5 | 180 | 1.2880 |
| 1.2428 | 1.625 | 195 | 1.2296 |
| 1.1533 | 1.75 | 210 | 1.1596 |
| 1.1038 | 1.875 | 225 | 1.0747 |
| 1.0226 | 2.0 | 240 | 0.9723 |
| 0.8858 | 2.125 | 255 | 0.8467 |
| 0.6762 | 2.25 | 270 | 0.7047 |
| 0.6433 | 2.375 | 285 | 0.5626 |
| 0.4017 | 2.5 | 300 | 0.4283 |
| 0.2875 | 2.625 | 315 | 0.3072 |
| 0.2244 | 2.75 | 330 | 0.2161 |
| 0.1445 | 2.875 | 345 | 0.1572 |
| 0.0898 | 3.0 | 360 | 0.1192 |
| 0.0666 | 3.125 | 375 | 0.0991 |
| 0.0605 | 3.25 | 390 | 0.0855 |
| 0.0457 | 3.375 | 405 | 0.0757 |
| 0.052 | 3.5 | 420 | 0.0700 |
| 0.0634 | 3.625 | 435 | 0.0658 |
| 0.0364 | 3.75 | 450 | 0.0623 |
| 0.045 | 3.875 | 465 | 0.0601 |
| 0.0395 | 4.0 | 480 | 0.0582 |
| 0.0558 | 4.125 | 495 | 0.0573 |
| 0.0468 | 4.25 | 510 | 0.0566 |
| 0.0399 | 4.375 | 525 | 0.0562 |
| 0.0337 | 4.5 | 540 | 0.0560 |
| 0.0413 | 4.625 | 555 | 0.0559 |
| 0.0318 | 4.75 | 570 | 0.0558 |
| 0.0435 | 4.875 | 585 | 0.0558 |
| 0.0445 | 5.0 | 600 | 0.0558 |

### Framework versions

- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1