elichen3051 committed (verified) · Commit ce29006 · 1 Parent(s): 351c81f

Model save

Files changed (2)
  1. README.md +184 -0
  2. generation_config.json +7 -0
README.md ADDED
@@ -0,0 +1,184 @@
---
library_name: transformers
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- axolotl
- generated_from_trainer
model-index:
- name: Mistral-7B-v0.1-q-sparse-fineweb-edu-table2-re
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.5.2`
```yaml
base_model: mistralai/Mistral-7B-v0.1
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
tokenizer_use_fast: false
resize_token_embeddings_to_32x: false

flash_attention: true
xformers_attention:

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: skymizer/Mistral-7B-v0.1-base-tokenized-fineweb-edu-45B-4096
    train_on_split: train
    type: completion

test_datasets:
  - path: skymizer/Mistral-7B-v0.1-base-tokenized-fineweb-edu-test-4K
    split: test
    type: completion

is_preprocess: true
skip_prepare_dataset: true

dataset_prepared_path:

hf_use_auth_token: true
output_dir: /mnt/home/model-team/models/Mistral-7B-v0.1-q-sparse-fineweb-edu-table2-re
resume_from_checkpoint:
auto_resume_from_checkpoints: true

sequence_len: 4096
sample_packing: true
sample_packing_group_size: 100000
sample_packing_bin_size: 200
pad_to_sequence_len: true

eval_sample_packing: false
# eval_causal_lm_metrics: ["perplexity"]

wandb_project: "sparse-tuning-cpt"
wandb_entity:
wandb_watch:
wandb_name: "Mistral-7B-v0.1-q-sparse-fineweb-edu-table2-re"
wandb_log_model:

# global batch size = 2 * 8 * 8 GPUs * 8 Nodes * 4096 = 4M
gradient_accumulation_steps: 2
micro_batch_size: 8
eval_batch_size: 1
max_steps: 10000
optimizer: adamw_torch
learning_rate: 0.00005
lr_scheduler: cosine
cosine_min_lr_ratio: 0.2
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.95
adam_eps: 0.000001
max_grad_norm: 2.0

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

hub_model_id: "skymizer/Mistral-7B-v0.1-q-sparse-fineweb-edu-table2-re"

save_strategy: "steps"
save_steps: 500

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
local_rank:
logging_steps: 1

warmup_steps: 375
eval_steps: 500
eval_table_size:
debug:
deepspeed: /root/train/axolotl/deepspeed_configs/zero3_bf16.json
fsdp:
fsdp_config:
seed: 42

```

</details><br>
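
The commented line `# global batch size = 2 * 8 * 8 GPUs * 8 Nodes * 4096 = 4M` in the config summarizes how the per-device settings combine. A quick arithmetic check; the constants are copied from the config, the script itself is only illustrative:

```python
# Derive the aggregate batch sizes from the per-device settings in the config.
micro_batch_size = 8      # micro_batch_size (per-GPU train batch)
grad_accum_steps = 2      # gradient_accumulation_steps
num_devices = 64          # 8 GPUs x 8 nodes
eval_batch_size = 1       # eval_batch_size (per GPU)
sequence_len = 4096       # sequence_len

total_train_batch = micro_batch_size * grad_accum_steps * num_devices
total_eval_batch = eval_batch_size * num_devices
tokens_per_step = total_train_batch * sequence_len

assert total_train_batch == 1024   # matches total_train_batch_size below
assert total_eval_batch == 64      # matches total_eval_batch_size below
print(f"{tokens_per_step:,} tokens per optimizer step")  # 4,194,304 (~4M)
```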

# Mistral-7B-v0.1-q-sparse-fineweb-edu-table2-re

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the skymizer/Mistral-7B-v0.1-base-tokenized-fineweb-edu-45B-4096 dataset (per the axolotl config above).
It achieves the following results on the evaluation set:
- Loss: 1.9784
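
Assuming the reported loss is the mean per-token cross-entropy in nats, this corresponds to an evaluation perplexity of roughly exp(1.9784) ≈ 7.2.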

## Model description

More information needed

## Intended uses & limitations

More information needed
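
The auto-generated card leaves this section blank; below is a minimal loading/generation sketch, assuming the checkpoint is published under the `hub_model_id` from the config above and loads through the standard `transformers` causal-LM API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "skymizer/Mistral-7B-v0.1-q-sparse-fineweb-edu-table2-re"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)  # config sets tokenizer_use_fast: false
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)

# This is a base (completion-style) model, so prompt it with plain text rather than a chat template.
inputs = tokenizer("The FineWeb-Edu dataset is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```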
128
+
129
+ ## Training and evaluation data
130
+
131
+ More information needed
132
+
133
+ ## Training procedure
134
+
135
+ ### Training hyperparameters
136
+
137
+ The following hyperparameters were used during training:
138
+ - learning_rate: 5e-05
139
+ - train_batch_size: 8
140
+ - eval_batch_size: 1
141
+ - seed: 42
142
+ - distributed_type: multi-GPU
143
+ - num_devices: 64
144
+ - gradient_accumulation_steps: 2
145
+ - total_train_batch_size: 1024
146
+ - total_eval_batch_size: 64
147
+ - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.95) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
148
+ - lr_scheduler_type: cosine
149
+ - lr_scheduler_warmup_steps: 375
150
+ - training_steps: 10000
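
A minimal sketch of the learning-rate schedule these settings imply, assuming (as the axolotl config's `cosine_min_lr_ratio: 0.2` suggests) linear warmup followed by cosine decay to 20% of the peak learning rate; the helper is illustrative, not the trainer's actual implementation:

```python
import math

PEAK_LR = 5e-5        # learning_rate
MIN_LR_RATIO = 0.2    # cosine_min_lr_ratio (from the axolotl config)
WARMUP_STEPS = 375    # lr_scheduler_warmup_steps
TOTAL_STEPS = 10_000  # training_steps

def lr_at(step: int) -> float:
    """Approximate LR at a given optimizer step: linear warmup, then cosine
    decay that bottoms out at MIN_LR_RATIO * PEAK_LR by TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)

for s in (0, 375, 2500, 5000, 7500, 10_000):
    print(f"step {s:>6}: lr = {lr_at(s):.2e}")
```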

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 11.1526 | 0.0001 | 1 | 11.1178 |
| 3.9513 | 0.0408 | 500 | 3.7699 |
| 3.4469 | 0.0817 | 1000 | 3.2772 |
| 3.1993 | 0.1225 | 1500 | 3.0024 |
| 2.8081 | 0.1633 | 2000 | 2.7218 |
| 2.5217 | 0.2042 | 2500 | 2.4860 |
| 2.3993 | 0.2450 | 3000 | 2.3570 |
| 2.2919 | 0.2858 | 3500 | 2.2761 |
| 2.2379 | 0.3267 | 4000 | 2.2180 |
| 2.2047 | 0.3675 | 4500 | 2.1721 |
| 2.1553 | 0.4083 | 5000 | 2.1367 |
| 2.1279 | 0.4491 | 5500 | 2.1066 |
| 2.0689 | 0.4900 | 6000 | 2.0822 |
| 2.0702 | 0.5308 | 6500 | 2.0608 |
| 2.0611 | 0.5716 | 7000 | 2.0425 |
| 2.0242 | 0.6125 | 7500 | 2.0264 |
| 2.0449 | 0.6533 | 8000 | 2.0140 |
| 2.0245 | 0.6941 | 8500 | 2.0025 |
| 2.0107 | 0.7350 | 9000 | 1.9933 |
| 1.9995 | 0.7758 | 9500 | 1.9851 |
| 1.9995 | 0.8166 | 10000 | 1.9784 |

### Framework versions

- Transformers 4.46.3
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
generation_config.json ADDED
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "transformers_version": "4.46.3"
}
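
These defaults are what `model.generate()` falls back to when no per-call overrides are passed. A minimal sketch, assuming the standard `transformers` `GenerationConfig` API and the hub id used above:

```python
from transformers import GenerationConfig

# Load the defaults committed in generation_config.json (sampling on, Mistral's BOS/EOS ids).
gen_config = GenerationConfig.from_pretrained(
    "skymizer/Mistral-7B-v0.1-q-sparse-fineweb-edu-table2-re"
)
print(gen_config.do_sample, gen_config.bos_token_id, gen_config.eos_token_id)  # True 1 2

# Per-call arguments to generate() override these defaults, e.g.:
# model.generate(**inputs, generation_config=gen_config, max_new_tokens=128)
```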