nsantolla committed
Commit af223f7 · verified
1 Parent(s): 9d4c266

Upload 15 files

Files changed (9)
  1. README.md +196 -11
  2. adapter_model.safetensors +2 -2
  3. optimizer.pt +1 -1
  4. rng_state.pth +1 -1
  5. scaler.pt +3 -0
  6. scheduler.pt +1 -1
  7. tokenizer.json +2 -16
  8. trainer_state.json +1064 -117
  9. training_args.bin +2 -2
README.md CHANGED
@@ -1,17 +1,202 @@
  ---
  base_model: microsoft/phi-4
  library_name: peft
- license: gpl-3.0
- pipeline_tag: text-generation
- tags:
- - biology
- - chemistry
- - antibody
- - drug-design
- - protein
- - amino-acid
  ---

- # peleke-phi-4 🦋

- A fine-tuned protein language model, based on Microsoft's Phi-4, for targeted antibody sequence generation.
 
  ---
  base_model: microsoft/phi-4
  library_name: peft
  ---

+ # Model Card for Model ID

+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
  ### Framework versions
+
  - PEFT 0.15.2
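This commit replaces the original peleke-phi-4 card, which described a PEFT adapter on microsoft/phi-4 for targeted antibody sequence generation, with the unfilled template above, so the "How to Get Started with the Model" section currently contains no code. Below is a minimal sketch of how a PEFT causal-LM adapter of this kind is typically loaded; the adapter repo id and the prompt are assumptions rather than documented usage, and the prompt format expected for antibody generation is not specified anywhere in this commit.

```python
# Minimal sketch (not from the model card): load a PEFT adapter on top of the
# base model declared in the card metadata. The adapter repo id and the prompt
# below are placeholders; substitute the real values for this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "microsoft/phi-4"            # from base_model in the card metadata
ADAPTER_ID = "nsantolla/peleke-phi-4"  # assumed repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attaches the PEFT weights

prompt = "Generate an antibody heavy-chain sequence for the given antigen:"  # illustrative only
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Generation settings (sampling, temperature, stopping criteria) would still need to be chosen for sequence generation; nothing in this commit prescribes them.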
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9dd57e4036fbe27e465912f0ce491e76bb4bb35b025a972201b100bb4f814313
- size 29512680

  version https://git-lfs.github.com/spec/v1
+ oid sha256:1d6851a5df88091ca237b847c4659b23f98d16951db397593e87ced71c766a38
+ size 2084824272
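adapter_model.safetensors and the other binary entries in this commit are Git LFS pointer files: `oid sha256:...` is the SHA-256 of the stored object and `size` is its byte count (here the adapter grows from roughly 29.5 MB to roughly 2.08 GB). A small standard-library sketch for checking a downloaded copy against its pointer; the local file path is a placeholder.

```python
# Sketch: verify a downloaded LFS object against the "oid sha256:..." and "size"
# fields shown in the pointer diff above. The local path is a placeholder.
import hashlib
import os

path = "adapter_model.safetensors"
expected_oid = "1d6851a5df88091ca237b847c4659b23f98d16951db397593e87ced71c766a38"
expected_size = 2084824272

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        digest.update(chunk)

assert os.path.getsize(path) == expected_size, "size does not match the LFS pointer"
assert digest.hexdigest() == expected_oid, "sha256 does not match the LFS pointer"
print("local file matches the LFS pointer")
```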
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b51cb22f1bf907ad9223d3751417d15baae53cd95b8a3905d5ca92b3805195ea
  size 59117259

  version https://git-lfs.github.com/spec/v1
+ oid sha256:d488fe514b22a65589b90ffc1639497cc18bee6eac9bfb258ab143195cde3c2d
  size 59117259
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:11362d06d573fbb9a8fc678e2da35679a5ed0bb74d6c9c2cc7f389c7943f4885
  size 14645

  version https://git-lfs.github.com/spec/v1
+ oid sha256:5671e5814bc2b50e8ca672217a479b935e2e09221f419b16f33a4d73cf1ea4f2
  size 14645
scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc302c1ea6f8760b538d34aa63eb10e75b8051a0525238af5b340d810a2f312c
+ size 1383
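scaler.pt is new in this commit; the Hugging Face Trainer writes a file with this name when it checkpoints the state of a mixed-precision gradient scaler, which suggests (but does not prove) that this run used fp16 AMP. A generic sketch of the underlying pattern, not code from this repository:

```python
# Generic torch AMP idiom behind a scaler.pt checkpoint (assumption: the file
# holds a GradScaler state_dict; this is not this repository's training code).
import torch

scaler = torch.cuda.amp.GradScaler()
# a training step would look like:
#   scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()
torch.save(scaler.state_dict(), "scaler.pt")        # what a checkpoint stores

restored = torch.cuda.amp.GradScaler()
restored.load_state_dict(torch.load("scaler.pt"))   # what resuming restores
```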
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b4da84491766401122650a0476d02dff636c879f515be9194eb3503b780334ec
  size 1465

  version https://git-lfs.github.com/spec/v1
+ oid sha256:76d499ff7f9713e6ce83515f64a51745b01dce3e0c41d165acffd8052e740f14
  size 1465
tokenizer.json CHANGED
@@ -1,21 +1,7 @@
  {
  "version": "1.0",
- "truncation": {
- "direction": "Right",
- "max_length": 256,
- "strategy": "LongestFirst",
- "stride": 0
- },
- "padding": {
- "strategy": {
- "Fixed": 256
- },
- "direction": "Right",
- "pad_to_multiple_of": null,
- "pad_id": 100349,
- "pad_type_id": 0,
- "pad_token": "<|dummy_85|>"
- },
  "added_tokens": [
  {
  "id": 100256,

  {
  "version": "1.0",
+ "truncation": null,
+ "padding": null,
  "added_tokens": [
  {
  "id": 100256,
trainer_state.json CHANGED
@@ -4,210 +4,1157 @@
4
  "best_model_checkpoint": null,
5
  "epoch": 3.0,
6
  "eval_steps": 500,
7
- "global_step": 1434,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
- "epoch": 0.10460251046025104,
14
- "grad_norm": 0.46963590383529663,
15
- "learning_rate": 4.914833215046132e-05,
16
- "loss": 3.5168,
 
 
 
 
 
 
 
 
 
 
 
17
  "step": 50
18
  },
19
  {
20
- "epoch": 0.20920502092050208,
21
- "grad_norm": 0.6878917813301086,
22
- "learning_rate": 4.737402413058907e-05,
23
- "loss": 3.2332,
 
 
 
 
 
 
 
 
 
 
 
24
  "step": 100
25
  },
26
  {
27
- "epoch": 0.3138075313807531,
28
- "grad_norm": 1.0124949216842651,
29
- "learning_rate": 4.559971611071682e-05,
30
- "loss": 3.0309,
 
 
 
 
 
 
 
 
 
 
 
31
  "step": 150
32
  },
33
  {
34
- "epoch": 0.41841004184100417,
35
- "grad_norm": 1.4931869506835938,
36
- "learning_rate": 4.382540809084457e-05,
37
- "loss": 2.9624,
 
 
 
 
 
 
 
 
 
 
 
38
  "step": 200
39
  },
40
  {
41
- "epoch": 0.5230125523012552,
42
- "grad_norm": 3.4913582801818848,
43
- "learning_rate": 4.205110007097232e-05,
44
- "loss": 2.843,
 
 
 
 
 
 
 
 
 
 
 
45
  "step": 250
46
  },
47
  {
48
- "epoch": 0.6276150627615062,
49
- "grad_norm": 3.3243305683135986,
50
- "learning_rate": 4.027679205110007e-05,
51
- "loss": 2.6463,
 
 
 
 
 
 
 
 
 
 
 
52
  "step": 300
53
  },
54
  {
55
- "epoch": 0.7322175732217573,
56
- "grad_norm": 2.9747416973114014,
57
- "learning_rate": 3.8502484031227824e-05,
58
- "loss": 2.6743,
 
 
 
 
 
 
 
 
 
 
 
59
  "step": 350
60
  },
61
  {
62
- "epoch": 0.8368200836820083,
63
- "grad_norm": 2.389204978942871,
64
- "learning_rate": 3.6728176011355574e-05,
65
- "loss": 2.6132,
 
 
 
 
 
 
 
 
 
 
 
66
  "step": 400
67
  },
68
  {
69
- "epoch": 0.9414225941422594,
70
- "grad_norm": 2.479804277420044,
71
- "learning_rate": 3.4953867991483324e-05,
72
- "loss": 2.5759,
 
 
 
 
 
 
 
 
 
 
 
73
  "step": 450
74
  },
75
  {
76
- "epoch": 1.0460251046025104,
77
- "grad_norm": 2.7939741611480713,
78
- "learning_rate": 3.3179559971611075e-05,
79
- "loss": 2.5372,
 
 
 
 
 
 
 
 
 
 
 
80
  "step": 500
81
  },
82
  {
83
- "epoch": 1.1506276150627615,
84
- "grad_norm": 2.7763280868530273,
85
- "learning_rate": 3.1405251951738825e-05,
86
- "loss": 2.4478,
 
 
 
 
 
 
 
 
 
 
 
87
  "step": 550
88
  },
89
  {
90
- "epoch": 1.2552301255230125,
91
- "grad_norm": 3.029435157775879,
92
- "learning_rate": 2.9630943931866572e-05,
93
- "loss": 2.3703,
 
 
 
 
 
 
 
 
 
 
 
94
  "step": 600
95
  },
96
  {
97
- "epoch": 1.3598326359832635,
98
- "grad_norm": 2.2191221714019775,
99
- "learning_rate": 2.7856635911994322e-05,
100
- "loss": 2.3923,
 
 
 
 
 
 
 
 
 
 
 
101
  "step": 650
102
  },
103
  {
104
- "epoch": 1.4644351464435146,
105
- "grad_norm": 4.677400588989258,
106
- "learning_rate": 2.6082327892122073e-05,
107
- "loss": 2.3571,
 
 
 
 
 
 
 
 
 
 
 
108
  "step": 700
109
  },
110
  {
111
- "epoch": 1.5690376569037658,
112
- "grad_norm": 4.942624092102051,
113
- "learning_rate": 2.4308019872249823e-05,
114
- "loss": 2.4729,
 
 
 
 
 
 
 
 
 
 
 
115
  "step": 750
116
  },
117
  {
118
- "epoch": 1.6736401673640167,
119
- "grad_norm": 2.2605793476104736,
120
- "learning_rate": 2.2533711852377574e-05,
121
- "loss": 2.2654,
 
 
 
 
 
 
 
 
 
 
 
122
  "step": 800
123
  },
124
  {
125
- "epoch": 1.778242677824268,
126
- "grad_norm": 3.652431011199951,
127
- "learning_rate": 2.0759403832505324e-05,
128
- "loss": 2.2042,
 
 
 
 
 
 
 
 
 
 
 
129
  "step": 850
130
  },
131
  {
132
- "epoch": 1.8828451882845187,
133
- "grad_norm": 3.4865975379943848,
134
- "learning_rate": 1.8985095812633074e-05,
135
- "loss": 2.3344,
 
 
 
 
 
 
 
 
 
 
 
136
  "step": 900
137
  },
138
  {
139
- "epoch": 1.98744769874477,
140
- "grad_norm": 5.6987199783325195,
141
- "learning_rate": 1.7210787792760825e-05,
142
- "loss": 2.2885,
 
 
 
 
 
 
 
 
 
 
 
143
  "step": 950
144
  },
145
  {
146
- "epoch": 2.092050209205021,
147
- "grad_norm": 4.525824069976807,
148
- "learning_rate": 1.5436479772888575e-05,
149
- "loss": 2.259,
 
 
 
 
 
 
 
 
 
 
 
150
  "step": 1000
151
  },
152
  {
153
- "epoch": 2.196652719665272,
154
- "grad_norm": 5.680797100067139,
155
- "learning_rate": 1.3662171753016322e-05,
156
- "loss": 2.3047,
 
 
 
 
 
 
 
 
 
 
 
157
  "step": 1050
158
  },
159
  {
160
- "epoch": 2.301255230125523,
161
- "grad_norm": 6.203536510467529,
162
- "learning_rate": 1.1887863733144074e-05,
163
- "loss": 2.1082,
 
 
 
 
 
 
 
 
 
 
 
164
  "step": 1100
165
  },
166
  {
167
- "epoch": 2.405857740585774,
168
- "grad_norm": 6.541838645935059,
169
- "learning_rate": 1.0113555713271825e-05,
170
- "loss": 2.1743,
 
 
 
 
 
 
 
 
 
 
 
171
  "step": 1150
172
  },
173
  {
174
- "epoch": 2.510460251046025,
175
- "grad_norm": 4.907942771911621,
176
- "learning_rate": 8.339247693399573e-06,
177
- "loss": 2.2362,
 
 
 
 
 
 
 
 
 
 
 
178
  "step": 1200
179
  },
180
  {
181
- "epoch": 2.6150627615062763,
182
- "grad_norm": 4.049306392669678,
183
- "learning_rate": 6.5649396735273244e-06,
184
- "loss": 2.2656,
 
 
 
 
 
 
 
 
 
 
 
185
  "step": 1250
186
  },
187
  {
188
- "epoch": 2.719665271966527,
189
- "grad_norm": 4.077224254608154,
190
- "learning_rate": 4.790631653655075e-06,
191
- "loss": 2.1504,
 
 
 
 
 
 
 
 
 
 
 
192
  "step": 1300
193
  },
194
  {
195
- "epoch": 2.8242677824267783,
196
- "grad_norm": 4.366615295410156,
197
- "learning_rate": 3.0163236337828248e-06,
198
- "loss": 2.0985,
 
 
 
 
 
 
 
 
 
 
 
199
  "step": 1350
200
  },
201
  {
202
- "epoch": 2.928870292887029,
203
- "grad_norm": 3.2660391330718994,
204
- "learning_rate": 1.242015613910575e-06,
205
- "loss": 2.1049,
 
 
 
 
 
 
 
 
 
 
 
206
  "step": 1400
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
207
  }
208
  ],
209
- "logging_steps": 50,
210
- "max_steps": 1434,
211
  "num_input_tokens_seen": 0,
212
  "num_train_epochs": 3,
213
  "save_steps": 500,
@@ -223,8 +1170,8 @@
223
  "attributes": {}
224
  }
225
  },
226
- "total_flos": 4.9878253996867584e+17,
227
- "train_batch_size": 1,
228
  "trial_name": null,
229
  "trial_params": null
230
  }
 
4
  "best_model_checkpoint": null,
5
  "epoch": 3.0,
6
  "eval_steps": 500,
7
+ "global_step": 3177,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
+ "epoch": 0.023607176581680833,
14
+ "grad_norm": 0.8597469329833984,
15
+ "learning_rate": 0.00017600000000000002,
16
+ "loss": 4.9024,
17
+ "mean_token_accuracy": 0.2909739096462727,
18
+ "num_tokens": 86323.0,
19
+ "step": 25
20
+ },
21
+ {
22
+ "epoch": 0.047214353163361665,
23
+ "grad_norm": 0.7611781358718872,
24
+ "learning_rate": 0.0001986040609137056,
25
+ "loss": 4.3579,
26
+ "mean_token_accuracy": 0.3365874075889587,
27
+ "num_tokens": 170088.0,
28
  "step": 50
29
  },
30
  {
31
+ "epoch": 0.0708215297450425,
32
+ "grad_norm": 0.8384687900543213,
33
+ "learning_rate": 0.00019701776649746192,
34
+ "loss": 4.2579,
35
+ "mean_token_accuracy": 0.33202287018299104,
36
+ "num_tokens": 248841.0,
37
+ "step": 75
38
+ },
39
+ {
40
+ "epoch": 0.09442870632672333,
41
+ "grad_norm": 0.8371444940567017,
42
+ "learning_rate": 0.00019543147208121828,
43
+ "loss": 3.9475,
44
+ "mean_token_accuracy": 0.3752408367395401,
45
+ "num_tokens": 331838.0,
46
  "step": 100
47
  },
48
  {
49
+ "epoch": 0.11803588290840415,
50
+ "grad_norm": 1.3010390996932983,
51
+ "learning_rate": 0.00019384517766497464,
52
+ "loss": 3.6464,
53
+ "mean_token_accuracy": 0.4282234400510788,
54
+ "num_tokens": 416982.0,
55
+ "step": 125
56
+ },
57
+ {
58
+ "epoch": 0.141643059490085,
59
+ "grad_norm": 1.102060079574585,
60
+ "learning_rate": 0.000192258883248731,
61
+ "loss": 3.6204,
62
+ "mean_token_accuracy": 0.42927508473396303,
63
+ "num_tokens": 501322.0,
64
  "step": 150
65
  },
66
  {
67
+ "epoch": 0.1652502360717658,
68
+ "grad_norm": 1.1159696578979492,
69
+ "learning_rate": 0.00019067258883248732,
70
+ "loss": 3.5222,
71
+ "mean_token_accuracy": 0.4415357458591461,
72
+ "num_tokens": 591622.0,
73
+ "step": 175
74
+ },
75
+ {
76
+ "epoch": 0.18885741265344666,
77
+ "grad_norm": 3.7764205932617188,
78
+ "learning_rate": 0.00018908629441624365,
79
+ "loss": 3.6167,
80
+ "mean_token_accuracy": 0.42590244352817536,
81
+ "num_tokens": 681171.0,
82
  "step": 200
83
  },
84
  {
85
+ "epoch": 0.21246458923512748,
86
+ "grad_norm": 1.3314100503921509,
87
+ "learning_rate": 0.0001875,
88
+ "loss": 3.7327,
89
+ "mean_token_accuracy": 0.4113136053085327,
90
+ "num_tokens": 762751.0,
91
+ "step": 225
92
+ },
93
+ {
94
+ "epoch": 0.2360717658168083,
95
+ "grad_norm": 1.455350637435913,
96
+ "learning_rate": 0.00018591370558375636,
97
+ "loss": 3.3504,
98
+ "mean_token_accuracy": 0.47106861472129824,
99
+ "num_tokens": 851051.0,
100
  "step": 250
101
  },
102
  {
103
+ "epoch": 0.25967894239848915,
104
+ "grad_norm": 1.2478972673416138,
105
+ "learning_rate": 0.00018432741116751272,
106
+ "loss": 3.4177,
107
+ "mean_token_accuracy": 0.46144390881061553,
108
+ "num_tokens": 937834.0,
109
+ "step": 275
110
+ },
111
+ {
112
+ "epoch": 0.28328611898017,
113
+ "grad_norm": 1.6548963785171509,
114
+ "learning_rate": 0.00018274111675126904,
115
+ "loss": 3.4914,
116
+ "mean_token_accuracy": 0.450729478597641,
117
+ "num_tokens": 1017911.0,
118
  "step": 300
119
  },
120
  {
121
+ "epoch": 0.3068932955618508,
122
+ "grad_norm": 1.7462213039398193,
123
+ "learning_rate": 0.0001811548223350254,
124
+ "loss": 3.2948,
125
+ "mean_token_accuracy": 0.4845814883708954,
126
+ "num_tokens": 1103573.0,
127
+ "step": 325
128
+ },
129
+ {
130
+ "epoch": 0.3305004721435316,
131
+ "grad_norm": 1.2442463636398315,
132
+ "learning_rate": 0.00017956852791878173,
133
+ "loss": 3.351,
134
+ "mean_token_accuracy": 0.473716744184494,
135
+ "num_tokens": 1190283.0,
136
  "step": 350
137
  },
138
  {
139
+ "epoch": 0.35410764872521244,
140
+ "grad_norm": 0.9460684657096863,
141
+ "learning_rate": 0.00017798223350253808,
142
+ "loss": 3.3257,
143
+ "mean_token_accuracy": 0.48253663778305056,
144
+ "num_tokens": 1275756.0,
145
+ "step": 375
146
+ },
147
+ {
148
+ "epoch": 0.3777148253068933,
149
+ "grad_norm": 1.3959977626800537,
150
+ "learning_rate": 0.0001763959390862944,
151
+ "loss": 3.3016,
152
+ "mean_token_accuracy": 0.48651802659034726,
153
+ "num_tokens": 1359058.0,
154
  "step": 400
155
  },
156
  {
157
+ "epoch": 0.40132200188857414,
158
+ "grad_norm": 1.5676658153533936,
159
+ "learning_rate": 0.00017480964467005077,
160
+ "loss": 3.0587,
161
+ "mean_token_accuracy": 0.5262654614448548,
162
+ "num_tokens": 1447793.0,
163
+ "step": 425
164
+ },
165
+ {
166
+ "epoch": 0.42492917847025496,
167
+ "grad_norm": 1.6668205261230469,
168
+ "learning_rate": 0.00017322335025380712,
169
+ "loss": 2.9721,
170
+ "mean_token_accuracy": 0.542415668964386,
171
+ "num_tokens": 1533910.0,
172
  "step": 450
173
  },
174
  {
175
+ "epoch": 0.4485363550519358,
176
+ "grad_norm": 1.1585041284561157,
177
+ "learning_rate": 0.00017163705583756348,
178
+ "loss": 2.9116,
179
+ "mean_token_accuracy": 0.5552564060688019,
180
+ "num_tokens": 1620704.0,
181
+ "step": 475
182
+ },
183
+ {
184
+ "epoch": 0.4721435316336166,
185
+ "grad_norm": 1.7781224250793457,
186
+ "learning_rate": 0.0001700507614213198,
187
+ "loss": 2.9046,
188
+ "mean_token_accuracy": 0.5538024806976318,
189
+ "num_tokens": 1706296.0,
190
  "step": 500
191
  },
192
  {
193
+ "epoch": 0.49575070821529743,
194
+ "grad_norm": 2.213106632232666,
195
+ "learning_rate": 0.00016846446700507614,
196
+ "loss": 3.0646,
197
+ "mean_token_accuracy": 0.5227190399169922,
198
+ "num_tokens": 1791100.0,
199
+ "step": 525
200
+ },
201
+ {
202
+ "epoch": 0.5193578847969783,
203
+ "grad_norm": 4.082686901092529,
204
+ "learning_rate": 0.0001668781725888325,
205
+ "loss": 3.081,
206
+ "mean_token_accuracy": 0.523969988822937,
207
+ "num_tokens": 1869553.0,
208
  "step": 550
209
  },
210
  {
211
+ "epoch": 0.5429650613786591,
212
+ "grad_norm": 1.7952462434768677,
213
+ "learning_rate": 0.00016529187817258885,
214
+ "loss": 3.2597,
215
+ "mean_token_accuracy": 0.496154887676239,
216
+ "num_tokens": 1949366.0,
217
+ "step": 575
218
+ },
219
+ {
220
+ "epoch": 0.56657223796034,
221
+ "grad_norm": 1.9742423295974731,
222
+ "learning_rate": 0.0001637055837563452,
223
+ "loss": 3.0122,
224
+ "mean_token_accuracy": 0.5359121215343475,
225
+ "num_tokens": 2034686.0,
226
  "step": 600
227
  },
228
  {
229
+ "epoch": 0.5901794145420207,
230
+ "grad_norm": 1.3130273818969727,
231
+ "learning_rate": 0.00016211928934010153,
232
+ "loss": 3.0088,
233
+ "mean_token_accuracy": 0.5349772965908051,
234
+ "num_tokens": 2121997.0,
235
+ "step": 625
236
+ },
237
+ {
238
+ "epoch": 0.6137865911237016,
239
+ "grad_norm": 1.7737120389938354,
240
+ "learning_rate": 0.00016053299492385786,
241
+ "loss": 2.8518,
242
+ "mean_token_accuracy": 0.5664570724964142,
243
+ "num_tokens": 2209709.0,
244
  "step": 650
245
  },
246
  {
247
+ "epoch": 0.6373937677053825,
248
+ "grad_norm": 1.397939682006836,
249
+ "learning_rate": 0.00015894670050761421,
250
+ "loss": 2.7886,
251
+ "mean_token_accuracy": 0.5750785338878631,
252
+ "num_tokens": 2295052.0,
253
+ "step": 675
254
+ },
255
+ {
256
+ "epoch": 0.6610009442870632,
257
+ "grad_norm": 2.435749053955078,
258
+ "learning_rate": 0.00015736040609137057,
259
+ "loss": 3.0218,
260
+ "mean_token_accuracy": 0.5341324102878571,
261
+ "num_tokens": 2378262.0,
262
  "step": 700
263
  },
264
  {
265
+ "epoch": 0.6846081208687441,
266
+ "grad_norm": 1.3271034955978394,
267
+ "learning_rate": 0.00015577411167512693,
268
+ "loss": 2.9578,
269
+ "mean_token_accuracy": 0.5410768449306488,
270
+ "num_tokens": 2463204.0,
271
+ "step": 725
272
+ },
273
+ {
274
+ "epoch": 0.7082152974504249,
275
+ "grad_norm": 1.8051323890686035,
276
+ "learning_rate": 0.00015418781725888325,
277
+ "loss": 2.9002,
278
+ "mean_token_accuracy": 0.5550252544879913,
279
+ "num_tokens": 2547027.0,
280
  "step": 750
281
  },
282
  {
283
+ "epoch": 0.7318224740321058,
284
+ "grad_norm": 0.9294273257255554,
285
+ "learning_rate": 0.0001526015228426396,
286
+ "loss": 2.7967,
287
+ "mean_token_accuracy": 0.572112466096878,
288
+ "num_tokens": 2635107.0,
289
+ "step": 775
290
+ },
291
+ {
292
+ "epoch": 0.7554296506137866,
293
+ "grad_norm": 1.3318583965301514,
294
+ "learning_rate": 0.00015101522842639594,
295
+ "loss": 2.908,
296
+ "mean_token_accuracy": 0.5564561116695405,
297
+ "num_tokens": 2719167.0,
298
  "step": 800
299
  },
300
  {
301
+ "epoch": 0.7790368271954674,
302
+ "grad_norm": 1.1354563236236572,
303
+ "learning_rate": 0.0001494289340101523,
304
+ "loss": 2.8887,
305
+ "mean_token_accuracy": 0.5594349646568298,
306
+ "num_tokens": 2805060.0,
307
+ "step": 825
308
+ },
309
+ {
310
+ "epoch": 0.8026440037771483,
311
+ "grad_norm": 3.2933058738708496,
312
+ "learning_rate": 0.00014784263959390862,
313
+ "loss": 2.8668,
314
+ "mean_token_accuracy": 0.5591881954669953,
315
+ "num_tokens": 2887082.0,
316
  "step": 850
317
  },
318
  {
319
+ "epoch": 0.826251180358829,
320
+ "grad_norm": 1.8824466466903687,
321
+ "learning_rate": 0.00014625634517766498,
322
+ "loss": 2.834,
323
+ "mean_token_accuracy": 0.5663117682933807,
324
+ "num_tokens": 2973661.0,
325
+ "step": 875
326
+ },
327
+ {
328
+ "epoch": 0.8498583569405099,
329
+ "grad_norm": 1.2233247756958008,
330
+ "learning_rate": 0.00014467005076142133,
331
+ "loss": 2.7768,
332
+ "mean_token_accuracy": 0.5772299838066101,
333
+ "num_tokens": 3060750.0,
334
  "step": 900
335
  },
336
  {
337
+ "epoch": 0.8734655335221907,
338
+ "grad_norm": 1.7638064622879028,
339
+ "learning_rate": 0.0001430837563451777,
340
+ "loss": 2.678,
341
+ "mean_token_accuracy": 0.5990130162239075,
342
+ "num_tokens": 3143868.0,
343
+ "step": 925
344
+ },
345
+ {
346
+ "epoch": 0.8970727101038716,
347
+ "grad_norm": 1.5392266511917114,
348
+ "learning_rate": 0.00014149746192893402,
349
+ "loss": 2.6573,
350
+ "mean_token_accuracy": 0.5954342544078827,
351
+ "num_tokens": 3231027.0,
352
  "step": 950
353
  },
354
  {
355
+ "epoch": 0.9206798866855525,
356
+ "grad_norm": 1.8433501720428467,
357
+ "learning_rate": 0.00013991116751269035,
358
+ "loss": 2.9661,
359
+ "mean_token_accuracy": 0.5433392310142517,
360
+ "num_tokens": 3314925.0,
361
+ "step": 975
362
+ },
363
+ {
364
+ "epoch": 0.9442870632672332,
365
+ "grad_norm": 1.3953369855880737,
366
+ "learning_rate": 0.0001383248730964467,
367
+ "loss": 2.7778,
368
+ "mean_token_accuracy": 0.5806617629528046,
369
+ "num_tokens": 3398916.0,
370
  "step": 1000
371
  },
372
  {
373
+ "epoch": 0.9678942398489141,
374
+ "grad_norm": 2.4013938903808594,
375
+ "learning_rate": 0.00013673857868020306,
376
+ "loss": 2.6282,
377
+ "mean_token_accuracy": 0.6033717966079712,
378
+ "num_tokens": 3486241.0,
379
+ "step": 1025
380
+ },
381
+ {
382
+ "epoch": 0.9915014164305949,
383
+ "grad_norm": 1.3695052862167358,
384
+ "learning_rate": 0.0001351522842639594,
385
+ "loss": 2.8629,
386
+ "mean_token_accuracy": 0.5644518744945526,
387
+ "num_tokens": 3569625.0,
388
  "step": 1050
389
  },
390
  {
391
+ "epoch": 1.0151085930122756,
392
+ "grad_norm": 1.5885721445083618,
393
+ "learning_rate": 0.00013356598984771574,
394
+ "loss": 2.7245,
395
+ "mean_token_accuracy": 0.583740828037262,
396
+ "num_tokens": 3655264.0,
397
+ "step": 1075
398
+ },
399
+ {
400
+ "epoch": 1.0387157695939566,
401
+ "grad_norm": 1.957524299621582,
402
+ "learning_rate": 0.00013197969543147207,
403
+ "loss": 2.5976,
404
+ "mean_token_accuracy": 0.6069298982620239,
405
+ "num_tokens": 3740644.0,
406
  "step": 1100
407
  },
408
  {
409
+ "epoch": 1.0623229461756374,
410
+ "grad_norm": 2.9976248741149902,
411
+ "learning_rate": 0.00013039340101522843,
412
+ "loss": 2.7362,
413
+ "mean_token_accuracy": 0.5855110836029053,
414
+ "num_tokens": 3825368.0,
415
+ "step": 1125
416
+ },
417
+ {
418
+ "epoch": 1.0859301227573182,
419
+ "grad_norm": 3.186262845993042,
420
+ "learning_rate": 0.00012880710659898478,
421
+ "loss": 2.8484,
422
+ "mean_token_accuracy": 0.5662666463851929,
423
+ "num_tokens": 3908678.0,
424
  "step": 1150
425
  },
426
  {
427
+ "epoch": 1.1095372993389991,
428
+ "grad_norm": 1.8036489486694336,
429
+ "learning_rate": 0.00012722081218274114,
430
+ "loss": 2.67,
431
+ "mean_token_accuracy": 0.5938792788982391,
432
+ "num_tokens": 3996393.0,
433
+ "step": 1175
434
+ },
435
+ {
436
+ "epoch": 1.13314447592068,
437
+ "grad_norm": 2.489654779434204,
438
+ "learning_rate": 0.00012563451776649747,
439
+ "loss": 2.8123,
440
+ "mean_token_accuracy": 0.5695419406890869,
441
+ "num_tokens": 4080596.0,
442
  "step": 1200
443
  },
444
  {
445
+ "epoch": 1.1567516525023607,
446
+ "grad_norm": 3.2093520164489746,
447
+ "learning_rate": 0.00012404822335025382,
448
+ "loss": 2.517,
449
+ "mean_token_accuracy": 0.6239076638221741,
450
+ "num_tokens": 4164481.0,
451
+ "step": 1225
452
+ },
453
+ {
454
+ "epoch": 1.1803588290840414,
455
+ "grad_norm": 3.1341941356658936,
456
+ "learning_rate": 0.00012246192893401015,
457
+ "loss": 2.5919,
458
+ "mean_token_accuracy": 0.6061871576309205,
459
+ "num_tokens": 4253278.0,
460
  "step": 1250
461
  },
462
  {
463
+ "epoch": 1.2039660056657224,
464
+ "grad_norm": 1.433424711227417,
465
+ "learning_rate": 0.0001208756345177665,
466
+ "loss": 2.4236,
467
+ "mean_token_accuracy": 0.6387936723232269,
468
+ "num_tokens": 4338131.0,
469
+ "step": 1275
470
+ },
471
+ {
472
+ "epoch": 1.2275731822474032,
473
+ "grad_norm": 1.5914385318756104,
474
+ "learning_rate": 0.00011928934010152283,
475
+ "loss": 2.8934,
476
+ "mean_token_accuracy": 0.5650608110427856,
477
+ "num_tokens": 4418463.0,
478
  "step": 1300
479
  },
480
  {
481
+ "epoch": 1.251180358829084,
482
+ "grad_norm": 1.8357518911361694,
483
+ "learning_rate": 0.00011770304568527919,
484
+ "loss": 2.5408,
485
+ "mean_token_accuracy": 0.6213865375518799,
486
+ "num_tokens": 4504100.0,
487
+ "step": 1325
488
+ },
489
+ {
490
+ "epoch": 1.274787535410765,
491
+ "grad_norm": 2.181213855743408,
492
+ "learning_rate": 0.00011611675126903555,
493
+ "loss": 2.4895,
494
+ "mean_token_accuracy": 0.6242103600502014,
495
+ "num_tokens": 4590236.0,
496
  "step": 1350
497
  },
498
  {
499
+ "epoch": 1.2983947119924457,
500
+ "grad_norm": 2.054617404937744,
501
+ "learning_rate": 0.00011453045685279189,
502
+ "loss": 2.5319,
503
+ "mean_token_accuracy": 0.6264574217796326,
504
+ "num_tokens": 4673342.0,
505
+ "step": 1375
506
+ },
507
+ {
508
+ "epoch": 1.3220018885741265,
509
+ "grad_norm": 2.1715738773345947,
510
+ "learning_rate": 0.00011294416243654824,
511
+ "loss": 2.5042,
512
+ "mean_token_accuracy": 0.6200724172592164,
513
+ "num_tokens": 4760597.0,
514
  "step": 1400
515
+ },
516
+ {
517
+ "epoch": 1.3456090651558075,
518
+ "grad_norm": 3.023545265197754,
519
+ "learning_rate": 0.00011135786802030457,
520
+ "loss": 2.763,
521
+ "mean_token_accuracy": 0.5873644530773163,
522
+ "num_tokens": 4842611.0,
523
+ "step": 1425
524
+ },
525
+ {
526
+ "epoch": 1.3692162417374882,
527
+ "grad_norm": 1.7163723707199097,
528
+ "learning_rate": 0.00010977157360406091,
529
+ "loss": 2.4856,
530
+ "mean_token_accuracy": 0.6349835538864136,
531
+ "num_tokens": 4924684.0,
532
+ "step": 1450
533
+ },
534
+ {
535
+ "epoch": 1.392823418319169,
536
+ "grad_norm": 2.5177738666534424,
537
+ "learning_rate": 0.00010818527918781727,
538
+ "loss": 2.6193,
539
+ "mean_token_accuracy": 0.6035743832588196,
540
+ "num_tokens": 5011388.0,
541
+ "step": 1475
542
+ },
543
+ {
544
+ "epoch": 1.41643059490085,
545
+ "grad_norm": 1.9504915475845337,
546
+ "learning_rate": 0.00010659898477157362,
547
+ "loss": 2.5341,
548
+ "mean_token_accuracy": 0.6178851091861725,
549
+ "num_tokens": 5104282.0,
550
+ "step": 1500
551
+ },
552
+ {
553
+ "epoch": 1.4400377714825308,
554
+ "grad_norm": 1.2298864126205444,
555
+ "learning_rate": 0.00010501269035532994,
556
+ "loss": 2.4975,
557
+ "mean_token_accuracy": 0.6323371386528015,
558
+ "num_tokens": 5186482.0,
559
+ "step": 1525
560
+ },
561
+ {
562
+ "epoch": 1.4636449480642115,
563
+ "grad_norm": 2.7627272605895996,
564
+ "learning_rate": 0.0001034263959390863,
565
+ "loss": 2.5773,
566
+ "mean_token_accuracy": 0.6132258200645446,
567
+ "num_tokens": 5267604.0,
568
+ "step": 1550
569
+ },
570
+ {
571
+ "epoch": 1.4872521246458923,
572
+ "grad_norm": 2.1530709266662598,
573
+ "learning_rate": 0.00010184010152284265,
574
+ "loss": 2.7489,
575
+ "mean_token_accuracy": 0.5849710512161255,
576
+ "num_tokens": 5345250.0,
577
+ "step": 1575
578
+ },
579
+ {
580
+ "epoch": 1.510859301227573,
581
+ "grad_norm": 3.3525829315185547,
582
+ "learning_rate": 0.00010025380710659899,
583
+ "loss": 2.3328,
584
+ "mean_token_accuracy": 0.6535279083251954,
585
+ "num_tokens": 5432188.0,
586
+ "step": 1600
587
+ },
588
+ {
589
+ "epoch": 1.534466477809254,
590
+ "grad_norm": 3.1491880416870117,
591
+ "learning_rate": 9.866751269035533e-05,
592
+ "loss": 2.4587,
593
+ "mean_token_accuracy": 0.6317604184150696,
594
+ "num_tokens": 5518949.0,
595
+ "step": 1625
596
+ },
597
+ {
598
+ "epoch": 1.5580736543909348,
599
+ "grad_norm": 3.790788412094116,
600
+ "learning_rate": 9.708121827411169e-05,
601
+ "loss": 2.5356,
602
+ "mean_token_accuracy": 0.6312965428829194,
603
+ "num_tokens": 5598790.0,
604
+ "step": 1650
605
+ },
606
+ {
607
+ "epoch": 1.5816808309726156,
608
+ "grad_norm": 2.399170160293579,
609
+ "learning_rate": 9.549492385786802e-05,
610
+ "loss": 2.4952,
611
+ "mean_token_accuracy": 0.6315456974506378,
612
+ "num_tokens": 5685520.0,
613
+ "step": 1675
614
+ },
615
+ {
616
+ "epoch": 1.6052880075542966,
617
+ "grad_norm": 1.783835530281067,
618
+ "learning_rate": 9.390862944162437e-05,
619
+ "loss": 2.5257,
620
+ "mean_token_accuracy": 0.629066880941391,
621
+ "num_tokens": 5764729.0,
622
+ "step": 1700
623
+ },
624
+ {
625
+ "epoch": 1.6288951841359773,
626
+ "grad_norm": 2.1746981143951416,
627
+ "learning_rate": 9.232233502538072e-05,
628
+ "loss": 2.3915,
629
+ "mean_token_accuracy": 0.6475781321525573,
630
+ "num_tokens": 5847096.0,
631
+ "step": 1725
632
+ },
633
+ {
634
+ "epoch": 1.652502360717658,
635
+ "grad_norm": 2.853606700897217,
636
+ "learning_rate": 9.073604060913706e-05,
637
+ "loss": 2.4254,
638
+ "mean_token_accuracy": 0.6364664602279663,
639
+ "num_tokens": 5929730.0,
640
+ "step": 1750
641
+ },
642
+ {
643
+ "epoch": 1.676109537299339,
644
+ "grad_norm": 2.6709866523742676,
645
+ "learning_rate": 8.91497461928934e-05,
646
+ "loss": 2.2418,
647
+ "mean_token_accuracy": 0.6732782328128815,
648
+ "num_tokens": 6021265.0,
649
+ "step": 1775
650
+ },
651
+ {
652
+ "epoch": 1.6997167138810199,
653
+ "grad_norm": 3.0063297748565674,
654
+ "learning_rate": 8.756345177664976e-05,
655
+ "loss": 2.441,
656
+ "mean_token_accuracy": 0.6447321319580078,
657
+ "num_tokens": 6102483.0,
658
+ "step": 1800
659
+ },
660
+ {
661
+ "epoch": 1.7233238904627006,
662
+ "grad_norm": 2.609477996826172,
663
+ "learning_rate": 8.597715736040608e-05,
664
+ "loss": 2.3247,
665
+ "mean_token_accuracy": 0.667446813583374,
666
+ "num_tokens": 6186450.0,
667
+ "step": 1825
668
+ },
669
+ {
670
+ "epoch": 1.7469310670443816,
671
+ "grad_norm": 1.4909838438034058,
672
+ "learning_rate": 8.439086294416244e-05,
673
+ "loss": 2.1653,
674
+ "mean_token_accuracy": 0.6888451766967774,
675
+ "num_tokens": 6272397.0,
676
+ "step": 1850
677
+ },
678
+ {
679
+ "epoch": 1.7705382436260622,
680
+ "grad_norm": 2.26397442817688,
681
+ "learning_rate": 8.28045685279188e-05,
682
+ "loss": 2.4152,
683
+ "mean_token_accuracy": 0.6488711929321289,
684
+ "num_tokens": 6353819.0,
685
+ "step": 1875
686
+ },
687
+ {
688
+ "epoch": 1.7941454202077431,
689
+ "grad_norm": 2.281494379043579,
690
+ "learning_rate": 8.121827411167512e-05,
691
+ "loss": 2.2089,
692
+ "mean_token_accuracy": 0.6806192409992218,
693
+ "num_tokens": 6439448.0,
694
+ "step": 1900
695
+ },
696
+ {
697
+ "epoch": 1.8177525967894241,
698
+ "grad_norm": 2.4258992671966553,
699
+ "learning_rate": 7.963197969543148e-05,
700
+ "loss": 2.5648,
701
+ "mean_token_accuracy": 0.6230374383926391,
702
+ "num_tokens": 6523821.0,
703
+ "step": 1925
704
+ },
705
+ {
706
+ "epoch": 1.8413597733711047,
707
+ "grad_norm": 2.492616653442383,
708
+ "learning_rate": 7.804568527918782e-05,
709
+ "loss": 2.4212,
710
+ "mean_token_accuracy": 0.643797037601471,
711
+ "num_tokens": 6609653.0,
712
+ "step": 1950
713
+ },
714
+ {
715
+ "epoch": 1.8649669499527857,
716
+ "grad_norm": 2.589484930038452,
717
+ "learning_rate": 7.645939086294416e-05,
718
+ "loss": 2.1783,
719
+ "mean_token_accuracy": 0.6852576458454132,
720
+ "num_tokens": 6697310.0,
721
+ "step": 1975
722
+ },
723
+ {
724
+ "epoch": 1.8885741265344664,
725
+ "grad_norm": 1.6886556148529053,
726
+ "learning_rate": 7.48730964467005e-05,
727
+ "loss": 2.2581,
728
+ "mean_token_accuracy": 0.6677005457878112,
729
+ "num_tokens": 6789851.0,
730
+ "step": 2000
731
+ },
732
+ {
733
+ "epoch": 1.9121813031161472,
734
+ "grad_norm": 3.7452311515808105,
735
+ "learning_rate": 7.328680203045686e-05,
736
+ "loss": 2.3258,
737
+ "mean_token_accuracy": 0.6635724091529847,
738
+ "num_tokens": 6877195.0,
739
+ "step": 2025
740
+ },
741
+ {
742
+ "epoch": 1.9357884796978282,
743
+ "grad_norm": 2.047663688659668,
744
+ "learning_rate": 7.170050761421319e-05,
745
+ "loss": 2.3441,
746
+ "mean_token_accuracy": 0.6537975025177002,
747
+ "num_tokens": 6963541.0,
748
+ "step": 2050
749
+ },
750
+ {
751
+ "epoch": 1.959395656279509,
752
+ "grad_norm": 3.105921506881714,
753
+ "learning_rate": 7.011421319796955e-05,
754
+ "loss": 2.2945,
755
+ "mean_token_accuracy": 0.6703400027751922,
756
+ "num_tokens": 7049829.0,
757
+ "step": 2075
758
+ },
759
+ {
760
+ "epoch": 1.9830028328611897,
761
+ "grad_norm": 2.07450532913208,
762
+ "learning_rate": 6.852791878172589e-05,
763
+ "loss": 2.4282,
764
+ "mean_token_accuracy": 0.6384662842750549,
765
+ "num_tokens": 7138596.0,
766
+ "step": 2100
767
+ },
768
+ {
769
+ "epoch": 2.0066100094428707,
770
+ "grad_norm": 3.5083131790161133,
771
+ "learning_rate": 6.694162436548223e-05,
772
+ "loss": 2.43,
773
+ "mean_token_accuracy": 0.640663161277771,
774
+ "num_tokens": 7223056.0,
775
+ "step": 2125
776
+ },
777
+ {
778
+ "epoch": 2.0302171860245513,
779
+ "grad_norm": 1.5466697216033936,
780
+ "learning_rate": 6.541878172588833e-05,
781
+ "loss": 2.4533,
782
+ "mean_token_accuracy": 0.6328469634056091,
783
+ "num_tokens": 7309998.0,
784
+ "step": 2150
785
+ },
786
+ {
787
+ "epoch": 2.0538243626062322,
788
+ "grad_norm": 1.704559087753296,
789
+ "learning_rate": 6.383248730964467e-05,
790
+ "loss": 2.31,
791
+ "mean_token_accuracy": 0.6707417845726014,
792
+ "num_tokens": 7390437.0,
793
+ "step": 2175
794
+ },
795
+ {
796
+ "epoch": 2.0774315391879132,
797
+ "grad_norm": 2.0818848609924316,
798
+ "learning_rate": 6.224619289340103e-05,
799
+ "loss": 2.4,
800
+ "mean_token_accuracy": 0.6547241771221161,
801
+ "num_tokens": 7472360.0,
802
+ "step": 2200
803
+ },
804
+ {
805
+ "epoch": 2.101038715769594,
806
+ "grad_norm": 2.9945547580718994,
807
+ "learning_rate": 6.065989847715736e-05,
808
+ "loss": 2.2873,
809
+ "mean_token_accuracy": 0.6657208549976349,
810
+ "num_tokens": 7555482.0,
811
+ "step": 2225
812
+ },
813
+ {
814
+ "epoch": 2.1246458923512748,
815
+ "grad_norm": 3.4089205265045166,
816
+ "learning_rate": 5.907360406091371e-05,
817
+ "loss": 2.2813,
818
+ "mean_token_accuracy": 0.6657274806499481,
819
+ "num_tokens": 7646453.0,
820
+ "step": 2250
821
+ },
822
+ {
823
+ "epoch": 2.1482530689329558,
824
+ "grad_norm": 3.965263605117798,
825
+ "learning_rate": 5.748730964467005e-05,
826
+ "loss": 2.2457,
827
+ "mean_token_accuracy": 0.6795996761322022,
828
+ "num_tokens": 7729020.0,
829
+ "step": 2275
830
+ },
831
+ {
832
+ "epoch": 2.1718602455146363,
833
+ "grad_norm": 3.042095899581909,
834
+ "learning_rate": 5.59010152284264e-05,
835
+ "loss": 2.2448,
836
+ "mean_token_accuracy": 0.6785978388786316,
837
+ "num_tokens": 7813751.0,
838
+ "step": 2300
839
+ },
840
+ {
841
+ "epoch": 2.1954674220963173,
842
+ "grad_norm": 2.43571400642395,
843
+ "learning_rate": 5.431472081218274e-05,
844
+ "loss": 2.3119,
845
+ "mean_token_accuracy": 0.6689464914798736,
846
+ "num_tokens": 7895366.0,
847
+ "step": 2325
848
+ },
849
+ {
850
+ "epoch": 2.2190745986779983,
851
+ "grad_norm": 1.749118685722351,
852
+ "learning_rate": 5.272842639593909e-05,
853
+ "loss": 2.4028,
854
+ "mean_token_accuracy": 0.6496596956253051,
855
+ "num_tokens": 7978864.0,
856
+ "step": 2350
857
+ },
858
+ {
859
+ "epoch": 2.242681775259679,
860
+ "grad_norm": 4.0993547439575195,
861
+ "learning_rate": 5.114213197969543e-05,
862
+ "loss": 2.0971,
863
+ "mean_token_accuracy": 0.702848870754242,
864
+ "num_tokens": 8066059.0,
865
+ "step": 2375
866
+ },
867
+ {
868
+ "epoch": 2.26628895184136,
869
+ "grad_norm": 2.478060007095337,
870
+ "learning_rate": 4.955583756345178e-05,
871
+ "loss": 2.3131,
872
+ "mean_token_accuracy": 0.6597225499153138,
873
+ "num_tokens": 8150324.0,
874
+ "step": 2400
875
+ },
876
+ {
877
+ "epoch": 2.289896128423041,
878
+ "grad_norm": 3.1821959018707275,
879
+ "learning_rate": 4.7969543147208126e-05,
880
+ "loss": 2.2792,
881
+ "mean_token_accuracy": 0.6698596668243408,
882
+ "num_tokens": 8237632.0,
883
+ "step": 2425
884
+ },
885
+ {
886
+ "epoch": 2.3135033050047213,
887
+ "grad_norm": 1.7894206047058105,
888
+ "learning_rate": 4.638324873096447e-05,
889
+ "loss": 2.1695,
890
+ "mean_token_accuracy": 0.687961802482605,
891
+ "num_tokens": 8324924.0,
892
+ "step": 2450
893
+ },
894
+ {
895
+ "epoch": 2.3371104815864023,
896
+ "grad_norm": 2.1389191150665283,
897
+ "learning_rate": 4.479695431472081e-05,
898
+ "loss": 2.0925,
899
+ "mean_token_accuracy": 0.7001671254634857,
900
+ "num_tokens": 8411403.0,
901
+ "step": 2475
902
+ },
903
+ {
904
+ "epoch": 2.360717658168083,
905
+ "grad_norm": 2.390141248703003,
906
+ "learning_rate": 4.321065989847716e-05,
907
+ "loss": 2.3656,
908
+ "mean_token_accuracy": 0.657674194574356,
909
+ "num_tokens": 8496536.0,
910
+ "step": 2500
911
+ },
912
+ {
913
+ "epoch": 2.384324834749764,
914
+ "grad_norm": 2.3317668437957764,
915
+ "learning_rate": 4.162436548223351e-05,
916
+ "loss": 1.9864,
917
+ "mean_token_accuracy": 0.7178370308876038,
918
+ "num_tokens": 8588032.0,
919
+ "step": 2525
920
+ },
921
+ {
922
+ "epoch": 2.407932011331445,
923
+ "grad_norm": 2.255056619644165,
924
+ "learning_rate": 4.003807106598985e-05,
925
+ "loss": 2.1131,
926
+ "mean_token_accuracy": 0.7006520819664002,
927
+ "num_tokens": 8672178.0,
928
+ "step": 2550
929
+ },
930
+ {
931
+ "epoch": 2.4315391879131254,
932
+ "grad_norm": 2.2023000717163086,
933
+ "learning_rate": 3.84517766497462e-05,
934
+ "loss": 2.0892,
935
+ "mean_token_accuracy": 0.7081982207298279,
936
+ "num_tokens": 8753494.0,
937
+ "step": 2575
938
+ },
939
+ {
940
+ "epoch": 2.4551463644948064,
941
+ "grad_norm": 3.6177306175231934,
942
+ "learning_rate": 3.686548223350254e-05,
943
+ "loss": 2.2059,
944
+ "mean_token_accuracy": 0.6876888346672058,
945
+ "num_tokens": 8836935.0,
946
+ "step": 2600
947
+ },
948
+ {
949
+ "epoch": 2.4787535410764874,
950
+ "grad_norm": 2.292860746383667,
951
+ "learning_rate": 3.527918781725888e-05,
952
+ "loss": 2.3883,
953
+ "mean_token_accuracy": 0.6509598207473755,
954
+ "num_tokens": 8921920.0,
955
+ "step": 2625
956
+ },
957
+ {
958
+ "epoch": 2.502360717658168,
959
+ "grad_norm": 2.749070167541504,
960
+ "learning_rate": 3.369289340101523e-05,
961
+ "loss": 2.3556,
962
+ "mean_token_accuracy": 0.6616550016403199,
963
+ "num_tokens": 9005311.0,
964
+ "step": 2650
965
+ },
966
+ {
967
+ "epoch": 2.525967894239849,
968
+ "grad_norm": 2.500455617904663,
969
+ "learning_rate": 3.210659898477157e-05,
970
+ "loss": 2.1479,
971
+ "mean_token_accuracy": 0.6956441640853882,
972
+ "num_tokens": 9089703.0,
973
+ "step": 2675
974
+ },
975
+ {
976
+ "epoch": 2.54957507082153,
977
+ "grad_norm": 1.5603734254837036,
978
+ "learning_rate": 3.052030456852792e-05,
979
+ "loss": 2.2351,
980
+ "mean_token_accuracy": 0.6813873088359833,
981
+ "num_tokens": 9177025.0,
982
+ "step": 2700
983
+ },
984
+ {
985
+ "epoch": 2.5731822474032104,
986
+ "grad_norm": 3.4635727405548096,
987
+ "learning_rate": 2.8934010152284264e-05,
988
+ "loss": 2.1598,
989
+ "mean_token_accuracy": 0.6917344307899476,
990
+ "num_tokens": 9261429.0,
991
+ "step": 2725
992
+ },
993
+ {
994
+ "epoch": 2.5967894239848914,
995
+ "grad_norm": 2.4915714263916016,
996
+ "learning_rate": 2.7347715736040606e-05,
997
+ "loss": 2.2634,
998
+ "mean_token_accuracy": 0.6734067058563232,
999
+ "num_tokens": 9348692.0,
1000
+ "step": 2750
1001
+ },
1002
+ {
1003
+ "epoch": 2.620396600566572,
1004
+ "grad_norm": 2.7267868518829346,
1005
+ "learning_rate": 2.576142131979696e-05,
1006
+ "loss": 2.2268,
1007
+ "mean_token_accuracy": 0.6815747046470642,
1008
+ "num_tokens": 9432715.0,
1009
+ "step": 2775
1010
+ },
1011
+ {
1012
+ "epoch": 2.644003777148253,
1013
+ "grad_norm": 3.085237503051758,
1014
+ "learning_rate": 2.41751269035533e-05,
1015
+ "loss": 2.1703,
1016
+ "mean_token_accuracy": 0.6912525415420532,
1017
+ "num_tokens": 9518928.0,
1018
+ "step": 2800
1019
+ },
1020
+ {
1021
+ "epoch": 2.667610953729934,
1022
+ "grad_norm": 2.2934255599975586,
1023
+ "learning_rate": 2.2588832487309646e-05,
1024
+ "loss": 2.2124,
1025
+ "mean_token_accuracy": 0.6754102325439453,
1026
+ "num_tokens": 9606238.0,
1027
+ "step": 2825
1028
+ },
1029
+ {
1030
+ "epoch": 2.691218130311615,
1031
+ "grad_norm": 3.9280519485473633,
1032
+ "learning_rate": 2.100253807106599e-05,
1033
+ "loss": 2.0278,
1034
+ "mean_token_accuracy": 0.7123878049850464,
1035
+ "num_tokens": 9693011.0,
1036
+ "step": 2850
1037
+ },
1038
+ {
1039
+ "epoch": 2.7148253068932955,
1040
+ "grad_norm": 2.0387489795684814,
1041
+ "learning_rate": 1.9416243654822337e-05,
1042
+ "loss": 2.177,
1043
+ "mean_token_accuracy": 0.6935857963562012,
1044
+ "num_tokens": 9776929.0,
1045
+ "step": 2875
1046
+ },
1047
+ {
1048
+ "epoch": 2.7384324834749765,
1049
+ "grad_norm": 2.9819560050964355,
1050
+ "learning_rate": 1.782994923857868e-05,
1051
+ "loss": 2.0651,
1052
+ "mean_token_accuracy": 0.7067358446121216,
1053
+ "num_tokens": 9862456.0,
1054
+ "step": 2900
1055
+ },
1056
+ {
1057
+ "epoch": 2.762039660056657,
1058
+ "grad_norm": 2.204148054122925,
1059
+ "learning_rate": 1.6243654822335024e-05,
1060
+ "loss": 2.046,
1061
+ "mean_token_accuracy": 0.707190637588501,
1062
+ "num_tokens": 9952073.0,
1063
+ "step": 2925
1064
+ },
1065
+ {
1066
+ "epoch": 2.785646836638338,
1067
+ "grad_norm": 3.0924429893493652,
1068
+ "learning_rate": 1.4657360406091371e-05,
1069
+ "loss": 2.209,
1070
+ "mean_token_accuracy": 0.6862516689300537,
1071
+ "num_tokens": 10037799.0,
1072
+ "step": 2950
1073
+ },
1074
+ {
1075
+ "epoch": 2.809254013220019,
1076
+ "grad_norm": 2.6368651390075684,
1077
+ "learning_rate": 1.3071065989847717e-05,
1078
+ "loss": 2.2244,
1079
+ "mean_token_accuracy": 0.6805661177635193,
1080
+ "num_tokens": 10124423.0,
1081
+ "step": 2975
1082
+ },
1083
+ {
1084
+ "epoch": 2.8328611898017,
1085
+ "grad_norm": 1.7311336994171143,
1086
+ "learning_rate": 1.148477157360406e-05,
1087
+ "loss": 2.1135,
1088
+ "mean_token_accuracy": 0.7051725935935974,
1089
+ "num_tokens": 10202586.0,
1090
+ "step": 3000
1091
+ },
1092
+ {
1093
+ "epoch": 2.8564683663833805,
1094
+ "grad_norm": 2.4750070571899414,
1095
+ "learning_rate": 9.898477157360408e-06,
1096
+ "loss": 2.0525,
1097
+ "mean_token_accuracy": 0.7098425531387329,
1098
+ "num_tokens": 10288808.0,
1099
+ "step": 3025
1100
+ },
1101
+ {
1102
+ "epoch": 2.8800755429650615,
1103
+ "grad_norm": 2.8913192749023438,
1104
+ "learning_rate": 8.312182741116751e-06,
1105
+ "loss": 1.7762,
1106
+ "mean_token_accuracy": 0.7606491017341613,
1107
+ "num_tokens": 10375567.0,
1108
+ "step": 3050
1109
+ },
1110
+ {
1111
+ "epoch": 2.903682719546742,
1112
+ "grad_norm": 2.6616008281707764,
1113
+ "learning_rate": 6.725888324873096e-06,
1114
+ "loss": 2.064,
1115
+ "mean_token_accuracy": 0.7075335788726806,
1116
+ "num_tokens": 10462771.0,
1117
+ "step": 3075
1118
+ },
1119
+ {
1120
+ "epoch": 2.927289896128423,
1121
+ "grad_norm": 2.2261228561401367,
1122
+ "learning_rate": 5.139593908629442e-06,
1123
+ "loss": 2.2238,
1124
+ "mean_token_accuracy": 0.6874805259704589,
1125
+ "num_tokens": 10542846.0,
1126
+ "step": 3100
1127
+ },
1128
+ {
1129
+ "epoch": 2.950897072710104,
1130
+ "grad_norm": 2.295609712600708,
1131
+ "learning_rate": 3.5532994923857873e-06,
1132
+ "loss": 2.1835,
1133
+ "mean_token_accuracy": 0.6867920517921448,
1134
+ "num_tokens": 10625357.0,
1135
+ "step": 3125
1136
+ },
1137
+ {
1138
+ "epoch": 2.9745042492917846,
1139
+ "grad_norm": 1.7816609144210815,
1140
+ "learning_rate": 1.967005076142132e-06,
1141
+ "loss": 2.3207,
1142
+ "mean_token_accuracy": 0.6663406753540039,
1143
+ "num_tokens": 10706184.0,
1144
+ "step": 3150
1145
+ },
1146
+ {
1147
+ "epoch": 2.9981114258734656,
1148
+ "grad_norm": 2.8482465744018555,
1149
+ "learning_rate": 3.807106598984772e-07,
1150
+ "loss": 2.1249,
1151
+ "mean_token_accuracy": 0.6941254138946533,
1152
+ "num_tokens": 10793477.0,
1153
+ "step": 3175
1154
  }
1155
  ],
1156
+ "logging_steps": 25,
1157
+ "max_steps": 3177,
1158
  "num_input_tokens_seen": 0,
1159
  "num_train_epochs": 3,
1160
  "save_steps": 500,
 
1170
  "attributes": {}
1171
  }
1172
  },
1173
+ "total_flos": 1.6851959205842534e+18,
1174
+ "train_batch_size": 9,
1175
  "trial_name": null,
1176
  "trial_params": null
1177
  }
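Compared with the previous state, this run logs every 25 steps instead of 50, takes 3,177 optimizer steps over 3 epochs instead of 1,434, uses a per-device train batch size of 9 instead of 1, peaks at a learning rate near 2e-4 rather than 5e-5, and adds mean_token_accuracy and num_tokens to each log entry; the logged training loss falls from about 4.9 to about 2.1. A short sketch for summarizing such a trainer_state.json from a checkpoint directory (the path is a placeholder):

```python
# Sketch: summarize the run recorded in trainer_state.json
# (path is a placeholder for wherever the checkpoint was saved).
import json

with open("trainer_state.json") as f:
    state = json.load(f)

logged = [e for e in state["log_history"] if "loss" in e]
steps = [e["step"] for e in logged]
losses = [e["loss"] for e in logged]
accs = [e.get("mean_token_accuracy") for e in logged]

print(f"{state['global_step']} steps over {state['num_train_epochs']} epochs")
print(f"loss: {losses[0]:.3f} at step {steps[0]} -> {losses[-1]:.3f} at step {steps[-1]}")
if accs[-1] is not None:
    print(f"final mean_token_accuracy: {accs[-1]:.3f}")
```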
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ee49d2535f124e6cd410544190a599b0d9066910dc1c177fccee4081afbbc471
- size 5713

  version https://git-lfs.github.com/spec/v1
+ oid sha256:b27be49042add3dc0d0be39c1a676fe5900e3b7cd0b1ac58cd0800c00a999270
+ size 6161
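training_args.bin is a pickled transformers TrainingArguments object, so its byte size changes whenever the argument set changes. It can be inspected directly; a hedged sketch, assuming a transformers version compatible with the one that wrote the file. On recent PyTorch versions torch.load must be told to allow full unpickling, which is only safe for files you trust.

```python
# Sketch: inspect the pickled TrainingArguments in training_args.bin.
# weights_only=False is required on newer torch because this is arbitrary pickle
# data; only do this for checkpoints you trust.
import torch

args = torch.load("training_args.bin", weights_only=False)
print(type(args).__name__)  # typically "TrainingArguments"
for name in ("per_device_train_batch_size", "learning_rate",
             "num_train_epochs", "logging_steps", "fp16"):
    print(name, "=", getattr(args, name, None))
```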