nemik committed on
Commit b76bfcc · verified · 1 Parent(s): a7f3f3f

frostsolutions/frost-vision-v2-google_vit-base-patch16-224

README.md CHANGED
@@ -26,16 +26,16 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.9423188405797102
+ value: 0.9359420289855073
  - name: F1
  type: f1
- value: 0.8589652728561304
+ value: 0.8380952380952381
  - name: Precision
  type: precision
- value: 0.8795355587808418
+ value: 0.8895800933125972
  - name: Recall
  type: recall
- value: 0.8393351800554016
+ value: 0.7922437673130194
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -45,11 +45,11 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) on the webdataset dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1664
- - Accuracy: 0.9423
- - F1: 0.8590
- - Precision: 0.8795
- - Recall: 0.8393
+ - Loss: 0.1562
+ - Accuracy: 0.9359
+ - F1: 0.8381
+ - Precision: 0.8896
+ - Recall: 0.7922
 
  ## Model description
 
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 30.0,
+ "eval_accuracy": 0.9359420289855073,
+ "eval_f1": 0.8380952380952381,
+ "eval_loss": 0.15622803568840027,
+ "eval_precision": 0.8895800933125972,
+ "eval_recall": 0.7922437673130194,
+ "eval_runtime": 3.5438,
+ "eval_samples_per_second": 97.353,
+ "eval_steps_per_second": 12.416,
+ "total_flos": 3.2060734740537754e+18,
+ "train_loss": 0.09901393407606074,
+ "train_runtime": 730.7896,
+ "train_samples_per_second": 56.61,
+ "train_steps_per_second": 3.571
+ }
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 30.0,
+ "eval_accuracy": 0.9359420289855073,
+ "eval_f1": 0.8380952380952381,
+ "eval_loss": 0.15622803568840027,
+ "eval_precision": 0.8895800933125972,
+ "eval_recall": 0.7922437673130194,
+ "eval_runtime": 3.5438,
+ "eval_samples_per_second": 97.353,
+ "eval_steps_per_second": 12.416
+ }
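As a quick consistency check on the figures in `eval_results.json`, the sketch below recomputes F1 from the reported precision and recall. It assumes F1 here is the standard harmonic mean of those two numbers; the commit does not state the averaging mode, and the helper name `f1_from_precision_recall` is hypothetical.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from eval_results.json in this commit.
eval_precision = 0.8895800933125972
eval_recall = 0.7922437673130194
eval_f1 = 0.8380952380952381

# Assumption: eval_f1 was derived from the same precision/recall pair.
recomputed = f1_from_precision_recall(eval_precision, eval_recall)
print(f"recomputed F1 = {recomputed:.4f}, reported F1 = {eval_f1:.4f}")
```

If the two values agree, the drop in F1 relative to the previous README (0.8590 to 0.8381) follows from the lower recall rather than from a change in how the metric is computed.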
runs/Nov18_05-40-36_a7bbef788e81/events.out.tfevents.1731909195.a7bbef788e81.210.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f388958303fa478a4bb3706b37197ac2ac93ec94389a4ce24113810635a398b
+ size 560
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 30.0,
+ "total_flos": 3.2060734740537754e+18,
+ "train_loss": 0.09901393407606074,
+ "train_runtime": 730.7896,
+ "train_samples_per_second": 56.61,
+ "train_steps_per_second": 3.571
+ }
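The `learning_rate` values logged in `trainer_state.json` in this commit look consistent with a linear warmup followed by a linear decay. The sketch below reproduces a few of the logged values under that reading. The total of 2610 steps comes from `global_step`; the peak rate of 5e-5 and the 261 warmup steps are assumptions inferred from the logged numbers, not settings stated anywhere in the commit, and the function name is hypothetical.

```python
def linear_warmup_decay_lr(
    step: int,
    peak_lr: float = 5e-5,    # assumption: inferred from the logged values
    warmup_steps: int = 261,  # assumption: 10% of the 2610 total steps
    total_steps: int = 2610,  # "global_step" in trainer_state.json
) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Compare against values logged in trainer_state.json.
for step, logged in [(10, 1.9157088122605362e-06),
                     (270, 4.980842911877395e-05),
                     (500, 4.49127288207748e-05)]:
    print(step, linear_warmup_decay_lr(step), logged)
```

Under these assumptions the rate peaks near step 261, which would explain why steps 260 and 270 log the same value: they sit symmetrically on either side of the peak.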
trainer_state.json ADDED
@@ -0,0 +1,2181 @@
+ {
+ "best_metric": 0.15622803568840027,
+ "best_model_checkpoint": "frostsolutions/frost-vision-v2-google_vit-base-patch16-224/checkpoint-500",
+ "epoch": 30.0,
+ "eval_steps": 100,
+ "global_step": 2610,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.11494252873563218,
+ "grad_norm": 1.843792200088501,
+ "learning_rate": 1.9157088122605362e-06,
+ "loss": 0.7018,
+ "step": 10
+ },
+ {
+ "epoch": 0.22988505747126436,
+ "grad_norm": 1.8868529796600342,
+ "learning_rate": 3.8314176245210725e-06,
+ "loss": 0.6818,
+ "step": 20
+ },
+ {
+ "epoch": 0.3448275862068966,
+ "grad_norm": 1.7420289516448975,
+ "learning_rate": 5.747126436781609e-06,
+ "loss": 0.6445,
+ "step": 30
+ },
+ {
+ "epoch": 0.45977011494252873,
+ "grad_norm": 1.434963345527649,
+ "learning_rate": 7.662835249042145e-06,
+ "loss": 0.5892,
+ "step": 40
+ },
+ {
+ "epoch": 0.5747126436781609,
+ "grad_norm": 1.3582804203033447,
+ "learning_rate": 9.578544061302683e-06,
+ "loss": 0.5362,
+ "step": 50
+ },
+ {
+ "epoch": 0.6896551724137931,
+ "grad_norm": 1.2787097692489624,
+ "learning_rate": 1.1494252873563218e-05,
+ "loss": 0.4723,
+ "step": 60
+ },
+ {
+ "epoch": 0.8045977011494253,
+ "grad_norm": 1.0611391067504883,
+ "learning_rate": 1.3409961685823755e-05,
+ "loss": 0.4433,
+ "step": 70
+ },
+ {
+ "epoch": 0.9195402298850575,
+ "grad_norm": 1.0236091613769531,
+ "learning_rate": 1.532567049808429e-05,
+ "loss": 0.415,
+ "step": 80
+ },
+ {
+ "epoch": 1.0344827586206897,
+ "grad_norm": 1.0555670261383057,
+ "learning_rate": 1.7241379310344828e-05,
+ "loss": 0.3879,
+ "step": 90
+ },
+ {
+ "epoch": 1.1494252873563218,
+ "grad_norm": 1.0582798719406128,
+ "learning_rate": 1.9157088122605367e-05,
+ "loss": 0.3416,
+ "step": 100
+ },
+ {
+ "epoch": 1.1494252873563218,
+ "eval_accuracy": 0.8771014492753623,
+ "eval_f1": 0.6124314442413162,
+ "eval_loss": 0.32730263471603394,
+ "eval_precision": 0.9005376344086021,
+ "eval_recall": 0.46398891966759004,
+ "eval_runtime": 2.5472,
+ "eval_samples_per_second": 135.44,
+ "eval_steps_per_second": 17.274,
+ "step": 100
+ },
+ {
+ "epoch": 1.264367816091954,
+ "grad_norm": 1.0698238611221313,
+ "learning_rate": 2.10727969348659e-05,
+ "loss": 0.3371,
+ "step": 110
+ },
+ {
+ "epoch": 1.3793103448275863,
+ "grad_norm": 1.0518220663070679,
+ "learning_rate": 2.2988505747126437e-05,
+ "loss": 0.3122,
+ "step": 120
+ },
+ {
+ "epoch": 1.4942528735632183,
+ "grad_norm": 1.3425720930099487,
+ "learning_rate": 2.4904214559386975e-05,
+ "loss": 0.2861,
+ "step": 130
+ },
+ {
+ "epoch": 1.6091954022988506,
+ "grad_norm": 0.789143979549408,
+ "learning_rate": 2.681992337164751e-05,
+ "loss": 0.2678,
+ "step": 140
+ },
+ {
+ "epoch": 1.7241379310344827,
+ "grad_norm": 0.8800056576728821,
+ "learning_rate": 2.8735632183908045e-05,
+ "loss": 0.2811,
+ "step": 150
+ },
+ {
+ "epoch": 1.839080459770115,
+ "grad_norm": 0.943029522895813,
+ "learning_rate": 3.065134099616858e-05,
+ "loss": 0.2767,
+ "step": 160
+ },
+ {
+ "epoch": 1.9540229885057472,
+ "grad_norm": 1.1474237442016602,
+ "learning_rate": 3.256704980842912e-05,
+ "loss": 0.266,
+ "step": 170
+ },
+ {
+ "epoch": 2.0689655172413794,
+ "grad_norm": 1.0653332471847534,
+ "learning_rate": 3.4482758620689657e-05,
+ "loss": 0.2381,
+ "step": 180
+ },
+ {
+ "epoch": 2.1839080459770113,
+ "grad_norm": 1.0384609699249268,
+ "learning_rate": 3.6398467432950195e-05,
+ "loss": 0.2266,
+ "step": 190
+ },
+ {
+ "epoch": 2.2988505747126435,
+ "grad_norm": 0.7939756512641907,
+ "learning_rate": 3.831417624521073e-05,
+ "loss": 0.2215,
+ "step": 200
+ },
+ {
+ "epoch": 2.2988505747126435,
+ "eval_accuracy": 0.9182608695652174,
+ "eval_f1": 0.7901785714285714,
+ "eval_loss": 0.21871700882911682,
+ "eval_precision": 0.8536977491961415,
+ "eval_recall": 0.7354570637119113,
+ "eval_runtime": 2.9768,
+ "eval_samples_per_second": 115.897,
+ "eval_steps_per_second": 14.781,
+ "step": 200
+ },
+ {
+ "epoch": 2.413793103448276,
+ "grad_norm": 0.8638312816619873,
+ "learning_rate": 4.0229885057471265e-05,
+ "loss": 0.2274,
+ "step": 210
+ },
+ {
+ "epoch": 2.528735632183908,
+ "grad_norm": 1.0766372680664062,
+ "learning_rate": 4.21455938697318e-05,
+ "loss": 0.1967,
+ "step": 220
+ },
+ {
+ "epoch": 2.6436781609195403,
+ "grad_norm": 0.8049170970916748,
+ "learning_rate": 4.406130268199234e-05,
+ "loss": 0.2041,
+ "step": 230
+ },
+ {
+ "epoch": 2.7586206896551726,
+ "grad_norm": 0.8865797519683838,
+ "learning_rate": 4.597701149425287e-05,
+ "loss": 0.2288,
+ "step": 240
+ },
+ {
+ "epoch": 2.873563218390805,
+ "grad_norm": 0.906036376953125,
+ "learning_rate": 4.789272030651341e-05,
+ "loss": 0.2205,
+ "step": 250
+ },
+ {
+ "epoch": 2.9885057471264367,
+ "grad_norm": 1.1091452836990356,
+ "learning_rate": 4.980842911877395e-05,
+ "loss": 0.2075,
+ "step": 260
+ },
+ {
+ "epoch": 3.103448275862069,
+ "grad_norm": 1.1074674129486084,
+ "learning_rate": 4.980842911877395e-05,
+ "loss": 0.1914,
+ "step": 270
+ },
+ {
+ "epoch": 3.218390804597701,
+ "grad_norm": 1.6170417070388794,
+ "learning_rate": 4.959557258407833e-05,
+ "loss": 0.1792,
+ "step": 280
+ },
+ {
+ "epoch": 3.3333333333333335,
+ "grad_norm": 1.5800983905792236,
+ "learning_rate": 4.938271604938271e-05,
+ "loss": 0.1985,
+ "step": 290
+ },
+ {
+ "epoch": 3.4482758620689653,
+ "grad_norm": 1.2079609632492065,
+ "learning_rate": 4.916985951468711e-05,
+ "loss": 0.1753,
+ "step": 300
+ },
+ {
+ "epoch": 3.4482758620689653,
+ "eval_accuracy": 0.923768115942029,
+ "eval_f1": 0.8098336948662328,
+ "eval_loss": 0.1898600310087204,
+ "eval_precision": 0.8472012102874432,
+ "eval_recall": 0.775623268698061,
+ "eval_runtime": 3.0971,
+ "eval_samples_per_second": 111.395,
+ "eval_steps_per_second": 14.207,
+ "step": 300
+ },
+ {
+ "epoch": 3.5632183908045976,
+ "grad_norm": 0.70270836353302,
+ "learning_rate": 4.895700297999149e-05,
+ "loss": 0.2106,
+ "step": 310
+ },
+ {
+ "epoch": 3.67816091954023,
+ "grad_norm": 0.7896942496299744,
+ "learning_rate": 4.874414644529587e-05,
+ "loss": 0.168,
+ "step": 320
+ },
+ {
+ "epoch": 3.793103448275862,
+ "grad_norm": 1.1563091278076172,
+ "learning_rate": 4.853128991060026e-05,
+ "loss": 0.2003,
+ "step": 330
+ },
+ {
+ "epoch": 3.9080459770114944,
+ "grad_norm": 1.192246437072754,
+ "learning_rate": 4.831843337590464e-05,
+ "loss": 0.203,
+ "step": 340
+ },
+ {
+ "epoch": 4.022988505747127,
+ "grad_norm": 1.551155686378479,
+ "learning_rate": 4.810557684120903e-05,
+ "loss": 0.1936,
+ "step": 350
+ },
+ {
+ "epoch": 4.137931034482759,
+ "grad_norm": 1.4264142513275146,
+ "learning_rate": 4.789272030651341e-05,
+ "loss": 0.1783,
+ "step": 360
+ },
+ {
+ "epoch": 4.252873563218391,
+ "grad_norm": 0.6229875087738037,
+ "learning_rate": 4.767986377181779e-05,
+ "loss": 0.1797,
+ "step": 370
+ },
+ {
+ "epoch": 4.3678160919540225,
+ "grad_norm": 0.6375844478607178,
+ "learning_rate": 4.746700723712218e-05,
+ "loss": 0.1589,
+ "step": 380
+ },
+ {
+ "epoch": 4.482758620689655,
+ "grad_norm": 1.297573447227478,
+ "learning_rate": 4.725415070242657e-05,
+ "loss": 0.142,
+ "step": 390
+ },
+ {
+ "epoch": 4.597701149425287,
+ "grad_norm": 0.7838256359100342,
+ "learning_rate": 4.704129416773095e-05,
+ "loss": 0.1656,
+ "step": 400
+ },
+ {
+ "epoch": 4.597701149425287,
+ "eval_accuracy": 0.9272463768115942,
+ "eval_f1": 0.8174545454545454,
+ "eval_loss": 0.17321595549583435,
+ "eval_precision": 0.8606431852986217,
+ "eval_recall": 0.778393351800554,
+ "eval_runtime": 4.6227,
+ "eval_samples_per_second": 74.631,
+ "eval_steps_per_second": 9.518,
+ "step": 400
+ },
+ {
+ "epoch": 4.712643678160919,
+ "grad_norm": 0.8572351932525635,
+ "learning_rate": 4.682843763303534e-05,
+ "loss": 0.1843,
+ "step": 410
+ },
+ {
+ "epoch": 4.827586206896552,
+ "grad_norm": 0.9022108912467957,
+ "learning_rate": 4.661558109833972e-05,
+ "loss": 0.1638,
+ "step": 420
+ },
+ {
+ "epoch": 4.942528735632184,
+ "grad_norm": 1.1695079803466797,
+ "learning_rate": 4.640272456364411e-05,
+ "loss": 0.1447,
+ "step": 430
+ },
+ {
+ "epoch": 5.057471264367816,
+ "grad_norm": 0.9919353723526001,
+ "learning_rate": 4.618986802894849e-05,
+ "loss": 0.1796,
+ "step": 440
+ },
+ {
+ "epoch": 5.172413793103448,
+ "grad_norm": 0.9828383326530457,
+ "learning_rate": 4.597701149425287e-05,
+ "loss": 0.1376,
+ "step": 450
+ },
+ {
+ "epoch": 5.287356321839081,
+ "grad_norm": 0.7474183440208435,
+ "learning_rate": 4.576415495955726e-05,
+ "loss": 0.1245,
+ "step": 460
+ },
+ {
+ "epoch": 5.402298850574713,
+ "grad_norm": 0.9904183149337769,
+ "learning_rate": 4.555129842486164e-05,
+ "loss": 0.1531,
+ "step": 470
+ },
+ {
+ "epoch": 5.517241379310345,
+ "grad_norm": 0.7156611680984497,
+ "learning_rate": 4.5338441890166025e-05,
+ "loss": 0.1551,
+ "step": 480
+ },
+ {
+ "epoch": 5.6321839080459775,
+ "grad_norm": 0.7088082432746887,
+ "learning_rate": 4.512558535547042e-05,
+ "loss": 0.1444,
+ "step": 490
+ },
+ {
+ "epoch": 5.747126436781609,
+ "grad_norm": 0.8901511430740356,
+ "learning_rate": 4.49127288207748e-05,
+ "loss": 0.1288,
+ "step": 500
+ },
+ {
+ "epoch": 5.747126436781609,
+ "eval_accuracy": 0.9359420289855073,
+ "eval_f1": 0.8380952380952381,
+ "eval_loss": 0.15622803568840027,
+ "eval_precision": 0.8895800933125972,
+ "eval_recall": 0.7922437673130194,
+ "eval_runtime": 2.5149,
+ "eval_samples_per_second": 137.184,
+ "eval_steps_per_second": 17.496,
+ "step": 500
+ },
+ {
+ "epoch": 5.862068965517241,
+ "grad_norm": 0.8678961992263794,
+ "learning_rate": 4.469987228607918e-05,
+ "loss": 0.144,
+ "step": 510
+ },
+ {
+ "epoch": 5.977011494252873,
+ "grad_norm": 0.8695370554924011,
+ "learning_rate": 4.448701575138357e-05,
+ "loss": 0.1543,
+ "step": 520
+ },
+ {
+ "epoch": 6.091954022988506,
+ "grad_norm": 0.6334937810897827,
+ "learning_rate": 4.427415921668795e-05,
+ "loss": 0.118,
+ "step": 530
+ },
+ {
+ "epoch": 6.206896551724138,
+ "grad_norm": 0.8258758187294006,
+ "learning_rate": 4.406130268199234e-05,
+ "loss": 0.1127,
+ "step": 540
+ },
+ {
+ "epoch": 6.32183908045977,
+ "grad_norm": 1.4297465085983276,
+ "learning_rate": 4.384844614729672e-05,
+ "loss": 0.124,
+ "step": 550
+ },
+ {
+ "epoch": 6.436781609195402,
+ "grad_norm": 0.9284784197807312,
+ "learning_rate": 4.3635589612601105e-05,
+ "loss": 0.1153,
+ "step": 560
+ },
+ {
+ "epoch": 6.551724137931035,
+ "grad_norm": 0.7226102352142334,
+ "learning_rate": 4.342273307790549e-05,
+ "loss": 0.1204,
+ "step": 570
+ },
+ {
+ "epoch": 6.666666666666667,
+ "grad_norm": 1.2235733270645142,
+ "learning_rate": 4.3209876543209875e-05,
+ "loss": 0.1393,
+ "step": 580
+ },
+ {
+ "epoch": 6.781609195402299,
+ "grad_norm": 0.6442233324050903,
+ "learning_rate": 4.299702000851426e-05,
+ "loss": 0.1311,
+ "step": 590
+ },
+ {
+ "epoch": 6.896551724137931,
+ "grad_norm": 0.8887196183204651,
+ "learning_rate": 4.278416347381865e-05,
+ "loss": 0.1323,
+ "step": 600
+ },
+ {
+ "epoch": 6.896551724137931,
+ "eval_accuracy": 0.9321739130434783,
+ "eval_f1": 0.8326180257510729,
+ "eval_loss": 0.1597267985343933,
+ "eval_precision": 0.8609467455621301,
+ "eval_recall": 0.8060941828254847,
+ "eval_runtime": 2.5018,
+ "eval_samples_per_second": 137.898,
+ "eval_steps_per_second": 17.587,
+ "step": 600
+ },
+ {
+ "epoch": 7.011494252873563,
+ "grad_norm": 0.535347044467926,
+ "learning_rate": 4.257130693912303e-05,
+ "loss": 0.1151,
+ "step": 610
+ },
+ {
+ "epoch": 7.126436781609195,
+ "grad_norm": 1.1041584014892578,
+ "learning_rate": 4.235845040442742e-05,
+ "loss": 0.1196,
+ "step": 620
+ },
+ {
+ "epoch": 7.241379310344827,
+ "grad_norm": 1.098777413368225,
+ "learning_rate": 4.21455938697318e-05,
+ "loss": 0.1262,
+ "step": 630
+ },
+ {
+ "epoch": 7.35632183908046,
+ "grad_norm": 1.0132532119750977,
+ "learning_rate": 4.1932737335036185e-05,
+ "loss": 0.115,
+ "step": 640
+ },
+ {
+ "epoch": 7.471264367816092,
+ "grad_norm": 1.0600366592407227,
+ "learning_rate": 4.171988080034057e-05,
+ "loss": 0.1205,
+ "step": 650
+ },
+ {
+ "epoch": 7.586206896551724,
+ "grad_norm": 0.8362026810646057,
+ "learning_rate": 4.1507024265644955e-05,
+ "loss": 0.1294,
+ "step": 660
+ },
+ {
+ "epoch": 7.7011494252873565,
+ "grad_norm": 0.9791185855865479,
+ "learning_rate": 4.129416773094934e-05,
+ "loss": 0.1253,
+ "step": 670
+ },
+ {
+ "epoch": 7.816091954022989,
+ "grad_norm": 0.6780907511711121,
+ "learning_rate": 4.1081311196253725e-05,
+ "loss": 0.1072,
+ "step": 680
+ },
+ {
+ "epoch": 7.931034482758621,
+ "grad_norm": 0.9286981225013733,
+ "learning_rate": 4.086845466155811e-05,
+ "loss": 0.114,
+ "step": 690
+ },
+ {
+ "epoch": 8.045977011494253,
+ "grad_norm": 0.48207980394363403,
+ "learning_rate": 4.06555981268625e-05,
+ "loss": 0.1004,
+ "step": 700
+ },
+ {
+ "epoch": 8.045977011494253,
+ "eval_accuracy": 0.9315942028985508,
+ "eval_f1": 0.8323863636363636,
+ "eval_loss": 0.161317378282547,
+ "eval_precision": 0.8542274052478134,
+ "eval_recall": 0.8116343490304709,
+ "eval_runtime": 2.5046,
+ "eval_samples_per_second": 137.746,
+ "eval_steps_per_second": 17.568,
+ "step": 700
+ },
+ {
+ "epoch": 8.160919540229886,
+ "grad_norm": 1.1616382598876953,
+ "learning_rate": 4.044274159216688e-05,
+ "loss": 0.1209,
+ "step": 710
+ },
+ {
+ "epoch": 8.275862068965518,
+ "grad_norm": 0.7316162586212158,
+ "learning_rate": 4.0229885057471265e-05,
+ "loss": 0.096,
+ "step": 720
+ },
+ {
+ "epoch": 8.39080459770115,
+ "grad_norm": 0.8447398543357849,
+ "learning_rate": 4.001702852277565e-05,
+ "loss": 0.1163,
+ "step": 730
+ },
+ {
+ "epoch": 8.505747126436782,
+ "grad_norm": 0.6557740569114685,
+ "learning_rate": 3.9804171988080035e-05,
+ "loss": 0.0951,
+ "step": 740
+ },
+ {
+ "epoch": 8.620689655172415,
+ "grad_norm": 0.9132739901542664,
+ "learning_rate": 3.959131545338442e-05,
+ "loss": 0.1045,
+ "step": 750
+ },
+ {
+ "epoch": 8.735632183908045,
+ "grad_norm": 1.8383800983428955,
+ "learning_rate": 3.9378458918688805e-05,
+ "loss": 0.1007,
+ "step": 760
+ },
+ {
+ "epoch": 8.850574712643677,
+ "grad_norm": 1.6262847185134888,
+ "learning_rate": 3.9165602383993187e-05,
+ "loss": 0.0886,
+ "step": 770
+ },
+ {
+ "epoch": 8.96551724137931,
+ "grad_norm": 1.1226928234100342,
+ "learning_rate": 3.8952745849297575e-05,
+ "loss": 0.1136,
+ "step": 780
+ },
+ {
+ "epoch": 9.080459770114942,
+ "grad_norm": 0.8685470223426819,
+ "learning_rate": 3.873988931460196e-05,
+ "loss": 0.086,
+ "step": 790
+ },
+ {
+ "epoch": 9.195402298850574,
+ "grad_norm": 0.662932276725769,
+ "learning_rate": 3.8527032779906345e-05,
+ "loss": 0.0956,
+ "step": 800
+ },
+ {
+ "epoch": 9.195402298850574,
+ "eval_accuracy": 0.933623188405797,
+ "eval_f1": 0.8367783321454028,
+ "eval_loss": 0.16119635105133057,
+ "eval_precision": 0.8619676945668135,
+ "eval_recall": 0.8130193905817175,
+ "eval_runtime": 3.6353,
+ "eval_samples_per_second": 94.902,
+ "eval_steps_per_second": 12.103,
+ "step": 800
+ },
+ {
+ "epoch": 9.310344827586206,
+ "grad_norm": 0.9096585512161255,
+ "learning_rate": 3.831417624521073e-05,
+ "loss": 0.0939,
+ "step": 810
+ },
+ {
+ "epoch": 9.425287356321839,
+ "grad_norm": 0.6489679217338562,
+ "learning_rate": 3.8101319710515115e-05,
+ "loss": 0.0894,
+ "step": 820
+ },
+ {
+ "epoch": 9.540229885057471,
+ "grad_norm": 1.230733871459961,
+ "learning_rate": 3.7888463175819497e-05,
+ "loss": 0.1027,
+ "step": 830
+ },
+ {
+ "epoch": 9.655172413793103,
+ "grad_norm": 0.5888535976409912,
+ "learning_rate": 3.7675606641123885e-05,
+ "loss": 0.1034,
+ "step": 840
+ },
+ {
+ "epoch": 9.770114942528735,
+ "grad_norm": 1.329034447669983,
+ "learning_rate": 3.7462750106428267e-05,
+ "loss": 0.0785,
+ "step": 850
+ },
+ {
+ "epoch": 9.885057471264368,
+ "grad_norm": 0.7995342016220093,
+ "learning_rate": 3.7249893571732655e-05,
+ "loss": 0.1072,
+ "step": 860
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 2.6494181156158447,
+ "learning_rate": 3.7037037037037037e-05,
+ "loss": 0.1007,
+ "step": 870
+ },
+ {
+ "epoch": 10.114942528735632,
+ "grad_norm": 0.9277469515800476,
+ "learning_rate": 3.682418050234142e-05,
+ "loss": 0.0835,
+ "step": 880
+ },
+ {
+ "epoch": 10.229885057471265,
+ "grad_norm": 0.8300806879997253,
+ "learning_rate": 3.661132396764581e-05,
+ "loss": 0.0853,
+ "step": 890
+ },
+ {
+ "epoch": 10.344827586206897,
+ "grad_norm": 0.49981820583343506,
+ "learning_rate": 3.6398467432950195e-05,
+ "loss": 0.0841,
+ "step": 900
+ },
+ {
+ "epoch": 10.344827586206897,
+ "eval_accuracy": 0.9344927536231884,
+ "eval_f1": 0.8383404864091559,
+ "eval_loss": 0.16212987899780273,
+ "eval_precision": 0.8668639053254438,
+ "eval_recall": 0.8116343490304709,
+ "eval_runtime": 2.4973,
+ "eval_samples_per_second": 138.151,
+ "eval_steps_per_second": 17.619,
+ "step": 900
+ },
+ {
+ "epoch": 10.459770114942529,
+ "grad_norm": 1.2328234910964966,
+ "learning_rate": 3.6185610898254577e-05,
+ "loss": 0.0825,
+ "step": 910
+ },
+ {
+ "epoch": 10.574712643678161,
+ "grad_norm": 0.48059162497520447,
+ "learning_rate": 3.5972754363558965e-05,
+ "loss": 0.0765,
+ "step": 920
+ },
+ {
+ "epoch": 10.689655172413794,
+ "grad_norm": 1.0247180461883545,
+ "learning_rate": 3.5759897828863347e-05,
+ "loss": 0.0876,
+ "step": 930
+ },
+ {
+ "epoch": 10.804597701149426,
+ "grad_norm": 0.6012043356895447,
+ "learning_rate": 3.5547041294167735e-05,
+ "loss": 0.08,
+ "step": 940
+ },
+ {
+ "epoch": 10.919540229885058,
+ "grad_norm": 0.8350706696510315,
+ "learning_rate": 3.5334184759472117e-05,
+ "loss": 0.0889,
+ "step": 950
+ },
+ {
+ "epoch": 11.03448275862069,
+ "grad_norm": 0.46032950282096863,
+ "learning_rate": 3.51213282247765e-05,
+ "loss": 0.0815,
+ "step": 960
+ },
+ {
+ "epoch": 11.149425287356323,
+ "grad_norm": 0.8548628687858582,
+ "learning_rate": 3.4908471690080887e-05,
+ "loss": 0.09,
+ "step": 970
+ },
+ {
+ "epoch": 11.264367816091955,
+ "grad_norm": 0.9966102242469788,
+ "learning_rate": 3.469561515538527e-05,
+ "loss": 0.0707,
+ "step": 980
+ },
+ {
+ "epoch": 11.379310344827585,
+ "grad_norm": 1.062538981437683,
+ "learning_rate": 3.4482758620689657e-05,
+ "loss": 0.0652,
+ "step": 990
+ },
+ {
+ "epoch": 11.494252873563218,
+ "grad_norm": 0.6467556953430176,
+ "learning_rate": 3.4269902085994045e-05,
+ "loss": 0.0764,
+ "step": 1000
+ },
+ {
+ "epoch": 11.494252873563218,
+ "eval_accuracy": 0.9359420289855073,
+ "eval_f1": 0.8438162544169612,
+ "eval_loss": 0.15857626497745514,
+ "eval_precision": 0.8614718614718615,
+ "eval_recall": 0.8268698060941828,
+ "eval_runtime": 2.4932,
+ "eval_samples_per_second": 138.379,
+ "eval_steps_per_second": 17.648,
+ "step": 1000
+ },
+ {
+ "epoch": 11.60919540229885,
+ "grad_norm": 0.9217528700828552,
+ "learning_rate": 3.4057045551298427e-05,
+ "loss": 0.0884,
+ "step": 1010
+ },
+ {
+ "epoch": 11.724137931034482,
+ "grad_norm": 0.9330977201461792,
+ "learning_rate": 3.3844189016602815e-05,
+ "loss": 0.0869,
+ "step": 1020
+ },
+ {
+ "epoch": 11.839080459770114,
+ "grad_norm": 1.4273611307144165,
+ "learning_rate": 3.3631332481907197e-05,
+ "loss": 0.0846,
+ "step": 1030
+ },
+ {
+ "epoch": 11.954022988505747,
+ "grad_norm": 1.0101591348648071,
+ "learning_rate": 3.341847594721158e-05,
+ "loss": 0.0773,
+ "step": 1040
+ },
+ {
+ "epoch": 12.068965517241379,
+ "grad_norm": 1.1758484840393066,
+ "learning_rate": 3.3205619412515967e-05,
+ "loss": 0.0744,
+ "step": 1050
+ },
+ {
+ "epoch": 12.183908045977011,
+ "grad_norm": 0.8224564790725708,
+ "learning_rate": 3.299276287782035e-05,
+ "loss": 0.0795,
+ "step": 1060
+ },
+ {
+ "epoch": 12.298850574712644,
+ "grad_norm": 0.468171626329422,
+ "learning_rate": 3.277990634312474e-05,
+ "loss": 0.0637,
+ "step": 1070
+ },
+ {
+ "epoch": 12.413793103448276,
+ "grad_norm": 1.1701269149780273,
+ "learning_rate": 3.256704980842912e-05,
+ "loss": 0.0689,
+ "step": 1080
+ },
+ {
+ "epoch": 12.528735632183908,
+ "grad_norm": 1.3815059661865234,
+ "learning_rate": 3.235419327373351e-05,
+ "loss": 0.0861,
+ "step": 1090
+ },
+ {
+ "epoch": 12.64367816091954,
+ "grad_norm": 1.2272127866744995,
+ "learning_rate": 3.2141336739037895e-05,
+ "loss": 0.0726,
+ "step": 1100
+ },
+ {
+ "epoch": 12.64367816091954,
+ "eval_accuracy": 0.9420289855072463,
+ "eval_f1": 0.8593530239099859,
+ "eval_loss": 0.15456102788448334,
+ "eval_precision": 0.8728571428571429,
+ "eval_recall": 0.8462603878116344,
+ "eval_runtime": 2.5301,
+ "eval_samples_per_second": 136.358,
+ "eval_steps_per_second": 17.391,
+ "step": 1100
+ },
+ {
+ "epoch": 12.758620689655173,
+ "grad_norm": 0.9871719479560852,
+ "learning_rate": 3.192848020434228e-05,
+ "loss": 0.0683,
+ "step": 1110
+ },
+ {
+ "epoch": 12.873563218390805,
+ "grad_norm": 0.49420595169067383,
+ "learning_rate": 3.171562366964666e-05,
+ "loss": 0.0721,
+ "step": 1120
+ },
+ {
+ "epoch": 12.988505747126437,
+ "grad_norm": 0.25016915798187256,
+ "learning_rate": 3.150276713495105e-05,
+ "loss": 0.0785,
+ "step": 1130
+ },
+ {
+ "epoch": 13.10344827586207,
+ "grad_norm": 1.1972845792770386,
+ "learning_rate": 3.128991060025543e-05,
+ "loss": 0.0732,
+ "step": 1140
+ },
+ {
+ "epoch": 13.218390804597702,
+ "grad_norm": 0.6661795973777771,
+ "learning_rate": 3.107705406555981e-05,
+ "loss": 0.0709,
+ "step": 1150
+ },
+ {
+ "epoch": 13.333333333333334,
+ "grad_norm": 0.5791137218475342,
+ "learning_rate": 3.08641975308642e-05,
+ "loss": 0.0805,
+ "step": 1160
+ },
+ {
+ "epoch": 13.448275862068966,
+ "grad_norm": 0.6590031385421753,
+ "learning_rate": 3.065134099616858e-05,
+ "loss": 0.0665,
+ "step": 1170
+ },
+ {
+ "epoch": 13.563218390804598,
+ "grad_norm": 0.5040526986122131,
+ "learning_rate": 3.043848446147297e-05,
+ "loss": 0.0766,
+ "step": 1180
+ },
+ {
+ "epoch": 13.678160919540229,
+ "grad_norm": 0.785590648651123,
+ "learning_rate": 3.0225627926777357e-05,
+ "loss": 0.0557,
+ "step": 1190
+ },
+ {
+ "epoch": 13.793103448275861,
+ "grad_norm": 0.7830068469047546,
+ "learning_rate": 3.0012771392081738e-05,
+ "loss": 0.0732,
+ "step": 1200
+ },
+ {
+ "epoch": 13.793103448275861,
+ "eval_accuracy": 0.9408695652173913,
+ "eval_f1": 0.8565400843881856,
+ "eval_loss": 0.15285652875900269,
+ "eval_precision": 0.87,
+ "eval_recall": 0.8434903047091413,
+ "eval_runtime": 2.5569,
+ "eval_samples_per_second": 134.928,
+ "eval_steps_per_second": 17.208,
+ "step": 1200
+ },
995
+ {
996
+ "epoch": 13.908045977011493,
997
+ "grad_norm": 0.5470532178878784,
998
+ "learning_rate": 2.9799914857386123e-05,
999
+ "loss": 0.0665,
1000
+ "step": 1210
1001
+ },
1002
+ {
1003
+ "epoch": 14.022988505747126,
1004
+ "grad_norm": 0.9854804873466492,
1005
+ "learning_rate": 2.9587058322690508e-05,
1006
+ "loss": 0.063,
1007
+ "step": 1220
1008
+ },
1009
+ {
1010
+ "epoch": 14.137931034482758,
1011
+ "grad_norm": 0.562602162361145,
1012
+ "learning_rate": 2.9374201787994893e-05,
1013
+ "loss": 0.0683,
1014
+ "step": 1230
1015
+ },
1016
+ {
1017
+ "epoch": 14.25287356321839,
1018
+ "grad_norm": 0.5807965993881226,
1019
+ "learning_rate": 2.9161345253299278e-05,
1020
+ "loss": 0.0695,
1021
+ "step": 1240
1022
+ },
1023
+ {
1024
+ "epoch": 14.367816091954023,
1025
+ "grad_norm": 0.659021258354187,
1026
+ "learning_rate": 2.894848871860366e-05,
1027
+ "loss": 0.069,
1028
+ "step": 1250
1029
+ },
1030
+ {
1031
+ "epoch": 14.482758620689655,
1032
+ "grad_norm": 0.42295771837234497,
1033
+ "learning_rate": 2.8735632183908045e-05,
1034
+ "loss": 0.0692,
1035
+ "step": 1260
1036
+ },
1037
+ {
1038
+ "epoch": 14.597701149425287,
1039
+ "grad_norm": 1.1786874532699585,
1040
+ "learning_rate": 2.852277564921243e-05,
1041
+ "loss": 0.0564,
1042
+ "step": 1270
1043
+ },
1044
+ {
1045
+ "epoch": 14.71264367816092,
1046
+ "grad_norm": 0.7675245404243469,
1047
+ "learning_rate": 2.8309919114516818e-05,
1048
+ "loss": 0.063,
1049
+ "step": 1280
1050
+ },
1051
+ {
1052
+ "epoch": 14.827586206896552,
1053
+ "grad_norm": 0.6836528182029724,
1054
+ "learning_rate": 2.8097062579821203e-05,
1055
+ "loss": 0.0711,
1056
+ "step": 1290
1057
+ },
1058
+ {
1059
+ "epoch": 14.942528735632184,
1060
+ "grad_norm": 0.6420086026191711,
1061
+ "learning_rate": 2.7884206045125588e-05,
1062
+ "loss": 0.0626,
1063
+ "step": 1300
1064
+ },
1065
+ {
1066
+ "epoch": 14.942528735632184,
1067
+ "eval_accuracy": 0.9376811594202898,
1068
+ "eval_f1": 0.8484848484848485,
1069
+ "eval_loss": 0.1589481383562088,
1070
+ "eval_precision": 0.8637015781922525,
1071
+ "eval_recall": 0.8337950138504155,
1072
+ "eval_runtime": 2.7864,
1073
+ "eval_samples_per_second": 123.815,
1074
+ "eval_steps_per_second": 15.791,
1075
+ "step": 1300
1076
+ },
1077
+ {
1078
+ "epoch": 15.057471264367816,
1079
+ "grad_norm": 0.5504136681556702,
1080
+ "learning_rate": 2.7671349510429973e-05,
1081
+ "loss": 0.0451,
1082
+ "step": 1310
1083
+ },
1084
+ {
1085
+ "epoch": 15.172413793103448,
1086
+ "grad_norm": 1.0383538007736206,
1087
+ "learning_rate": 2.745849297573436e-05,
1088
+ "loss": 0.0569,
1089
+ "step": 1320
1090
+ },
1091
+ {
1092
+ "epoch": 15.28735632183908,
1093
+ "grad_norm": 0.42968350648880005,
1094
+ "learning_rate": 2.724563644103874e-05,
1095
+ "loss": 0.0551,
1096
+ "step": 1330
1097
+ },
1098
+ {
1099
+ "epoch": 15.402298850574713,
1100
+ "grad_norm": 0.8487522602081299,
1101
+ "learning_rate": 2.7032779906343125e-05,
1102
+ "loss": 0.0501,
1103
+ "step": 1340
1104
+ },
1105
+ {
1106
+ "epoch": 15.517241379310345,
1107
+ "grad_norm": 0.8882860541343689,
1108
+ "learning_rate": 2.681992337164751e-05,
1109
+ "loss": 0.0467,
1110
+ "step": 1350
1111
+ },
1112
+ {
1113
+ "epoch": 15.632183908045977,
1114
+ "grad_norm": 1.2020477056503296,
1115
+ "learning_rate": 2.6607066836951895e-05,
1116
+ "loss": 0.0649,
1117
+ "step": 1360
1118
+ },
1119
+ {
1120
+ "epoch": 15.74712643678161,
1121
+ "grad_norm": 0.6664167046546936,
1122
+ "learning_rate": 2.6394210302256277e-05,
1123
+ "loss": 0.0462,
1124
+ "step": 1370
1125
+ },
1126
+ {
1127
+ "epoch": 15.862068965517242,
1128
+ "grad_norm": 0.6712772250175476,
1129
+ "learning_rate": 2.618135376756067e-05,
1130
+ "loss": 0.0729,
1131
+ "step": 1380
1132
+ },
1133
+ {
1134
+ "epoch": 15.977011494252874,
1135
+ "grad_norm": 0.5807361602783203,
1136
+ "learning_rate": 2.5968497232865053e-05,
1137
+ "loss": 0.0532,
1138
+ "step": 1390
1139
+ },
1140
+ {
1141
+ "epoch": 16.091954022988507,
1142
+ "grad_norm": 0.4083256423473358,
1143
+ "learning_rate": 2.5755640698169435e-05,
1144
+ "loss": 0.0481,
1145
+ "step": 1400
1146
+ },
1147
+ {
1148
+ "epoch": 16.091954022988507,
1149
+ "eval_accuracy": 0.9394202898550724,
1150
+ "eval_f1": 0.8510334996436208,
1151
+ "eval_loss": 0.16117151081562042,
1152
+ "eval_precision": 0.8766519823788547,
1153
+ "eval_recall": 0.8268698060941828,
1154
+ "eval_runtime": 3.3334,
1155
+ "eval_samples_per_second": 103.498,
1156
+ "eval_steps_per_second": 13.2,
1157
+ "step": 1400
1158
+ },
1159
+ {
1160
+ "epoch": 16.20689655172414,
1161
+ "grad_norm": 0.8922817707061768,
1162
+ "learning_rate": 2.554278416347382e-05,
1163
+ "loss": 0.0674,
1164
+ "step": 1410
1165
+ },
1166
+ {
1167
+ "epoch": 16.32183908045977,
1168
+ "grad_norm": 0.9631970524787903,
1169
+ "learning_rate": 2.5329927628778205e-05,
1170
+ "loss": 0.0503,
1171
+ "step": 1420
1172
+ },
1173
+ {
1174
+ "epoch": 16.436781609195403,
1175
+ "grad_norm": 0.8879241943359375,
1176
+ "learning_rate": 2.511707109408259e-05,
1177
+ "loss": 0.0482,
1178
+ "step": 1430
1179
+ },
1180
+ {
1181
+ "epoch": 16.551724137931036,
1182
+ "grad_norm": 0.7775533199310303,
1183
+ "learning_rate": 2.4904214559386975e-05,
1184
+ "loss": 0.0636,
1185
+ "step": 1440
1186
+ },
1187
+ {
1188
+ "epoch": 16.666666666666668,
1189
+ "grad_norm": 0.9835919737815857,
1190
+ "learning_rate": 2.4691358024691357e-05,
1191
+ "loss": 0.0568,
1192
+ "step": 1450
1193
+ },
1194
+ {
1195
+ "epoch": 16.7816091954023,
1196
+ "grad_norm": 0.7925294041633606,
1197
+ "learning_rate": 2.4478501489995745e-05,
1198
+ "loss": 0.0529,
1199
+ "step": 1460
1200
+ },
1201
+ {
1202
+ "epoch": 16.896551724137932,
1203
+ "grad_norm": 0.6245427131652832,
1204
+ "learning_rate": 2.426564495530013e-05,
1205
+ "loss": 0.0641,
1206
+ "step": 1470
1207
+ },
1208
+ {
1209
+ "epoch": 17.011494252873565,
1210
+ "grad_norm": 0.5181954503059387,
1211
+ "learning_rate": 2.4052788420604515e-05,
1212
+ "loss": 0.046,
1213
+ "step": 1480
1214
+ },
1215
+ {
1216
+ "epoch": 17.126436781609197,
1217
+ "grad_norm": 0.40600207448005676,
1218
+ "learning_rate": 2.3839931885908897e-05,
1219
+ "loss": 0.039,
1220
+ "step": 1490
1221
+ },
1222
+ {
1223
+ "epoch": 17.24137931034483,
1224
+ "grad_norm": 0.7081565260887146,
1225
+ "learning_rate": 2.3627075351213285e-05,
1226
+ "loss": 0.0507,
1227
+ "step": 1500
1228
+ },
1229
+ {
1230
+ "epoch": 17.24137931034483,
1231
+ "eval_accuracy": 0.9339130434782609,
1232
+ "eval_f1": 0.8394366197183099,
1233
+ "eval_loss": 0.1679152399301529,
1234
+ "eval_precision": 0.8538681948424068,
1235
+ "eval_recall": 0.8254847645429363,
1236
+ "eval_runtime": 2.512,
1237
+ "eval_samples_per_second": 137.339,
1238
+ "eval_steps_per_second": 17.516,
1239
+ "step": 1500
1240
+ },
1241
+ {
1242
+ "epoch": 17.35632183908046,
1243
+ "grad_norm": 0.9278731346130371,
1244
+ "learning_rate": 2.341421881651767e-05,
1245
+ "loss": 0.0546,
1246
+ "step": 1510
1247
+ },
1248
+ {
1249
+ "epoch": 17.47126436781609,
1250
+ "grad_norm": 1.362691044807434,
1251
+ "learning_rate": 2.3201362281822055e-05,
1252
+ "loss": 0.0729,
1253
+ "step": 1520
1254
+ },
1255
+ {
1256
+ "epoch": 17.586206896551722,
1257
+ "grad_norm": 0.6156861782073975,
1258
+ "learning_rate": 2.2988505747126437e-05,
1259
+ "loss": 0.0587,
1260
+ "step": 1530
1261
+ },
1262
+ {
1263
+ "epoch": 17.701149425287355,
1264
+ "grad_norm": 0.530103862285614,
1265
+ "learning_rate": 2.277564921243082e-05,
1266
+ "loss": 0.0488,
1267
+ "step": 1540
1268
+ },
1269
+ {
1270
+ "epoch": 17.816091954022987,
1271
+ "grad_norm": 0.6204285025596619,
1272
+ "learning_rate": 2.256279267773521e-05,
1273
+ "loss": 0.0499,
1274
+ "step": 1550
1275
+ },
1276
+ {
1277
+ "epoch": 17.93103448275862,
1278
+ "grad_norm": 0.4868471920490265,
1279
+ "learning_rate": 2.234993614303959e-05,
1280
+ "loss": 0.0536,
1281
+ "step": 1560
1282
+ },
1283
+ {
1284
+ "epoch": 18.04597701149425,
1285
+ "grad_norm": 0.5951109528541565,
1286
+ "learning_rate": 2.2137079608343977e-05,
1287
+ "loss": 0.0436,
1288
+ "step": 1570
1289
+ },
1290
+ {
1291
+ "epoch": 18.160919540229884,
1292
+ "grad_norm": 1.3129435777664185,
1293
+ "learning_rate": 2.192422307364836e-05,
1294
+ "loss": 0.0455,
1295
+ "step": 1580
1296
+ },
1297
+ {
1298
+ "epoch": 18.275862068965516,
1299
+ "grad_norm": 0.4130817651748657,
1300
+ "learning_rate": 2.1711366538952747e-05,
1301
+ "loss": 0.0539,
1302
+ "step": 1590
1303
+ },
1304
+ {
1305
+ "epoch": 18.39080459770115,
1306
+ "grad_norm": 0.6867007613182068,
1307
+ "learning_rate": 2.149851000425713e-05,
1308
+ "loss": 0.0446,
1309
+ "step": 1600
1310
+ },
1311
+ {
1312
+ "epoch": 18.39080459770115,
1313
+ "eval_accuracy": 0.9417391304347826,
1314
+ "eval_f1": 0.8597348220516399,
1315
+ "eval_loss": 0.16227415204048157,
1316
+ "eval_precision": 0.8663853727144867,
1317
+ "eval_recall": 0.853185595567867,
1318
+ "eval_runtime": 2.7697,
1319
+ "eval_samples_per_second": 124.564,
1320
+ "eval_steps_per_second": 15.886,
1321
+ "step": 1600
1322
+ },
1323
+ {
1324
+ "epoch": 18.50574712643678,
1325
+ "grad_norm": 0.44690120220184326,
1326
+ "learning_rate": 2.1285653469561517e-05,
1327
+ "loss": 0.0411,
1328
+ "step": 1610
1329
+ },
1330
+ {
1331
+ "epoch": 18.620689655172413,
1332
+ "grad_norm": 0.47931063175201416,
1333
+ "learning_rate": 2.10727969348659e-05,
1334
+ "loss": 0.0487,
1335
+ "step": 1620
1336
+ },
1337
+ {
1338
+ "epoch": 18.735632183908045,
1339
+ "grad_norm": 0.5977224707603455,
1340
+ "learning_rate": 2.0859940400170287e-05,
1341
+ "loss": 0.0501,
1342
+ "step": 1630
1343
+ },
1344
+ {
1345
+ "epoch": 18.850574712643677,
1346
+ "grad_norm": 0.5454962253570557,
1347
+ "learning_rate": 2.064708386547467e-05,
1348
+ "loss": 0.0516,
1349
+ "step": 1640
1350
+ },
1351
+ {
1352
+ "epoch": 18.96551724137931,
1353
+ "grad_norm": 0.5072124004364014,
1354
+ "learning_rate": 2.0434227330779057e-05,
1355
+ "loss": 0.0407,
1356
+ "step": 1650
1357
+ },
1358
+ {
1359
+ "epoch": 19.080459770114942,
1360
+ "grad_norm": 0.8368410468101501,
1361
+ "learning_rate": 2.022137079608344e-05,
1362
+ "loss": 0.0481,
1363
+ "step": 1660
1364
+ },
1365
+ {
1366
+ "epoch": 19.195402298850574,
1367
+ "grad_norm": 1.143817663192749,
1368
+ "learning_rate": 2.0008514261387827e-05,
1369
+ "loss": 0.0406,
1370
+ "step": 1670
1371
+ },
1372
+ {
1373
+ "epoch": 19.310344827586206,
1374
+ "grad_norm": 0.8621155023574829,
1375
+ "learning_rate": 1.979565772669221e-05,
1376
+ "loss": 0.0472,
1377
+ "step": 1680
1378
+ },
1379
+ {
1380
+ "epoch": 19.42528735632184,
1381
+ "grad_norm": 0.8814654350280762,
1382
+ "learning_rate": 1.9582801191996593e-05,
1383
+ "loss": 0.0403,
1384
+ "step": 1690
1385
+ },
1386
+ {
1387
+ "epoch": 19.54022988505747,
1388
+ "grad_norm": 0.37576496601104736,
1389
+ "learning_rate": 1.936994465730098e-05,
1390
+ "loss": 0.0498,
1391
+ "step": 1700
1392
+ },
1393
+ {
1394
+ "epoch": 19.54022988505747,
1395
+ "eval_accuracy": 0.9417391304347826,
1396
+ "eval_f1": 0.860125260960334,
1397
+ "eval_loss": 0.16253642737865448,
1398
+ "eval_precision": 0.8643356643356643,
1399
+ "eval_recall": 0.8559556786703602,
1400
+ "eval_runtime": 3.3471,
1401
+ "eval_samples_per_second": 103.074,
1402
+ "eval_steps_per_second": 13.146,
1403
+ "step": 1700
1404
+ },
1405
+ {
1406
+ "epoch": 19.655172413793103,
1407
+ "grad_norm": 1.052512764930725,
1408
+ "learning_rate": 1.9157088122605367e-05,
1409
+ "loss": 0.0569,
1410
+ "step": 1710
1411
+ },
1412
+ {
1413
+ "epoch": 19.770114942528735,
1414
+ "grad_norm": 0.47423890233039856,
1415
+ "learning_rate": 1.8944231587909748e-05,
1416
+ "loss": 0.039,
1417
+ "step": 1720
1418
+ },
1419
+ {
1420
+ "epoch": 19.885057471264368,
1421
+ "grad_norm": 0.922591507434845,
1422
+ "learning_rate": 1.8731375053214133e-05,
1423
+ "loss": 0.0376,
1424
+ "step": 1730
1425
+ },
1426
+ {
1427
+ "epoch": 20.0,
1428
+ "grad_norm": 1.3589046001434326,
1429
+ "learning_rate": 1.8518518518518518e-05,
1430
+ "loss": 0.0496,
1431
+ "step": 1740
1432
+ },
1433
+ {
1434
+ "epoch": 20.114942528735632,
1435
+ "grad_norm": 0.8147189617156982,
1436
+ "learning_rate": 1.8305661983822907e-05,
1437
+ "loss": 0.0355,
1438
+ "step": 1750
1439
+ },
1440
+ {
1441
+ "epoch": 20.229885057471265,
1442
+ "grad_norm": 0.2483612447977066,
1443
+ "learning_rate": 1.8092805449127288e-05,
1444
+ "loss": 0.0432,
1445
+ "step": 1760
1446
+ },
1447
+ {
1448
+ "epoch": 20.344827586206897,
1449
+ "grad_norm": 0.29129087924957275,
1450
+ "learning_rate": 1.7879948914431673e-05,
1451
+ "loss": 0.0394,
1452
+ "step": 1770
1453
+ },
1454
+ {
1455
+ "epoch": 20.45977011494253,
1456
+ "grad_norm": 0.6358793377876282,
1457
+ "learning_rate": 1.7667092379736058e-05,
1458
+ "loss": 0.0453,
1459
+ "step": 1780
1460
+ },
1461
+ {
1462
+ "epoch": 20.57471264367816,
1463
+ "grad_norm": 0.3951164782047272,
1464
+ "learning_rate": 1.7454235845040443e-05,
1465
+ "loss": 0.0499,
1466
+ "step": 1790
1467
+ },
1468
+ {
1469
+ "epoch": 20.689655172413794,
1470
+ "grad_norm": 0.5246742367744446,
1471
+ "learning_rate": 1.7241379310344828e-05,
1472
+ "loss": 0.0458,
1473
+ "step": 1800
1474
+ },
1475
+ {
1476
+ "epoch": 20.689655172413794,
1477
+ "eval_accuracy": 0.9397101449275362,
1478
+ "eval_f1": 0.8533145275035261,
1479
+ "eval_loss": 0.16005827486515045,
1480
+ "eval_precision": 0.8692528735632183,
1481
+ "eval_recall": 0.8379501385041551,
1482
+ "eval_runtime": 2.5451,
1483
+ "eval_samples_per_second": 135.555,
1484
+ "eval_steps_per_second": 17.288,
1485
+ "step": 1800
1486
+ },
1487
+ {
1488
+ "epoch": 20.804597701149426,
1489
+ "grad_norm": 0.4585019648075104,
1490
+ "learning_rate": 1.7028522775649213e-05,
1491
+ "loss": 0.0466,
1492
+ "step": 1810
1493
+ },
1494
+ {
1495
+ "epoch": 20.919540229885058,
1496
+ "grad_norm": 0.5193343162536621,
1497
+ "learning_rate": 1.6815666240953598e-05,
1498
+ "loss": 0.0321,
1499
+ "step": 1820
1500
+ },
1501
+ {
1502
+ "epoch": 21.03448275862069,
1503
+ "grad_norm": 0.3044123649597168,
1504
+ "learning_rate": 1.6602809706257983e-05,
1505
+ "loss": 0.0391,
1506
+ "step": 1830
1507
+ },
1508
+ {
1509
+ "epoch": 21.149425287356323,
1510
+ "grad_norm": 0.3363288342952728,
1511
+ "learning_rate": 1.638995317156237e-05,
1512
+ "loss": 0.0344,
1513
+ "step": 1840
1514
+ },
1515
+ {
1516
+ "epoch": 21.264367816091955,
1517
+ "grad_norm": 0.38722899556159973,
1518
+ "learning_rate": 1.6177096636866753e-05,
1519
+ "loss": 0.0352,
1520
+ "step": 1850
1521
+ },
1522
+ {
1523
+ "epoch": 21.379310344827587,
1524
+ "grad_norm": 0.5276902914047241,
1525
+ "learning_rate": 1.596424010217114e-05,
1526
+ "loss": 0.0354,
1527
+ "step": 1860
1528
+ },
1529
+ {
1530
+ "epoch": 21.49425287356322,
1531
+ "grad_norm": 0.37170952558517456,
1532
+ "learning_rate": 1.5751383567475523e-05,
1533
+ "loss": 0.0369,
1534
+ "step": 1870
1535
+ },
1536
+ {
1537
+ "epoch": 21.60919540229885,
1538
+ "grad_norm": 1.0527151823043823,
1539
+ "learning_rate": 1.5538527032779905e-05,
1540
+ "loss": 0.032,
1541
+ "step": 1880
1542
+ },
1543
+ {
1544
+ "epoch": 21.724137931034484,
1545
+ "grad_norm": 0.42286577820777893,
1546
+ "learning_rate": 1.532567049808429e-05,
1547
+ "loss": 0.0417,
1548
+ "step": 1890
1549
+ },
1550
+ {
1551
+ "epoch": 21.839080459770116,
1552
+ "grad_norm": 0.4092840254306793,
1553
+ "learning_rate": 1.5112813963388678e-05,
1554
+ "loss": 0.0307,
1555
+ "step": 1900
1556
+ },
1557
+ {
1558
+ "epoch": 21.839080459770116,
1559
+ "eval_accuracy": 0.9431884057971015,
1560
+ "eval_f1": 0.8636995827538247,
1561
+ "eval_loss": 0.16264864802360535,
1562
+ "eval_precision": 0.86731843575419,
1563
+ "eval_recall": 0.8601108033240997,
1564
+ "eval_runtime": 2.5207,
1565
+ "eval_samples_per_second": 136.867,
1566
+ "eval_steps_per_second": 17.455,
1567
+ "step": 1900
1568
+ },
1569
+ {
1570
+ "epoch": 21.95402298850575,
1571
+ "grad_norm": 0.4764251708984375,
1572
+ "learning_rate": 1.4899957428693062e-05,
1573
+ "loss": 0.042,
1574
+ "step": 1910
1575
+ },
1576
+ {
1577
+ "epoch": 22.06896551724138,
1578
+ "grad_norm": 0.925857424736023,
1579
+ "learning_rate": 1.4687100893997447e-05,
1580
+ "loss": 0.0342,
1581
+ "step": 1920
1582
+ },
1583
+ {
1584
+ "epoch": 22.183908045977013,
1585
+ "grad_norm": 0.6081854701042175,
1586
+ "learning_rate": 1.447424435930183e-05,
1587
+ "loss": 0.0443,
1588
+ "step": 1930
1589
+ },
1590
+ {
1591
+ "epoch": 22.298850574712645,
1592
+ "grad_norm": 0.35534176230430603,
1593
+ "learning_rate": 1.4261387824606215e-05,
1594
+ "loss": 0.0296,
1595
+ "step": 1940
1596
+ },
1597
+ {
1598
+ "epoch": 22.413793103448278,
1599
+ "grad_norm": 0.5677205324172974,
1600
+ "learning_rate": 1.4048531289910602e-05,
1601
+ "loss": 0.0319,
1602
+ "step": 1950
1603
+ },
1604
+ {
1605
+ "epoch": 22.52873563218391,
1606
+ "grad_norm": 0.9339087009429932,
1607
+ "learning_rate": 1.3835674755214987e-05,
1608
+ "loss": 0.0367,
1609
+ "step": 1960
1610
+ },
1611
+ {
1612
+ "epoch": 22.64367816091954,
1613
+ "grad_norm": 0.30769288539886475,
1614
+ "learning_rate": 1.362281822051937e-05,
1615
+ "loss": 0.0372,
1616
+ "step": 1970
1617
+ },
1618
+ {
1619
+ "epoch": 22.75862068965517,
1620
+ "grad_norm": 0.4518299698829651,
1621
+ "learning_rate": 1.3409961685823755e-05,
1622
+ "loss": 0.0501,
1623
+ "step": 1980
1624
+ },
1625
+ {
1626
+ "epoch": 22.873563218390803,
1627
+ "grad_norm": 0.3183022737503052,
1628
+ "learning_rate": 1.3197105151128138e-05,
1629
+ "loss": 0.0338,
1630
+ "step": 1990
1631
+ },
1632
+ {
1633
+ "epoch": 22.988505747126435,
1634
+ "grad_norm": 0.7140583395957947,
1635
+ "learning_rate": 1.2984248616432527e-05,
1636
+ "loss": 0.0334,
1637
+ "step": 2000
1638
+ },
1639
+ {
1640
+ "epoch": 22.988505747126435,
1641
+ "eval_accuracy": 0.9443478260869566,
1642
+ "eval_f1": 0.8642149929278642,
1643
+ "eval_loss": 0.16209261119365692,
1644
+ "eval_precision": 0.8829479768786127,
1645
+ "eval_recall": 0.8462603878116344,
1646
+ "eval_runtime": 3.2664,
1647
+ "eval_samples_per_second": 105.621,
1648
+ "eval_steps_per_second": 13.471,
1649
+ "step": 2000
1650
+ },
1651
+ {
1652
+ "epoch": 23.103448275862068,
1653
+ "grad_norm": 0.647758424282074,
1654
+ "learning_rate": 1.277139208173691e-05,
1655
+ "loss": 0.036,
1656
+ "step": 2010
1657
+ },
1658
+ {
1659
+ "epoch": 23.2183908045977,
1660
+ "grad_norm": 0.7994110584259033,
1661
+ "learning_rate": 1.2558535547041295e-05,
1662
+ "loss": 0.0313,
1663
+ "step": 2020
1664
+ },
1665
+ {
1666
+ "epoch": 23.333333333333332,
1667
+ "grad_norm": 0.33477237820625305,
1668
+ "learning_rate": 1.2345679012345678e-05,
1669
+ "loss": 0.034,
1670
+ "step": 2030
1671
+ },
1672
+ {
1673
+ "epoch": 23.448275862068964,
1674
+ "grad_norm": 0.17108239233493805,
1675
+ "learning_rate": 1.2132822477650065e-05,
1676
+ "loss": 0.0262,
1677
+ "step": 2040
1678
+ },
1679
+ {
1680
+ "epoch": 23.563218390804597,
1681
+ "grad_norm": 0.40771496295928955,
1682
+ "learning_rate": 1.1919965942954448e-05,
1683
+ "loss": 0.0375,
1684
+ "step": 2050
1685
+ },
1686
+ {
1687
+ "epoch": 23.67816091954023,
1688
+ "grad_norm": 1.7193657159805298,
1689
+ "learning_rate": 1.1707109408258835e-05,
1690
+ "loss": 0.0348,
1691
+ "step": 2060
1692
+ },
1693
+ {
1694
+ "epoch": 23.79310344827586,
1695
+ "grad_norm": 0.2730618715286255,
1696
+ "learning_rate": 1.1494252873563218e-05,
1697
+ "loss": 0.0335,
1698
+ "step": 2070
1699
+ },
1700
+ {
1701
+ "epoch": 23.908045977011493,
1702
+ "grad_norm": 0.12540611624717712,
1703
+ "learning_rate": 1.1281396338867605e-05,
1704
+ "loss": 0.0338,
1705
+ "step": 2080
1706
+ },
1707
+ {
1708
+ "epoch": 24.022988505747126,
1709
+ "grad_norm": 0.6330766081809998,
1710
+ "learning_rate": 1.1068539804171988e-05,
1711
+ "loss": 0.0409,
1712
+ "step": 2090
1713
+ },
1714
+ {
1715
+ "epoch": 24.137931034482758,
1716
+ "grad_norm": 0.5552668571472168,
1717
+ "learning_rate": 1.0855683269476373e-05,
1718
+ "loss": 0.0339,
1719
+ "step": 2100
1720
+ },
1721
+ {
1722
+ "epoch": 24.137931034482758,
1723
+ "eval_accuracy": 0.9434782608695652,
1724
+ "eval_f1": 0.8644892286309938,
1725
+ "eval_loss": 0.1680324375629425,
1726
+ "eval_precision": 0.8675034867503487,
1727
+ "eval_recall": 0.8614958448753463,
1728
+ "eval_runtime": 2.5297,
1729
+ "eval_samples_per_second": 136.381,
1730
+ "eval_steps_per_second": 17.394,
1731
+ "step": 2100
1732
+ },
1733
+ {
1734
+ "epoch": 24.25287356321839,
1735
+ "grad_norm": 0.8327674269676208,
1736
+ "learning_rate": 1.0642826734780758e-05,
1737
+ "loss": 0.0307,
1738
+ "step": 2110
1739
+ },
1740
+ {
1741
+ "epoch": 24.367816091954023,
1742
+ "grad_norm": 0.33239325881004333,
1743
+ "learning_rate": 1.0429970200085143e-05,
1744
+ "loss": 0.0333,
1745
+ "step": 2120
1746
+ },
1747
+ {
1748
+ "epoch": 24.482758620689655,
1749
+ "grad_norm": 0.44145530462265015,
1750
+ "learning_rate": 1.0217113665389528e-05,
1751
+ "loss": 0.037,
1752
+ "step": 2130
1753
+ },
1754
+ {
1755
+ "epoch": 24.597701149425287,
1756
+ "grad_norm": 0.7869518995285034,
1757
+ "learning_rate": 1.0004257130693913e-05,
1758
+ "loss": 0.0364,
1759
+ "step": 2140
1760
+ },
1761
+ {
1762
+ "epoch": 24.71264367816092,
1763
+ "grad_norm": 0.31890323758125305,
1764
+ "learning_rate": 9.791400595998297e-06,
1765
+ "loss": 0.0329,
1766
+ "step": 2150
1767
+ },
1768
+ {
1769
+ "epoch": 24.82758620689655,
1770
+ "grad_norm": 0.53341144323349,
1771
+ "learning_rate": 9.578544061302683e-06,
1772
+ "loss": 0.0481,
1773
+ "step": 2160
1774
+ },
1775
+ {
1776
+ "epoch": 24.942528735632184,
1777
+ "grad_norm": 0.4981224834918976,
1778
+ "learning_rate": 9.365687526607067e-06,
1779
+ "loss": 0.0368,
1780
+ "step": 2170
1781
+ },
1782
+ {
1783
+ "epoch": 25.057471264367816,
1784
+ "grad_norm": 1.0361753702163696,
1785
+ "learning_rate": 9.152830991911453e-06,
1786
+ "loss": 0.0464,
1787
+ "step": 2180
1788
+ },
1789
+ {
1790
+ "epoch": 25.17241379310345,
1791
+ "grad_norm": 0.1470736712217331,
1792
+ "learning_rate": 8.939974457215837e-06,
1793
+ "loss": 0.0372,
1794
+ "step": 2190
1795
+ },
1796
+ {
1797
+ "epoch": 25.28735632183908,
1798
+ "grad_norm": 0.5003727078437805,
1799
+ "learning_rate": 8.727117922520222e-06,
1800
+ "loss": 0.0222,
1801
+ "step": 2200
1802
+ },
1803
+ {
1804
+ "epoch": 25.28735632183908,
1805
+ "eval_accuracy": 0.9394202898550724,
1806
+ "eval_f1": 0.853743876836949,
1807
+ "eval_loss": 0.16558243334293365,
1808
+ "eval_precision": 0.8628005657708628,
1809
+ "eval_recall": 0.8448753462603878,
1810
+ "eval_runtime": 2.7223,
1811
+ "eval_samples_per_second": 126.732,
1812
+ "eval_steps_per_second": 16.163,
1813
+ "step": 2200
1814
+ },
1815
+ {
1816
+ "epoch": 25.402298850574713,
1817
+ "grad_norm": 1.032658576965332,
1818
+ "learning_rate": 8.514261387824607e-06,
1819
+ "loss": 0.036,
1820
+ "step": 2210
1821
+ },
1822
+ {
1823
+ "epoch": 25.517241379310345,
1824
+ "grad_norm": 0.3298107087612152,
1825
+ "learning_rate": 8.301404853128992e-06,
1826
+ "loss": 0.0311,
1827
+ "step": 2220
1828
+ },
1829
+ {
1830
+ "epoch": 25.632183908045977,
1831
+ "grad_norm": 1.8201491832733154,
1832
+ "learning_rate": 8.088548318433377e-06,
1833
+ "loss": 0.0373,
1834
+ "step": 2230
1835
+ },
1836
+ {
1837
+ "epoch": 25.74712643678161,
1838
+ "grad_norm": 0.5777905583381653,
1839
+ "learning_rate": 7.875691783737762e-06,
1840
+ "loss": 0.0323,
1841
+ "step": 2240
1842
+ },
1843
+ {
1844
+ "epoch": 25.862068965517242,
1845
+ "grad_norm": 0.3212037980556488,
1846
+ "learning_rate": 7.662835249042145e-06,
1847
+ "loss": 0.0337,
1848
+ "step": 2250
1849
+ },
1850
+ {
1851
+ "epoch": 25.977011494252874,
1852
+ "grad_norm": 1.0763506889343262,
1853
+ "learning_rate": 7.449978714346531e-06,
1854
+ "loss": 0.0455,
1855
+ "step": 2260
1856
+ },
1857
+ {
1858
+ "epoch": 26.091954022988507,
1859
+ "grad_norm": 0.7833639979362488,
1860
+ "learning_rate": 7.237122179650915e-06,
1861
+ "loss": 0.0289,
1862
+ "step": 2270
1863
+ },
1864
+ {
1865
+ "epoch": 26.20689655172414,
1866
+ "grad_norm": 1.0893276929855347,
1867
+ "learning_rate": 7.024265644955301e-06,
1868
+ "loss": 0.037,
1869
+ "step": 2280
1870
+ },
1871
+ {
1872
+ "epoch": 26.32183908045977,
1873
+ "grad_norm": 0.534814715385437,
1874
+ "learning_rate": 6.811409110259685e-06,
1875
+ "loss": 0.0309,
1876
+ "step": 2290
1877
+ },
1878
+ {
1879
+ "epoch": 26.436781609195403,
1880
+ "grad_norm": 1.0022532939910889,
1881
+ "learning_rate": 6.598552575564069e-06,
1882
+ "loss": 0.026,
1883
+ "step": 2300
1884
+ },
1885
+ {
1886
+ "epoch": 26.436781609195403,
1887
+ "eval_accuracy": 0.9385507246376812,
1888
+ "eval_f1": 0.8515406162464986,
1889
+ "eval_loss": 0.16867290437221527,
1890
+ "eval_precision": 0.8611898016997167,
1891
+ "eval_recall": 0.8421052631578947,
1892
+ "eval_runtime": 3.2071,
1893
+ "eval_samples_per_second": 107.575,
1894
+ "eval_steps_per_second": 13.72,
1895
+ "step": 2300
1896
+ },
1897
+ {
1898
+ "epoch": 26.551724137931036,
1899
+ "grad_norm": 0.396712988615036,
1900
+ "learning_rate": 6.385696040868455e-06,
1901
+ "loss": 0.0422,
1902
+ "step": 2310
1903
+ },
1904
+ {
1905
+ "epoch": 26.666666666666668,
1906
+ "grad_norm": 0.6816790103912354,
1907
+ "learning_rate": 6.172839506172839e-06,
1908
+ "loss": 0.0259,
1909
+ "step": 2320
1910
+ },
1911
+ {
1912
+ "epoch": 26.7816091954023,
1913
+ "grad_norm": 0.44069159030914307,
1914
+ "learning_rate": 5.959982971477224e-06,
1915
+ "loss": 0.0314,
1916
+ "step": 2330
1917
+ },
1918
+ {
1919
+ "epoch": 26.896551724137932,
1920
+ "grad_norm": 0.8928768634796143,
1921
+ "learning_rate": 5.747126436781609e-06,
1922
+ "loss": 0.0451,
1923
+ "step": 2340
1924
+ },
1925
+ {
1926
+ "epoch": 27.011494252873565,
1927
+ "grad_norm": 0.15559493005275726,
1928
+ "learning_rate": 5.534269902085994e-06,
1929
+ "loss": 0.0292,
1930
+ "step": 2350
1931
+ },
1932
+ {
1933
+ "epoch": 27.126436781609197,
1934
+ "grad_norm": 0.57392817735672,
1935
+ "learning_rate": 5.321413367390379e-06,
1936
+ "loss": 0.0339,
1937
+ "step": 2360
1938
+ },
1939
+ {
1940
+ "epoch": 27.24137931034483,
1941
+ "grad_norm": 0.4282006621360779,
1942
+ "learning_rate": 5.108556832694764e-06,
1943
+ "loss": 0.0272,
1944
+ "step": 2370
1945
+ },
1946
+ {
1947
+ "epoch": 27.35632183908046,
1948
+ "grad_norm": 0.3765332102775574,
1949
+ "learning_rate": 4.895700297999148e-06,
1950
+ "loss": 0.0224,
1951
+ "step": 2380
1952
+ },
1953
+ {
1954
+ "epoch": 27.47126436781609,
1955
+ "grad_norm": 0.26194748282432556,
1956
+ "learning_rate": 4.682843763303533e-06,
1957
+ "loss": 0.025,
1958
+ "step": 2390
1959
+ },
1960
+ {
1961
+ "epoch": 27.586206896551722,
1962
+ "grad_norm": 0.6783250570297241,
1963
+ "learning_rate": 4.469987228607918e-06,
1964
+ "loss": 0.0353,
1965
+ "step": 2400
1966
+ },
1967
+ {
1968
+ "epoch": 27.586206896551722,
1969
+ "eval_accuracy": 0.9402898550724638,
1970
+ "eval_f1": 0.8555399719495091,
1971
+ "eval_loss": 0.16655248403549194,
1972
+ "eval_precision": 0.8664772727272727,
1973
+ "eval_recall": 0.8448753462603878,
1974
+ "eval_runtime": 2.5159,
1975
+ "eval_samples_per_second": 137.126,
1976
+ "eval_steps_per_second": 17.489,
1977
+ "step": 2400
1978
+ },
1979
+ {
1980
+ "epoch": 27.701149425287355,
1981
+ "grad_norm": 0.7945173382759094,
1982
+ "learning_rate": 4.257130693912303e-06,
1983
+ "loss": 0.0346,
1984
+ "step": 2410
1985
+ },
1986
+ {
1987
+ "epoch": 27.816091954022987,
1988
+ "grad_norm": 0.34426242113113403,
1989
+ "learning_rate": 4.044274159216688e-06,
1990
+ "loss": 0.0354,
1991
+ "step": 2420
1992
+ },
1993
+ {
1994
+ "epoch": 27.93103448275862,
1995
+ "grad_norm": 0.2353845238685608,
1996
+ "learning_rate": 3.8314176245210725e-06,
1997
+ "loss": 0.0306,
1998
+ "step": 2430
1999
+ },
2000
+ {
2001
+ "epoch": 28.04597701149425,
2002
+ "grad_norm": 0.7292235493659973,
2003
+ "learning_rate": 3.6185610898254575e-06,
2004
+ "loss": 0.0268,
2005
+ "step": 2440
2006
+ },
2007
+ {
2008
+ "epoch": 28.160919540229884,
2009
+ "grad_norm": 0.65985107421875,
2010
+ "learning_rate": 3.4057045551298425e-06,
2011
+ "loss": 0.0235,
2012
+ "step": 2450
2013
+ },
2014
+ {
2015
+ "epoch": 28.275862068965516,
2016
+ "grad_norm": 0.36814385652542114,
2017
+ "learning_rate": 3.1928480204342275e-06,
2018
+ "loss": 0.0234,
2019
+ "step": 2460
2020
+ },
2021
+ {
2022
+ "epoch": 28.39080459770115,
2023
+ "grad_norm": 0.2845553159713745,
+ "learning_rate": 2.979991485738612e-06,
+ "loss": 0.0249,
+ "step": 2470
+ },
+ {
+ "epoch": 28.50574712643678,
+ "grad_norm": 0.36483725905418396,
+ "learning_rate": 2.767134951042997e-06,
+ "loss": 0.0267,
+ "step": 2480
+ },
+ {
+ "epoch": 28.620689655172413,
+ "grad_norm": 0.5152461528778076,
+ "learning_rate": 2.554278416347382e-06,
+ "loss": 0.0325,
+ "step": 2490
+ },
+ {
+ "epoch": 28.735632183908045,
+ "grad_norm": 0.775412380695343,
+ "learning_rate": 2.3414218816517667e-06,
+ "loss": 0.0294,
+ "step": 2500
+ },
+ {
+ "epoch": 28.735632183908045,
+ "eval_accuracy": 0.9428985507246377,
+ "eval_f1": 0.8613652357494722,
+ "eval_loss": 0.1659679114818573,
+ "eval_precision": 0.8755364806866953,
+ "eval_recall": 0.8476454293628809,
+ "eval_runtime": 2.5486,
+ "eval_samples_per_second": 135.371,
+ "eval_steps_per_second": 17.265,
+ "step": 2500
+ },
+ {
+ "epoch": 28.850574712643677,
+ "grad_norm": 0.25289997458457947,
+ "learning_rate": 2.1285653469561517e-06,
+ "loss": 0.0285,
+ "step": 2510
+ },
+ {
+ "epoch": 28.96551724137931,
+ "grad_norm": 0.11768297851085663,
+ "learning_rate": 1.9157088122605362e-06,
+ "loss": 0.0233,
+ "step": 2520
+ },
+ {
+ "epoch": 29.080459770114942,
+ "grad_norm": 0.4232161045074463,
+ "learning_rate": 1.7028522775649212e-06,
+ "loss": 0.021,
+ "step": 2530
+ },
+ {
+ "epoch": 29.195402298850574,
+ "grad_norm": 0.6776716709136963,
+ "learning_rate": 1.489995742869306e-06,
+ "loss": 0.0282,
+ "step": 2540
+ },
+ {
+ "epoch": 29.310344827586206,
+ "grad_norm": 0.2837054431438446,
+ "learning_rate": 1.277139208173691e-06,
+ "loss": 0.034,
+ "step": 2550
+ },
+ {
+ "epoch": 29.42528735632184,
+ "grad_norm": 0.13374456763267517,
+ "learning_rate": 1.0642826734780758e-06,
+ "loss": 0.0261,
+ "step": 2560
+ },
+ {
+ "epoch": 29.54022988505747,
+ "grad_norm": 0.32526296377182007,
+ "learning_rate": 8.514261387824606e-07,
+ "loss": 0.0318,
+ "step": 2570
+ },
+ {
+ "epoch": 29.655172413793103,
+ "grad_norm": 0.7919621467590332,
+ "learning_rate": 6.385696040868455e-07,
+ "loss": 0.0305,
+ "step": 2580
+ },
+ {
+ "epoch": 29.770114942528735,
+ "grad_norm": 0.1575266569852829,
+ "learning_rate": 4.257130693912303e-07,
+ "loss": 0.0298,
+ "step": 2590
+ },
+ {
+ "epoch": 29.885057471264368,
+ "grad_norm": 0.35859954357147217,
+ "learning_rate": 2.1285653469561516e-07,
+ "loss": 0.0243,
+ "step": 2600
+ },
+ {
+ "epoch": 29.885057471264368,
+ "eval_accuracy": 0.9423188405797102,
+ "eval_f1": 0.8589652728561304,
+ "eval_loss": 0.16644859313964844,
+ "eval_precision": 0.8795355587808418,
+ "eval_recall": 0.8393351800554016,
+ "eval_runtime": 2.5418,
+ "eval_samples_per_second": 135.729,
+ "eval_steps_per_second": 17.31,
+ "step": 2600
+ },
+ {
+ "epoch": 30.0,
+ "grad_norm": 0.5356839299201965,
+ "learning_rate": 0.0,
+ "loss": 0.0375,
+ "step": 2610
+ },
+ {
+ "epoch": 30.0,
+ "step": 2610,
+ "total_flos": 3.2060734740537754e+18,
+ "train_loss": 0.09901393407606074,
+ "train_runtime": 730.7896,
+ "train_samples_per_second": 56.61,
+ "train_steps_per_second": 3.571
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 2610,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 30,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.2060734740537754e+18,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }