2023-10-12 14:42:58,769 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,772 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 14:42:58,772 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,772 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-12 14:42:58,772 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,772 Train: 20847 sentences
2023-10-12 14:42:58,773 (train_with_dev=False, train_with_test=False)
2023-10-12 14:42:58,773 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,773 Training Params:
2023-10-12 14:42:58,773  - learning_rate: "0.00016"
2023-10-12 14:42:58,773  - mini_batch_size: "8"
2023-10-12 14:42:58,773  - max_epochs: "10"
2023-10-12 14:42:58,773  - shuffle: "True"
2023-10-12 14:42:58,773 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,773 Plugins:
2023-10-12 14:42:58,773  - TensorboardLogger
2023-10-12 14:42:58,773  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 14:42:58,773 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,774 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 14:42:58,774  - metric: "('micro avg', 'f1-score')"
2023-10-12 14:42:58,774 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,774 Computation:
2023-10-12 14:42:58,774  - compute on device: cuda:0
2023-10-12 14:42:58,774  - embedding storage: none
2023-10-12 14:42:58,774 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,774 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-12 14:42:58,774 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,774 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:58,774 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 14:45:22,269 epoch 1 - iter 260/2606 - loss 2.78846218 - time (sec): 143.49 - samples/sec: 283.51 - lr: 0.000016 - momentum: 0.000000
2023-10-12 14:47:42,477 epoch 1 - iter 520/2606 - loss 2.51753083 - time (sec): 283.70 - samples/sec: 279.30 - lr: 0.000032 - momentum: 0.000000
2023-10-12 14:50:00,212 epoch 1 - iter 780/2606 - loss 2.13517667 - time (sec): 421.44 - samples/sec: 272.92 - lr: 0.000048 - momentum: 0.000000
2023-10-12 14:52:18,950 epoch 1 - iter 1040/2606 - loss 1.75769451 - time (sec): 560.17 - samples/sec: 271.20 - lr: 0.000064 - momentum: 0.000000
2023-10-12 14:54:38,379 epoch 1 - iter 1300/2606 - loss 1.50675156 - time (sec): 699.60 - samples/sec: 270.34 - lr: 0.000080 - momentum: 0.000000
2023-10-12 14:56:57,376 epoch 1 - iter 1560/2606 - loss 1.32804662 - time (sec): 838.60 - samples/sec: 271.05 - lr: 0.000096 - momentum: 0.000000
2023-10-12 14:59:15,361 epoch 1 - iter 1820/2606 - loss 1.19020418 - time (sec): 976.58 - samples/sec: 269.44 - lr: 0.000112 - momentum: 0.000000
2023-10-12 15:01:33,765 epoch 1 - iter 2080/2606 - loss 1.08183070 - time (sec): 1114.99 - samples/sec: 268.49 - lr: 0.000128 - momentum: 0.000000
2023-10-12 15:03:49,752 epoch 1 - iter 2340/2606 - loss 0.99896372 - time (sec): 1250.98 - samples/sec: 266.43 - lr: 0.000144 - momentum: 0.000000
2023-10-12 15:06:04,450 epoch 1 - iter 2600/2606 - loss 0.92938956 - time (sec): 1385.67 - samples/sec: 264.52 - lr: 0.000160 - momentum: 0.000000
2023-10-12 15:06:07,593 ----------------------------------------------------------------------------------------------------
2023-10-12 15:06:07,594 EPOCH 1 done: loss 0.9276 - lr: 0.000160
2023-10-12 15:06:45,002 DEV : loss 0.1160765215754509 - f1-score (micro avg) 0.2953
2023-10-12 15:06:45,055 saving best model
2023-10-12 15:06:45,960 ----------------------------------------------------------------------------------------------------
2023-10-12 15:09:02,141 epoch 2 - iter 260/2606 - loss 0.21949180 - time (sec): 136.18 - samples/sec: 262.54 - lr: 0.000158 - momentum: 0.000000
2023-10-12 15:11:19,756 epoch 2 - iter 520/2606 - loss 0.19335053 - time (sec): 273.79 - samples/sec: 262.46 - lr: 0.000156 - momentum: 0.000000
2023-10-12 15:13:40,437 epoch 2 - iter 780/2606 - loss 0.18037626 - time (sec): 414.47 - samples/sec: 257.19 - lr: 0.000155 - momentum: 0.000000
2023-10-12 15:15:59,935 epoch 2 - iter 1040/2606 - loss 0.17218573 - time (sec): 553.97 - samples/sec: 258.66 - lr: 0.000153 - momentum: 0.000000
2023-10-12 15:18:18,550 epoch 2 - iter 1300/2606 - loss 0.16693108 - time (sec): 692.59 - samples/sec: 259.06 - lr: 0.000151 - momentum: 0.000000
2023-10-12 15:20:41,440 epoch 2 - iter 1560/2606 - loss 0.15950813 - time (sec): 835.48 - samples/sec: 260.76 - lr: 0.000149 - momentum: 0.000000
2023-10-12 15:22:58,341 epoch 2 - iter 1820/2606 - loss 0.15628497 - time (sec): 972.38 - samples/sec: 260.31 - lr: 0.000148 - momentum: 0.000000
2023-10-12 15:25:21,623 epoch 2 - iter 2080/2606 - loss 0.15437870 - time (sec): 1115.66 - samples/sec: 261.15 - lr: 0.000146 - momentum: 0.000000
2023-10-12 15:27:43,455 epoch 2 - iter 2340/2606 - loss 0.15140915 - time (sec): 1257.49 - samples/sec: 261.85 - lr: 0.000144 - momentum: 0.000000
2023-10-12 15:30:05,871 epoch 2 - iter 2600/2606 - loss 0.14820465 - time (sec): 1399.91 - samples/sec: 261.86 - lr: 0.000142 - momentum: 0.000000
2023-10-12 15:30:09,099 ----------------------------------------------------------------------------------------------------
2023-10-12 15:30:09,099 EPOCH 2 done: loss 0.1481 - lr: 0.000142
2023-10-12 15:30:50,121 DEV : loss 0.1322641223669052 - f1-score (micro avg) 0.3856
2023-10-12 15:30:50,177 saving best model
2023-10-12 15:30:53,543 ----------------------------------------------------------------------------------------------------
2023-10-12 15:33:11,249 epoch 3 - iter 260/2606 - loss 0.09563071 - time (sec): 137.70 - samples/sec: 257.87 - lr: 0.000140 - momentum: 0.000000
2023-10-12 15:35:29,555 epoch 3 - iter 520/2606 - loss 0.09938671 - time (sec): 276.01 - samples/sec: 249.52 - lr: 0.000139 - momentum: 0.000000
2023-10-12 15:37:48,598 epoch 3 - iter 780/2606 - loss 0.09544107 - time (sec): 415.05 - samples/sec: 257.01 - lr: 0.000137 - momentum: 0.000000
2023-10-12 15:40:07,369 epoch 3 - iter 1040/2606 - loss 0.09255662 - time (sec): 553.82 - samples/sec: 257.47 - lr: 0.000135 - momentum: 0.000000
2023-10-12 15:42:26,484 epoch 3 - iter 1300/2606 - loss 0.09067725 - time (sec): 692.94 - samples/sec: 258.50 - lr: 0.000133 - momentum: 0.000000
2023-10-12 15:44:47,036 epoch 3 - iter 1560/2606 - loss 0.09053795 - time (sec): 833.49 - samples/sec: 259.53 - lr: 0.000132 - momentum: 0.000000
2023-10-12 15:47:05,955 epoch 3 - iter 1820/2606 - loss 0.09161612 - time (sec): 972.41 - samples/sec: 258.63 - lr: 0.000130 - momentum: 0.000000
2023-10-12 15:49:27,590 epoch 3 - iter 2080/2606 - loss 0.09151431 - time (sec): 1114.04 - samples/sec: 259.61 - lr: 0.000128 - momentum: 0.000000
2023-10-12 15:51:50,908 epoch 3 - iter 2340/2606 - loss 0.09058061 - time (sec): 1257.36 - samples/sec: 261.72 - lr: 0.000126 - momentum: 0.000000
2023-10-12 15:54:13,018 epoch 3 - iter 2600/2606 - loss 0.08975215 - time (sec): 1399.47 - samples/sec: 261.96 - lr: 0.000125 - momentum: 0.000000
2023-10-12 15:54:16,291 ----------------------------------------------------------------------------------------------------
2023-10-12 15:54:16,292 EPOCH 3 done: loss 0.0903 - lr: 0.000125
2023-10-12 15:55:00,098 DEV : loss 0.18417686223983765 - f1-score (micro avg) 0.3766
2023-10-12 15:55:00,158 ----------------------------------------------------------------------------------------------------
2023-10-12 15:57:17,883 epoch 4 - iter 260/2606 - loss 0.07143967 - time (sec): 137.72 - samples/sec: 259.14 - lr: 0.000123 - momentum: 0.000000
2023-10-12 15:59:39,503 epoch 4 - iter 520/2606 - loss 0.06267308 - time (sec): 279.34 - samples/sec: 261.39 - lr: 0.000121 - momentum: 0.000000
2023-10-12 16:01:59,376 epoch 4 - iter 780/2606 - loss 0.06153226 - time (sec): 419.22 - samples/sec: 260.72 - lr: 0.000119 - momentum: 0.000000
2023-10-12 16:04:19,565 epoch 4 - iter 1040/2606 - loss 0.06253834 - time (sec): 559.41 - samples/sec: 257.52 - lr: 0.000117 - momentum: 0.000000
2023-10-12 16:06:41,078 epoch 4 - iter 1300/2606 - loss 0.06366907 - time (sec): 700.92 - samples/sec: 260.23 - lr: 0.000116 - momentum: 0.000000
2023-10-12 16:09:04,620 epoch 4 - iter 1560/2606 - loss 0.06340498 - time (sec): 844.46 - samples/sec: 261.62 - lr: 0.000114 - momentum: 0.000000
2023-10-12 16:11:25,011 epoch 4 - iter 1820/2606 - loss 0.06174076 - time (sec): 984.85 - samples/sec: 261.65 - lr: 0.000112 - momentum: 0.000000
2023-10-12 16:13:46,732 epoch 4 - iter 2080/2606 - loss 0.06170781 - time (sec): 1126.57 - samples/sec: 261.08 - lr: 0.000110 - momentum: 0.000000
2023-10-12 16:16:10,287 epoch 4 - iter 2340/2606 - loss 0.06134273 - time (sec): 1270.13 - samples/sec: 259.90 - lr: 0.000109 - momentum: 0.000000
2023-10-12 16:18:36,531 epoch 4 - iter 2600/2606 - loss 0.06263550 - time (sec): 1416.37 -
samples/sec: 259.02 - lr: 0.000107 - momentum: 0.000000
2023-10-12 16:18:39,580 ----------------------------------------------------------------------------------------------------
2023-10-12 16:18:39,580 EPOCH 4 done: loss 0.0627 - lr: 0.000107
2023-10-12 16:19:22,241 DEV : loss 0.21061837673187256 - f1-score (micro avg) 0.4054
2023-10-12 16:19:22,296 saving best model
2023-10-12 16:19:24,907 ----------------------------------------------------------------------------------------------------
2023-10-12 16:21:46,999 epoch 5 - iter 260/2606 - loss 0.03945537 - time (sec): 142.09 - samples/sec: 253.56 - lr: 0.000105 - momentum: 0.000000
2023-10-12 16:24:03,360 epoch 5 - iter 520/2606 - loss 0.04636967 - time (sec): 278.45 - samples/sec: 251.85 - lr: 0.000103 - momentum: 0.000000
2023-10-12 16:26:21,805 epoch 5 - iter 780/2606 - loss 0.04592605 - time (sec): 416.89 - samples/sec: 257.96 - lr: 0.000101 - momentum: 0.000000
2023-10-12 16:28:42,348 epoch 5 - iter 1040/2606 - loss 0.04810081 - time (sec): 557.44 - samples/sec: 257.72 - lr: 0.000100 - momentum: 0.000000
2023-10-12 16:31:03,022 epoch 5 - iter 1300/2606 - loss 0.04653376 - time (sec): 698.11 - samples/sec: 259.15 - lr: 0.000098 - momentum: 0.000000
2023-10-12 16:33:27,839 epoch 5 - iter 1560/2606 - loss 0.04566380 - time (sec): 842.93 - samples/sec: 261.65 - lr: 0.000096 - momentum: 0.000000
2023-10-12 16:35:48,815 epoch 5 - iter 1820/2606 - loss 0.04603096 - time (sec): 983.90 - samples/sec: 263.48 - lr: 0.000094 - momentum: 0.000000
2023-10-12 16:38:06,553 epoch 5 - iter 2080/2606 - loss 0.04660232 - time (sec): 1121.64 - samples/sec: 260.70 - lr: 0.000093 - momentum: 0.000000
2023-10-12 16:40:26,141 epoch 5 - iter 2340/2606 - loss 0.04629296 - time (sec): 1261.23 - samples/sec: 261.67 - lr: 0.000091 - momentum: 0.000000
2023-10-12 16:42:45,513 epoch 5 - iter 2600/2606 - loss 0.04642196 - time (sec): 1400.60 - samples/sec: 261.78 - lr: 0.000089 - momentum: 0.000000
2023-10-12 16:42:48,596 ----------------------------------------------------------------------------------------------------
2023-10-12 16:42:48,597 EPOCH 5 done: loss 0.0464 - lr: 0.000089
2023-10-12 16:43:30,191 DEV : loss 0.2700042128562927 - f1-score (micro avg) 0.3892
2023-10-12 16:43:30,253 ----------------------------------------------------------------------------------------------------
2023-10-12 16:45:46,941 epoch 6 - iter 260/2606 - loss 0.02917252 - time (sec): 136.68 - samples/sec: 258.14 - lr: 0.000087 - momentum: 0.000000
2023-10-12 16:48:07,688 epoch 6 - iter 520/2606 - loss 0.02982492 - time (sec): 277.43 - samples/sec: 268.19 - lr: 0.000085 - momentum: 0.000000
2023-10-12 16:50:30,237 epoch 6 - iter 780/2606 - loss 0.02907750 - time (sec): 419.98 - samples/sec: 266.04 - lr: 0.000084 - momentum: 0.000000
2023-10-12 16:52:48,499 epoch 6 - iter 1040/2606 - loss 0.03047511 - time (sec): 558.24 - samples/sec: 263.35 - lr: 0.000082 - momentum: 0.000000
2023-10-12 16:55:07,109 epoch 6 - iter 1300/2606 - loss 0.03108829 - time (sec): 696.85 - samples/sec: 260.50 - lr: 0.000080 - momentum: 0.000000
2023-10-12 16:57:30,710 epoch 6 - iter 1560/2606 - loss 0.03166117 - time (sec): 840.45 - samples/sec: 262.15 - lr: 0.000078 - momentum: 0.000000
2023-10-12 16:59:49,912 epoch 6 - iter 1820/2606 - loss 0.03216064 - time (sec): 979.66 - samples/sec: 263.90 - lr: 0.000077 - momentum: 0.000000
2023-10-12 17:02:06,641 epoch 6 - iter 2080/2606 - loss 0.03189570 - time (sec): 1116.39 - samples/sec: 261.84 - lr: 0.000075 - momentum: 0.000000
2023-10-12 17:04:26,581 epoch 6 - iter 2340/2606 - loss 0.03148769 - time (sec): 1256.33 - samples/sec: 260.99 - lr: 0.000073 - momentum: 0.000000
2023-10-12 17:06:48,181 epoch 6 - iter 2600/2606 - loss 0.03084235 - time (sec): 1397.92 - samples/sec: 261.98 - lr: 0.000071 - momentum: 0.000000
2023-10-12 17:06:51,677 ----------------------------------------------------------------------------------------------------
2023-10-12 17:06:51,677 EPOCH 6 done: loss 0.0308 - lr: 0.000071
2023-10-12 17:07:34,451 DEV : loss 0.331845760345459 - f1-score (micro avg) 0.3893
2023-10-12 17:07:34,521 ----------------------------------------------------------------------------------------------------
2023-10-12 17:09:55,579 epoch 7 - iter 260/2606 - loss 0.02009250 - time (sec): 141.06 - samples/sec: 261.36 - lr: 0.000069 - momentum: 0.000000
2023-10-12 17:12:14,769 epoch 7 - iter 520/2606 - loss 0.02163387 - time (sec): 280.25 - samples/sec: 259.64 - lr: 0.000068 - momentum: 0.000000
2023-10-12 17:14:36,226 epoch 7 - iter 780/2606 - loss 0.02156226 - time (sec): 421.70 - samples/sec: 259.65 - lr: 0.000066 - momentum: 0.000000
2023-10-12 17:16:57,696 epoch 7 - iter 1040/2606 - loss 0.02085508 - time (sec): 563.17 - samples/sec: 260.09 - lr: 0.000064 - momentum: 0.000000
2023-10-12 17:19:18,764 epoch 7 - iter 1300/2606 - loss 0.02243678 - time (sec): 704.24 - samples/sec: 260.97 - lr: 0.000062 - momentum: 0.000000
2023-10-12 17:21:46,181 epoch 7 - iter 1560/2606 - loss 0.02248712 - time (sec): 851.66 - samples/sec: 264.81 - lr: 0.000061 - momentum: 0.000000
2023-10-12 17:24:06,465 epoch 7 - iter 1820/2606 - loss 0.02285975 - time (sec): 991.94 - samples/sec: 263.65 - lr: 0.000059 - momentum: 0.000000
2023-10-12 17:26:25,095 epoch 7 - iter 2080/2606 - loss 0.02331914 - time (sec): 1130.57 - samples/sec: 260.89 - lr: 0.000057 - momentum: 0.000000
2023-10-12 17:28:42,973 epoch 7 - iter 2340/2606 - loss 0.02372357 - time (sec): 1268.45 - samples/sec: 260.01 - lr: 0.000055 - momentum: 0.000000
2023-10-12 17:31:01,636 epoch 7 - iter 2600/2606 - loss 0.02299100 - time (sec): 1407.11 - samples/sec: 260.54 - lr: 0.000053 - momentum: 0.000000
2023-10-12 17:31:04,917 ----------------------------------------------------------------------------------------------------
2023-10-12 17:31:04,917 EPOCH 7 done: loss 0.0230 - lr: 0.000053
2023-10-12 17:31:48,254 DEV : loss 0.4155420660972595 - f1-score (micro avg) 0.3962
2023-10-12 17:31:48,314 ----------------------------------------------------------------------------------------------------
2023-10-12 17:34:09,309 epoch 8 - iter 260/2606 - loss 0.01572445 - time (sec): 140.99 - samples/sec: 266.46 - lr: 0.000052 - momentum: 0.000000
2023-10-12 17:36:26,868 epoch 8 - iter 520/2606 - loss 0.01655993 - time (sec): 278.55 - samples/sec: 262.58 - lr: 0.000050 - momentum: 0.000000
2023-10-12 17:38:49,623 epoch 8 - iter 780/2606 - loss 0.01595599 - time (sec): 421.31 - samples/sec: 262.08 - lr: 0.000048 - momentum: 0.000000
2023-10-12 17:41:08,530 epoch 8 - iter 1040/2606 - loss 0.01570820 - time (sec): 560.21 - samples/sec: 264.31 - lr: 0.000046 - momentum: 0.000000
2023-10-12 17:43:23,750 epoch 8 - iter 1300/2606 - loss 0.01696855 - time (sec): 695.43 - samples/sec: 261.84 - lr: 0.000045 - momentum: 0.000000
2023-10-12 17:45:44,557 epoch 8 - iter 1560/2606 - loss 0.01640359 - time (sec): 836.24 - samples/sec: 261.91 - lr: 0.000043 - momentum: 0.000000
2023-10-12 17:48:03,450 epoch 8 - iter 1820/2606 - loss 0.01664832 - time (sec): 975.13 - samples/sec: 259.70 - lr: 0.000041 - momentum: 0.000000
2023-10-12 17:50:21,879 epoch 8 - iter 2080/2606 - loss 0.01631891 - time (sec): 1113.56 - samples/sec: 259.71 - lr: 0.000039 - momentum: 0.000000
2023-10-12 17:52:44,465 epoch 8 - iter 2340/2606 - loss 0.01607584 - time (sec): 1256.15 - samples/sec: 262.11 - lr: 0.000037 - momentum: 0.000000
2023-10-12 17:55:03,738 epoch 8 - iter 2600/2606 - loss 0.01565804 - time (sec): 1395.42 - samples/sec: 262.49 - lr: 0.000036 - momentum: 0.000000
2023-10-12 17:55:07,315 ----------------------------------------------------------------------------------------------------
2023-10-12 17:55:07,315 EPOCH 8 done: loss 0.0157 - lr: 0.000036
2023-10-12 17:55:49,401 DEV : loss 0.4473365843296051 - f1-score (micro avg) 0.3998
2023-10-12 17:55:49,470 ----------------------------------------------------------------------------------------------------
2023-10-12 17:58:09,207 epoch 9 - iter 260/2606 - loss 0.01004739 - time (sec): 139.73 - samples/sec: 267.28 - lr: 0.000034 - momentum: 0.000000
2023-10-12 18:00:24,373 epoch 9 - iter 520/2606 - loss 0.01400692 - time (sec): 274.90 - samples/sec: 255.29 - lr: 0.000032 - momentum: 0.000000
2023-10-12 18:02:43,772 epoch 9 - iter 780/2606 - loss 0.01354522 - time (sec): 414.30 - samples/sec: 259.03 - lr: 0.000030 - momentum: 0.000000
2023-10-12 18:05:03,015 epoch 9 - iter 1040/2606 - loss 0.01362233 - time (sec): 553.54 - samples/sec: 260.95 - lr: 0.000029 - momentum: 0.000000
2023-10-12 18:07:24,332 epoch 9 - iter 1300/2606 - loss 0.01378154 - time (sec): 694.86 - samples/sec: 263.54 - lr: 0.000027 - momentum: 0.000000
2023-10-12 18:09:42,994 epoch 9 - iter 1560/2606 - loss 0.01339316 - time (sec): 833.52 - samples/sec: 262.40 - lr: 0.000025 - momentum: 0.000000
2023-10-12 18:12:04,241 epoch 9 - iter 1820/2606 - loss 0.01270746 - time (sec): 974.77 - samples/sec: 263.55 - lr: 0.000023 - momentum: 0.000000
2023-10-12 18:14:22,581 epoch 9 - iter 2080/2606 - loss 0.01221012 - time (sec): 1113.11 - samples/sec: 263.80 - lr: 0.000021 - momentum: 0.000000
2023-10-12 18:16:44,673 epoch 9 - iter 2340/2606 - loss 0.01191616 - time (sec): 1255.20 - samples/sec: 262.96 - lr: 0.000020 - momentum: 0.000000
2023-10-12 18:19:06,987 epoch 9 - iter 2600/2606 - loss 0.01189402 - time (sec): 1397.51 - samples/sec: 262.47 - lr: 0.000018 - momentum: 0.000000
2023-10-12 18:19:09,880 ----------------------------------------------------------------------------------------------------
2023-10-12 18:19:09,880 EPOCH 9 done: loss 0.0119 - lr: 0.000018
2023-10-12 18:19:51,874 DEV : loss 0.46722307801246643 - f1-score (micro avg) 0.3953
2023-10-12 18:19:51,939 ----------------------------------------------------------------------------------------------------
2023-10-12 18:22:15,633 epoch 10 - iter 260/2606 - loss 0.00881587 - time (sec): 143.69 - samples/sec: 254.48 - lr: 0.000016 - momentum: 0.000000
2023-10-12 18:24:40,318 epoch 10 - iter 520/2606 - loss 0.00878672 - time (sec): 288.38 - samples/sec: 250.25 - lr: 0.000014 - momentum: 0.000000
2023-10-12 18:27:05,316 epoch 10 - iter 780/2606 - loss 0.00827028 - time (sec): 433.37 - samples/sec: 253.33 - lr: 0.000013 - momentum: 0.000000
2023-10-12 18:29:24,430 epoch 10 - iter 1040/2606 - loss 0.00799660 - time (sec): 572.49 - samples/sec: 253.17 - lr: 0.000011 - momentum: 0.000000
2023-10-12 18:31:48,541 epoch 10 - iter 1300/2606 - loss 0.00736416 - time (sec): 716.60 - samples/sec: 253.76 - lr: 0.000009 - momentum: 0.000000
2023-10-12 18:34:02,226 epoch 10 - iter 1560/2606 - loss 0.00761932 - time (sec): 850.28 - samples/sec: 257.18 - lr: 0.000007 - momentum: 0.000000
2023-10-12 18:36:19,240 epoch 10 - iter 1820/2606 - loss 0.00775823 - time (sec): 987.30 - samples/sec: 256.49 - lr: 0.000005 - momentum: 0.000000
2023-10-12 18:38:40,352 epoch 10 - iter 2080/2606 - loss 0.00824190 - time (sec): 1128.41 - samples/sec: 257.76 - lr: 0.000004 - momentum: 0.000000
2023-10-12 18:40:58,874 epoch 10 - iter 2340/2606 - loss 0.00847650 - time (sec): 1266.93 - samples/sec: 258.68 - lr: 0.000002 - momentum: 0.000000
2023-10-12 18:43:19,012 epoch 10 - iter 2600/2606 - loss 0.00832149 - time (sec): 1407.07 - samples/sec: 260.50 - lr: 0.000000 - momentum: 0.000000
2023-10-12 18:43:22,196 ----------------------------------------------------------------------------------------------------
2023-10-12 18:43:22,196 EPOCH 10 done: loss 0.0083 - lr: 0.000000
2023-10-12 18:44:04,295 DEV : loss 0.44663310050964355 - f1-score (micro avg) 0.4037
2023-10-12 18:44:05,353 ----------------------------------------------------------------------------------------------------
2023-10-12 18:44:05,355 Loading model from best epoch ...
2023-10-12 18:44:09,451 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-12 18:45:50,694 Results:
- F-score (micro) 0.4458
- F-score (macro) 0.3081
- Accuracy 0.2912

By class:
              precision    recall  f1-score   support

         LOC     0.5050    0.5008    0.5029      1214
         PER     0.4167    0.4455    0.4306       808
         ORG     0.2948    0.3031    0.2989       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4418    0.4498    0.4458      2390
   macro avg     0.3041    0.3124    0.3081      2390
weighted avg     0.4409    0.4498    0.4452      2390

2023-10-12 18:45:50,695 ----------------------------------------------------------------------------------------------------
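Note on the lr column above: the logged values are consistent with a linear warmup/decay schedule over the 26,060 total steps of this run (2,606 batches x 10 epochs), warming up over the first 10% of steps (warmup_fraction '0.1') to the peak learning_rate 0.00016 and then decaying linearly to zero. A minimal sketch that reproduces the logged values; the function name and the exact warmup arithmetic are illustrative assumptions, not Flair's LinearScheduler implementation:

```python
def linear_warmup_lr(step, total_steps=26060, peak_lr=0.00016, warmup_fraction=0.1):
    """Illustrative sketch: linear warmup to peak_lr over the first
    warmup_fraction of steps, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_fraction)  # 2606 = one epoch here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Cross-check against the log (values rounded to 6 decimals as printed there):
print(round(linear_warmup_lr(260), 6))    # epoch 1, iter 260
print(round(linear_warmup_lr(2600), 6))   # epoch 1, iter 2600 (near peak)
print(round(linear_warmup_lr(2866), 6))   # epoch 2, iter 260
```

Under these assumptions the sketch yields 0.000016, 0.000160, and 0.000158 at those steps, matching the corresponding lr values in the log.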
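The aggregate rows of the final table follow from the per-class rows in the standard way: micro avg F1 is the harmonic mean of micro precision and recall, macro avg F1 is the unweighted mean of per-class F1, and weighted avg F1 weights each class F1 by its support. A quick sanity check using the numbers from the table above (all values copied from the log):

```python
# Per-class (f1, support) from the final evaluation table
per_class = {
    "LOC":       (0.5029, 1214),
    "PER":       (0.4306, 808),
    "ORG":       (0.2989, 353),
    "HumanProd": (0.0000, 15),
}

# micro avg F1: harmonic mean of micro precision and recall
p, r = 0.4418, 0.4498
micro_f1 = 2 * p * r / (p + r)

# macro avg F1: unweighted mean of per-class F1
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# weighted avg F1: per-class F1 weighted by support
total_support = sum(n for _, n in per_class.values())
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(round(micro_f1, 4), round(macro_f1, 4), round(weighted_f1, 4))
```

Rounded to four decimals these recover 0.4458, 0.3081, and 0.4452, the micro, macro, and weighted averages reported in the table; the 0.3081 macro score also shows how a zero-support-weight class like HumanProd (15 spans, F1 0.0) drags the macro average well below the micro average.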