2023-10-12 11:36:25,567 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,570 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 11:36:25,570 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,570 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 11:36:25,570 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,570 Train: 5777 sentences
2023-10-12 11:36:25,570 (train_with_dev=False, train_with_test=False)
2023-10-12 11:36:25,570 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,570 Training Params:
2023-10-12 11:36:25,570 - learning_rate: "0.00016"
2023-10-12 11:36:25,570 - mini_batch_size: "4"
2023-10-12 11:36:25,570 - max_epochs: "10"
2023-10-12 11:36:25,571 - shuffle: "True"
2023-10-12 11:36:25,571 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,571 Plugins:
2023-10-12 11:36:25,571 - TensorboardLogger
2023-10-12 11:36:25,571 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 11:36:25,571 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,571 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 11:36:25,571 - metric: "('micro avg', 'f1-score')"
2023-10-12 11:36:25,571 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,571 Computation:
2023-10-12 11:36:25,571 - compute on device: cuda:0
2023-10-12 11:36:25,571 - embedding storage: none
2023-10-12 11:36:25,571 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,571 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-12 11:36:25,572 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,572 ----------------------------------------------------------------------------------------------------
2023-10-12 11:36:25,572 Logging anything other than scalars to TensorBoard is currently not supported.
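The header above fully specifies the run: a SequenceTagger over byte-level ByT5 embeddings, fine-tuned on the Dutch NER_ICDAR_EUROPEANA corpus with learning rate 0.00016, mini-batch size 4, 10 epochs, a linear schedule with 10% warmup, and no CRF. A minimal sketch of such a run with Flair's public API follows; it is not the original hmbench training script. The checkpoint name `hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax` is inferred from the base path, the custom `ByT5Embeddings` wrapper from the model dump is approximated here with `TransformerWordEmbeddings`, and the constructor arguments are assumptions rather than a copy of the original code.

```python
# Minimal sketch (not the original hmbench script) of a Flair fine-tuning run
# matching the parameters logged above: lr 0.00016, batch size 4, 10 epochs,
# linear warmup/decay schedule, no CRF, model selection on micro F1 (dev).
from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Dutch split of the ICDAR Europeana NER corpus, as reported in the log.
corpus = NER_ICDAR_EUROPEANA(language="nl")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Stand-in for the ByT5Embeddings wrapper shown in the model dump; the
# checkpoint id is inferred from the training base path and may differ.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,  # unused with use_rnn=False; kept for the signature
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

base_path = (
    "hmbench-icdar/nl-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
)

trainer = ModelTrainer(tagger, corpus)
# fine_tune() uses AdamW with a linear warmup/decay schedule by default,
# which is what the LinearScheduler plugin (warmup_fraction 0.1) reports.
trainer.fine_tune(
    base_path,
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
)
```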
2023-10-12 11:37:06,457 epoch 1 - iter 144/1445 - loss 2.53387856 - time (sec): 40.88 - samples/sec: 422.08 - lr: 0.000016 - momentum: 0.000000
2023-10-12 11:37:47,224 epoch 1 - iter 288/1445 - loss 2.39666268 - time (sec): 81.65 - samples/sec: 421.98 - lr: 0.000032 - momentum: 0.000000
2023-10-12 11:38:28,343 epoch 1 - iter 432/1445 - loss 2.12510402 - time (sec): 122.77 - samples/sec: 426.80 - lr: 0.000048 - momentum: 0.000000
2023-10-12 11:39:09,216 epoch 1 - iter 576/1445 - loss 1.84102776 - time (sec): 163.64 - samples/sec: 423.25 - lr: 0.000064 - momentum: 0.000000
2023-10-12 11:39:50,408 epoch 1 - iter 720/1445 - loss 1.55600844 - time (sec): 204.83 - samples/sec: 426.16 - lr: 0.000080 - momentum: 0.000000
2023-10-12 11:40:30,845 epoch 1 - iter 864/1445 - loss 1.34872713 - time (sec): 245.27 - samples/sec: 425.51 - lr: 0.000096 - momentum: 0.000000
2023-10-12 11:41:12,774 epoch 1 - iter 1008/1445 - loss 1.17239883 - time (sec): 287.20 - samples/sec: 430.53 - lr: 0.000112 - momentum: 0.000000
2023-10-12 11:41:52,301 epoch 1 - iter 1152/1445 - loss 1.05717442 - time (sec): 326.73 - samples/sec: 427.83 - lr: 0.000127 - momentum: 0.000000
2023-10-12 11:42:33,446 epoch 1 - iter 1296/1445 - loss 0.95578662 - time (sec): 367.87 - samples/sec: 427.97 - lr: 0.000143 - momentum: 0.000000
2023-10-12 11:43:15,363 epoch 1 - iter 1440/1445 - loss 0.87304989 - time (sec): 409.79 - samples/sec: 428.41 - lr: 0.000159 - momentum: 0.000000
2023-10-12 11:43:16,688 ----------------------------------------------------------------------------------------------------
2023-10-12 11:43:16,689 EPOCH 1 done: loss 0.8703 - lr: 0.000159
2023-10-12 11:43:37,138 DEV : loss 0.1632358282804489 - f1-score (micro avg) 0.3709
2023-10-12 11:43:37,169 saving best model
2023-10-12 11:43:38,041 ----------------------------------------------------------------------------------------------------
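The lr column makes the LinearScheduler visible: with warmup_fraction 0.1 over 1445 batches per epoch and 10 epochs, the first 1445 of 14450 steps (exactly epoch 1) ramp the learning rate up towards the 0.00016 peak (0.000159 at iter 1440), after which it decays linearly to zero by the end of epoch 10. A small sketch of that piecewise-linear rule, assuming the plugin follows the usual linear warmup/decay formula; the helper `lr_at` is illustrative, not Flair code.

```python
# Piecewise-linear warmup/decay, assumed to match the LinearScheduler plugin
# (warmup_fraction 0.1, peak lr 0.00016, 1445 batches/epoch * 10 epochs).
PEAK_LR = 0.00016
TOTAL_STEPS = 1445 * 10
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # = 1445, i.e. exactly epoch 1

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

print(f"{lr_at(144):.6f}")       # ~0.000016, as logged at epoch 1, iter 144
print(f"{lr_at(1440):.6f}")      # ~0.000159, as logged at epoch 1, iter 1440
print(f"{lr_at(2 * 1445):.6f}")  # ~0.000142, as logged at the end of epoch 2
```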
2023-10-12 11:44:18,915 epoch 2 - iter 144/1445 - loss 0.13142514 - time (sec): 40.87 - samples/sec: 433.96 - lr: 0.000158 - momentum: 0.000000
2023-10-12 11:45:00,727 epoch 2 - iter 288/1445 - loss 0.12657340 - time (sec): 82.68 - samples/sec: 433.65 - lr: 0.000156 - momentum: 0.000000
2023-10-12 11:45:42,255 epoch 2 - iter 432/1445 - loss 0.11975746 - time (sec): 124.21 - samples/sec: 433.07 - lr: 0.000155 - momentum: 0.000000
2023-10-12 11:46:23,911 epoch 2 - iter 576/1445 - loss 0.11421246 - time (sec): 165.87 - samples/sec: 440.54 - lr: 0.000153 - momentum: 0.000000
2023-10-12 11:47:03,632 epoch 2 - iter 720/1445 - loss 0.10995587 - time (sec): 205.59 - samples/sec: 435.13 - lr: 0.000151 - momentum: 0.000000
2023-10-12 11:47:44,068 epoch 2 - iter 864/1445 - loss 0.11054322 - time (sec): 246.03 - samples/sec: 431.13 - lr: 0.000149 - momentum: 0.000000
2023-10-12 11:48:24,346 epoch 2 - iter 1008/1445 - loss 0.11015148 - time (sec): 286.30 - samples/sec: 431.29 - lr: 0.000148 - momentum: 0.000000
2023-10-12 11:49:04,206 epoch 2 - iter 1152/1445 - loss 0.10764299 - time (sec): 326.16 - samples/sec: 430.07 - lr: 0.000146 - momentum: 0.000000
2023-10-12 11:49:45,427 epoch 2 - iter 1296/1445 - loss 0.10737065 - time (sec): 367.38 - samples/sec: 430.47 - lr: 0.000144 - momentum: 0.000000
2023-10-12 11:50:25,973 epoch 2 - iter 1440/1445 - loss 0.10547486 - time (sec): 407.93 - samples/sec: 430.43 - lr: 0.000142 - momentum: 0.000000
2023-10-12 11:50:27,285 ----------------------------------------------------------------------------------------------------
2023-10-12 11:50:27,286 EPOCH 2 done: loss 0.1054 - lr: 0.000142
2023-10-12 11:50:48,663 DEV : loss 0.09625712037086487 - f1-score (micro avg) 0.7642
2023-10-12 11:50:48,693 saving best model
2023-10-12 11:50:51,270 ----------------------------------------------------------------------------------------------------
2023-10-12 11:51:34,450 epoch 3 - iter 144/1445 - loss 0.07645202 - time (sec): 43.18 - samples/sec: 405.87 - lr: 0.000140 - momentum: 0.000000
2023-10-12 11:52:17,408 epoch 3 - iter 288/1445 - loss 0.07122375 - time (sec): 86.14 - samples/sec: 399.30 - lr: 0.000139 - momentum: 0.000000
2023-10-12 11:52:59,730 epoch 3 - iter 432/1445 - loss 0.06285638 - time (sec): 128.46 - samples/sec: 406.02 - lr: 0.000137 - momentum: 0.000000
2023-10-12 11:53:40,841 epoch 3 - iter 576/1445 - loss 0.06398642 - time (sec): 169.57 - samples/sec: 401.73 - lr: 0.000135 - momentum: 0.000000
2023-10-12 11:54:25,655 epoch 3 - iter 720/1445 - loss 0.06301735 - time (sec): 214.38 - samples/sec: 403.99 - lr: 0.000133 - momentum: 0.000000
2023-10-12 11:55:08,419 epoch 3 - iter 864/1445 - loss 0.06349136 - time (sec): 257.15 - samples/sec: 404.64 - lr: 0.000132 - momentum: 0.000000
2023-10-12 11:55:50,713 epoch 3 - iter 1008/1445 - loss 0.06544889 - time (sec): 299.44 - samples/sec: 408.67 - lr: 0.000130 - momentum: 0.000000
2023-10-12 11:56:32,021 epoch 3 - iter 1152/1445 - loss 0.06426159 - time (sec): 340.75 - samples/sec: 408.76 - lr: 0.000128 - momentum: 0.000000
2023-10-12 11:57:13,869 epoch 3 - iter 1296/1445 - loss 0.06289367 - time (sec): 382.60 - samples/sec: 410.41 - lr: 0.000126 - momentum: 0.000000
2023-10-12 11:57:56,511 epoch 3 - iter 1440/1445 - loss 0.06210849 - time (sec): 425.24 - samples/sec: 413.23 - lr: 0.000125 - momentum: 0.000000
2023-10-12 11:57:57,735 ----------------------------------------------------------------------------------------------------
2023-10-12 11:57:57,735 EPOCH 3 done: loss 0.0623 - lr: 0.000125
2023-10-12 11:58:19,234 DEV : loss 0.07376913726329803 - f1-score (micro avg) 0.87
2023-10-12 11:58:19,265 saving best model
2023-10-12 11:58:21,818 ----------------------------------------------------------------------------------------------------
2023-10-12 11:59:03,980 epoch 4 - iter 144/1445 - loss 0.03611712 - time (sec): 42.16 - samples/sec: 431.21 - lr: 0.000123 - momentum: 0.000000
2023-10-12 11:59:45,606 epoch 4 - iter 288/1445 - loss 0.03313447 - time (sec): 83.78 - samples/sec: 423.88 - lr: 0.000121 - momentum: 0.000000
2023-10-12 12:00:27,537 epoch 4 - iter 432/1445 - loss 0.04019091 - time (sec): 125.72 - samples/sec: 423.04 - lr: 0.000119 - momentum: 0.000000
2023-10-12 12:01:10,221 epoch 4 - iter 576/1445 - loss 0.04028670 - time (sec): 168.40 - samples/sec: 423.73 - lr: 0.000117 - momentum: 0.000000
2023-10-12 12:01:51,656 epoch 4 - iter 720/1445 - loss 0.03863318 - time (sec): 209.83 - samples/sec: 419.68 - lr: 0.000116 - momentum: 0.000000
2023-10-12 12:02:33,631 epoch 4 - iter 864/1445 - loss 0.04020086 - time (sec): 251.81 - samples/sec: 418.76 - lr: 0.000114 - momentum: 0.000000
2023-10-12 12:03:16,492 epoch 4 - iter 1008/1445 - loss 0.04062009 - time (sec): 294.67 - samples/sec: 421.08 - lr: 0.000112 - momentum: 0.000000
2023-10-12 12:03:57,431 epoch 4 - iter 1152/1445 - loss 0.04107089 - time (sec): 335.61 - samples/sec: 420.26 - lr: 0.000110 - momentum: 0.000000
2023-10-12 12:04:38,508 epoch 4 - iter 1296/1445 - loss 0.04130384 - time (sec): 376.69 - samples/sec: 420.79 - lr: 0.000109 - momentum: 0.000000
2023-10-12 12:05:20,346 epoch 4 - iter 1440/1445 - loss 0.04168080 - time (sec): 418.52 - samples/sec: 419.67 - lr: 0.000107 - momentum: 0.000000
2023-10-12 12:05:21,581 ----------------------------------------------------------------------------------------------------
2023-10-12 12:05:21,581 EPOCH 4 done: loss 0.0416 - lr: 0.000107
2023-10-12 12:05:42,143 DEV : loss 0.0841754898428917 - f1-score (micro avg) 0.8513
2023-10-12 12:05:42,174 ----------------------------------------------------------------------------------------------------
2023-10-12 12:06:22,264 epoch 5 - iter 144/1445 - loss 0.02316520 - time (sec): 40.09 - samples/sec: 423.34 - lr: 0.000105 - momentum: 0.000000
2023-10-12 12:07:02,997 epoch 5 - iter 288/1445 - loss 0.03441539 - time (sec): 80.82 - samples/sec: 421.93 - lr: 0.000103 - momentum: 0.000000
2023-10-12 12:07:43,904 epoch 5 - iter 432/1445 - loss 0.02995417 - time (sec): 121.73 - samples/sec: 429.49 - lr: 0.000101 - momentum: 0.000000
2023-10-12 12:08:24,479 epoch 5 - iter 576/1445 - loss 0.02814443 - time (sec): 162.30 - samples/sec: 432.65 - lr: 0.000100 - momentum: 0.000000
2023-10-12 12:09:06,881 epoch 5 - iter 720/1445 - loss 0.03032866 - time (sec): 204.71 - samples/sec: 433.00 - lr: 0.000098 - momentum: 0.000000
2023-10-12 12:09:47,170 epoch 5 - iter 864/1445 - loss 0.02919599 - time (sec): 244.99 - samples/sec: 432.35 - lr: 0.000096 - momentum: 0.000000
2023-10-12 12:10:27,981 epoch 5 - iter 1008/1445 - loss 0.03025635 - time (sec): 285.81 - samples/sec: 428.95 - lr: 0.000094 - momentum: 0.000000
2023-10-12 12:11:08,654 epoch 5 - iter 1152/1445 - loss 0.03157763 - time (sec): 326.48 - samples/sec: 429.25 - lr: 0.000093 - momentum: 0.000000
2023-10-12 12:11:48,918 epoch 5 - iter 1296/1445 - loss 0.03228696 - time (sec): 366.74 - samples/sec: 428.20 - lr: 0.000091 - momentum: 0.000000
2023-10-12 12:12:30,752 epoch 5 - iter 1440/1445 - loss 0.03243954 - time (sec): 408.58 - samples/sec: 429.94 - lr: 0.000089 - momentum: 0.000000
2023-10-12 12:12:31,986 ----------------------------------------------------------------------------------------------------
2023-10-12 12:12:31,986 EPOCH 5 done: loss 0.0325 - lr: 0.000089
2023-10-12 12:12:52,608 DEV : loss 0.08762915432453156 - f1-score (micro avg) 0.8626
2023-10-12 12:12:52,638 ----------------------------------------------------------------------------------------------------
2023-10-12 12:13:34,288 epoch 6 - iter 144/1445 - loss 0.01905912 - time (sec): 41.65 - samples/sec: 433.11 - lr: 0.000087 - momentum: 0.000000
2023-10-12 12:14:14,530 epoch 6 - iter 288/1445 - loss 0.01961712 - time (sec): 81.89 - samples/sec: 415.29 - lr: 0.000085 - momentum: 0.000000
2023-10-12 12:14:55,335 epoch 6 - iter 432/1445 - loss 0.02346195 - time (sec): 122.69 - samples/sec: 418.07 - lr: 0.000084 - momentum: 0.000000
2023-10-12 12:15:35,365 epoch 6 - iter 576/1445 - loss 0.02196985 - time (sec): 162.73 - samples/sec: 416.44 - lr: 0.000082 - momentum: 0.000000
2023-10-12 12:16:17,354 epoch 6 - iter 720/1445 - loss 0.02127544 - time (sec): 204.71 - samples/sec: 420.30 - lr: 0.000080 - momentum: 0.000000
2023-10-12 12:16:58,106 epoch 6 - iter 864/1445 - loss 0.02274393 - time (sec): 245.47 - samples/sec: 421.32 - lr: 0.000078 - momentum: 0.000000
2023-10-12 12:17:39,638 epoch 6 - iter 1008/1445 - loss 0.02145250 - time (sec): 287.00 - samples/sec: 421.43 - lr: 0.000076 - momentum: 0.000000
2023-10-12 12:18:21,358 epoch 6 - iter 1152/1445 - loss 0.02203351 - time (sec): 328.72 - samples/sec: 421.82 - lr: 0.000075 - momentum: 0.000000
2023-10-12 12:19:03,518 epoch 6 - iter 1296/1445 - loss 0.02304377 - time (sec): 370.88 - samples/sec: 423.33 - lr: 0.000073 - momentum: 0.000000
2023-10-12 12:19:45,617 epoch 6 - iter 1440/1445 - loss 0.02426706 - time (sec): 412.98 - samples/sec: 425.12 - lr: 0.000071 - momentum: 0.000000
2023-10-12 12:19:46,936 ----------------------------------------------------------------------------------------------------
2023-10-12 12:19:46,936 EPOCH 6 done: loss 0.0247 - lr: 0.000071
2023-10-12 12:20:08,430 DEV : loss 0.11867068707942963 - f1-score (micro avg) 0.8423
2023-10-12 12:20:08,462 ----------------------------------------------------------------------------------------------------
2023-10-12 12:20:50,385 epoch 7 - iter 144/1445 - loss 0.01822959 - time (sec): 41.92 - samples/sec: 445.21 - lr: 0.000069 - momentum: 0.000000
2023-10-12 12:21:30,323 epoch 7 - iter 288/1445 - loss 0.01430378 - time (sec): 81.86 - samples/sec: 426.40 - lr: 0.000068 - momentum: 0.000000
2023-10-12 12:22:11,357 epoch 7 - iter 432/1445 - loss 0.01508153 - time (sec): 122.89 - samples/sec: 431.82 - lr: 0.000066 - momentum: 0.000000
2023-10-12 12:22:52,130 epoch 7 - iter 576/1445 - loss 0.01561902 - time (sec): 163.67 - samples/sec: 437.17 - lr: 0.000064 - momentum: 0.000000
2023-10-12 12:23:32,708 epoch 7 - iter 720/1445 - loss 0.01633977 - time (sec): 204.24 - samples/sec: 437.46 - lr: 0.000062 - momentum: 0.000000
2023-10-12 12:24:12,371 epoch 7 - iter 864/1445 - loss 0.01585051 - time (sec): 243.91 - samples/sec: 433.71 - lr: 0.000060 - momentum: 0.000000
2023-10-12 12:24:52,329 epoch 7 - iter 1008/1445 - loss 0.01560449 - time (sec): 283.87 - samples/sec: 434.36 - lr: 0.000059 - momentum: 0.000000
2023-10-12 12:25:33,020 epoch 7 - iter 1152/1445 - loss 0.01779868 - time (sec): 324.56 - samples/sec: 434.03 - lr: 0.000057 - momentum: 0.000000
2023-10-12 12:26:12,901 epoch 7 - iter 1296/1445 - loss 0.01716531 - time (sec): 364.44 - samples/sec: 433.22 - lr: 0.000055 - momentum: 0.000000
2023-10-12 12:26:53,185 epoch 7 - iter 1440/1445 - loss 0.01660263 - time (sec): 404.72 - samples/sec: 434.02 - lr: 0.000053 - momentum: 0.000000
2023-10-12 12:26:54,373 ----------------------------------------------------------------------------------------------------
2023-10-12 12:26:54,373 EPOCH 7 done: loss 0.0166 - lr: 0.000053
2023-10-12 12:27:14,578 DEV : loss 0.12038738280534744 - f1-score (micro avg) 0.8482
2023-10-12 12:27:14,607 ----------------------------------------------------------------------------------------------------
2023-10-12 12:27:53,892 epoch 8 - iter 144/1445 - loss 0.00917679 - time (sec): 39.28 - samples/sec: 423.88 - lr: 0.000052 - momentum: 0.000000
2023-10-12 12:28:34,077 epoch 8 - iter 288/1445 - loss 0.01287327 - time (sec): 79.47 - samples/sec: 433.47 - lr: 0.000050 - momentum: 0.000000
2023-10-12 12:29:14,044 epoch 8 - iter 432/1445 - loss 0.01137302 - time (sec): 119.43 - samples/sec: 433.78 - lr: 0.000048 - momentum: 0.000000
2023-10-12 12:29:54,775 epoch 8 - iter 576/1445 - loss 0.01052418 - time (sec): 160.17 - samples/sec: 436.19 - lr: 0.000046 - momentum: 0.000000
2023-10-12 12:30:35,503 epoch 8 - iter 720/1445 - loss 0.01106236 - time (sec): 200.89 - samples/sec: 438.62 - lr: 0.000044 - momentum: 0.000000
2023-10-12 12:31:15,672 epoch 8 - iter 864/1445 - loss 0.01395461 - time (sec): 241.06 - samples/sec: 438.60 - lr: 0.000043 - momentum: 0.000000
2023-10-12 12:31:57,034 epoch 8 - iter 1008/1445 - loss 0.01302380 - time (sec): 282.42 - samples/sec: 439.36 - lr: 0.000041 - momentum: 0.000000
2023-10-12 12:32:37,657 epoch 8 - iter 1152/1445 - loss 0.01284954 - time (sec): 323.05 - samples/sec: 440.27 - lr: 0.000039 - momentum: 0.000000
2023-10-12 12:33:17,409 epoch 8 - iter 1296/1445 - loss 0.01297199 - time (sec): 362.80 - samples/sec: 439.50 - lr: 0.000037 - momentum: 0.000000
2023-10-12 12:33:57,344 epoch 8 - iter 1440/1445 - loss 0.01306419 - time (sec): 402.74 - samples/sec: 436.30 - lr: 0.000036 - momentum: 0.000000
2023-10-12 12:33:58,584 ----------------------------------------------------------------------------------------------------
2023-10-12 12:33:58,585 EPOCH 8 done: loss 0.0130 - lr: 0.000036
2023-10-12 12:34:19,442 DEV : loss 0.13867534697055817 - f1-score (micro avg) 0.8435
2023-10-12 12:34:19,475 ----------------------------------------------------------------------------------------------------
2023-10-12 12:35:05,270 epoch 9 - iter 144/1445 - loss 0.00897734 - time (sec): 45.79 - samples/sec: 375.80 - lr: 0.000034 - momentum: 0.000000
2023-10-12 12:35:50,048 epoch 9 - iter 288/1445 - loss 0.01021018 - time (sec): 90.57 - samples/sec: 372.20 - lr: 0.000032 - momentum: 0.000000
2023-10-12 12:36:35,460 epoch 9 - iter 432/1445 - loss 0.00892388 - time (sec): 135.98 - samples/sec: 386.97 - lr: 0.000030 - momentum: 0.000000
2023-10-12 12:37:18,301 epoch 9 - iter 576/1445 - loss 0.00833804 - time (sec): 178.82 - samples/sec: 391.55 - lr: 0.000028 - momentum: 0.000000
2023-10-12 12:38:03,929 epoch 9 - iter 720/1445 - loss 0.00822345 - time (sec): 224.45 - samples/sec: 388.88 - lr: 0.000027 - momentum: 0.000000
2023-10-12 12:38:48,602 epoch 9 - iter 864/1445 - loss 0.00846797 - time (sec): 269.12 - samples/sec: 389.55 - lr: 0.000025 - momentum: 0.000000
2023-10-12 12:39:33,687 epoch 9 - iter 1008/1445 - loss 0.00830592 - time (sec): 314.21 - samples/sec: 390.31 - lr: 0.000023 - momentum: 0.000000
2023-10-12 12:40:19,812 epoch 9 - iter 1152/1445 - loss 0.00832466 - time (sec): 360.34 - samples/sec: 387.69 - lr: 0.000021 - momentum: 0.000000
2023-10-12 12:41:04,957 epoch 9 - iter 1296/1445 - loss 0.00901099 - time (sec): 405.48 - samples/sec: 388.89 - lr: 0.000020 - momentum: 0.000000
2023-10-12 12:41:47,906 epoch 9 - iter 1440/1445 - loss 0.00909627 - time (sec): 448.43 - samples/sec: 391.21 - lr: 0.000018 - momentum: 0.000000
2023-10-12 12:41:49,371 ----------------------------------------------------------------------------------------------------
2023-10-12 12:41:49,372 EPOCH 9 done: loss 0.0091 - lr: 0.000018
2023-10-12 12:42:11,960 DEV : loss 0.1353680044412613 - f1-score (micro avg) 0.8498
2023-10-12 12:42:11,994 ----------------------------------------------------------------------------------------------------
2023-10-12 12:42:53,759 epoch 10 - iter 144/1445 - loss 0.00405138 - time (sec): 41.76 - samples/sec: 420.58 - lr: 0.000016 - momentum: 0.000000
2023-10-12 12:43:35,557 epoch 10 - iter 288/1445 - loss 0.00574470 - time (sec): 83.56 - samples/sec: 416.49 - lr: 0.000014 - momentum: 0.000000
2023-10-12 12:44:17,777 epoch 10 - iter 432/1445 - loss 0.00477015 - time (sec): 125.78 - samples/sec: 421.24 - lr: 0.000012 - momentum: 0.000000
2023-10-12 12:44:58,478 epoch 10 - iter 576/1445 - loss 0.00546641 - time (sec): 166.48 - samples/sec: 418.51 - lr: 0.000011 - momentum: 0.000000
2023-10-12 12:45:38,762 epoch 10 - iter 720/1445 - loss 0.00529420 - time (sec): 206.77 - samples/sec: 416.03 - lr: 0.000009 - momentum: 0.000000
2023-10-12 12:46:20,215 epoch 10 - iter 864/1445 - loss 0.00554832 - time (sec): 248.22 - samples/sec: 419.34 - lr: 0.000007 - momentum: 0.000000
2023-10-12 12:47:02,487 epoch 10 - iter 1008/1445 - loss 0.00514464 - time (sec): 290.49 - samples/sec: 421.56 - lr: 0.000005 - momentum: 0.000000
2023-10-12 12:47:43,554 epoch 10 - iter 1152/1445 - loss 0.00556748 - time (sec): 331.56 - samples/sec: 421.17 - lr: 0.000004 - momentum: 0.000000
2023-10-12 12:48:23,857 epoch 10 - iter 1296/1445 - loss 0.00520984 - time (sec): 371.86 - samples/sec: 423.11 - lr: 0.000002 - momentum: 0.000000
2023-10-12 12:49:05,075 epoch 10 - iter 1440/1445 - loss 0.00602242 - time (sec): 413.08 - samples/sec: 425.65 - lr: 0.000000 - momentum: 0.000000
2023-10-12 12:49:06,221 ----------------------------------------------------------------------------------------------------
2023-10-12 12:49:06,221 EPOCH 10 done: loss 0.0060 - lr: 0.000000
2023-10-12 12:49:27,492 DEV : loss 0.1435890942811966 - f1-score (micro avg) 0.8555
2023-10-12 12:49:28,414 ----------------------------------------------------------------------------------------------------
2023-10-12 12:49:28,416 Loading model from best epoch ...
2023-10-12 12:49:33,648 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 12:49:53,772 
Results:
- F-score (micro) 0.8494
- F-score (macro) 0.72
- Accuracy 0.7513

By class:
              precision    recall  f1-score   support

         PER     0.8360    0.8672    0.8513       482
         LOC     0.9120    0.8821    0.8968       458
         ORG     0.6364    0.3043    0.4118        69

   micro avg     0.8637    0.8355    0.8494      1009
   macro avg     0.7948    0.6846    0.7200      1009
weighted avg     0.8568    0.8355    0.8419      1009

2023-10-12 12:49:53,772 ----------------------------------------------------------------------------------------------------
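For completeness, a short inference sketch against the best checkpoint reported above (best-model.pt under the training base path). The Flair calls are standard; the example sentence is invented, and the exact path layout is assumed from the log.

```python
# Minimal inference sketch for the checkpoint saved as best-model.pt above.
from flair.data import Sentence
from flair.models import SequenceTagger

# Path assumed relative to the training base path reported in the log.
tagger = SequenceTagger.load(
    "hmbench-icdar/nl-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3/"
    "best-model.pt"
)

# Invented Dutch example sentence; any Sentence works.
sentence = Sentence("Vincent van Gogh werd geboren in Zundert .")
tagger.predict(sentence)

# The BIOES tags (S-/B-/I-/E- plus PER, LOC, ORG) are decoded into spans.
for span in sentence.get_spans("ner"):
    print(span.text, span.get_label("ner").value, span.get_label("ner").score)
```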