2023-10-12 23:34:06,681 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,683 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 23:34:06,683 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,683 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-12 23:34:06,683 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,683 Train: 7936 sentences
2023-10-12 23:34:06,683 (train_with_dev=False, train_with_test=False)
2023-10-12 23:34:06,684 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,684 Training Params:
2023-10-12 23:34:06,684 - learning_rate: "0.00015"
2023-10-12 23:34:06,684 - mini_batch_size: "4"
2023-10-12 23:34:06,684 - max_epochs: "10"
2023-10-12 23:34:06,684 - shuffle: "True"
2023-10-12 23:34:06,684 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,684 Plugins:
2023-10-12 23:34:06,684 - TensorboardLogger
2023-10-12 23:34:06,684 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 23:34:06,684 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,684 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 23:34:06,684 - metric: "('micro avg', 'f1-score')"
2023-10-12 23:34:06,684 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,685 Computation:
2023-10-12 23:34:06,685 - compute on device: cuda:0
2023-10-12 23:34:06,685 - embedding storage: none
2023-10-12 23:34:06,685 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,685 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-12 23:34:06,685 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,685 ----------------------------------------------------------------------------------------------------
2023-10-12 23:34:06,685 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 23:35:01,912 epoch 1 - iter 198/1984 - loss 2.53015901 - time (sec): 55.23 - samples/sec: 290.88 - lr: 0.000015 - momentum: 0.000000
2023-10-12 23:35:58,680 epoch 1 - iter 396/1984 - loss 2.34890096 - time (sec): 111.99 - samples/sec: 283.70 - lr: 0.000030 - momentum: 0.000000
2023-10-12 23:36:54,445 epoch 1 - iter 594/1984 - loss 2.03523788 - time (sec): 167.76 - samples/sec: 290.57 - lr: 0.000045 - momentum: 0.000000
2023-10-12 23:37:51,846 epoch 1 - iter 792/1984 - loss 1.72538062 - time (sec): 225.16 - samples/sec: 289.82 - lr: 0.000060 - momentum: 0.000000
2023-10-12 23:38:49,794 epoch 1 - iter 990/1984 - loss 1.45920458 - time (sec): 283.11 - samples/sec: 290.02 - lr: 0.000075 - momentum: 0.000000
2023-10-12 23:39:44,998 epoch 1 - iter 1188/1984 - loss 1.25804478 - time (sec): 338.31 - samples/sec: 290.71 - lr: 0.000090 - momentum: 0.000000
2023-10-12 23:40:39,793 epoch 1 - iter 1386/1984 - loss 1.11947751 - time (sec): 393.11 - samples/sec: 290.06 - lr: 0.000105 - momentum: 0.000000
2023-10-12 23:41:33,442 epoch 1 - iter 1584/1984 - loss 1.00741428 - time (sec): 446.75 - samples/sec: 292.14 - lr: 0.000120 - momentum: 0.000000
2023-10-12 23:42:27,103 epoch 1 - iter 1782/1984 - loss 0.91439809 - time (sec): 500.42 - samples/sec: 294.68 - lr: 0.000135 - momentum: 0.000000
2023-10-12 23:43:21,922 epoch 1 - iter 1980/1984 - loss 0.84030395 - time (sec): 555.23 - samples/sec: 294.83 - lr: 0.000150 - momentum: 0.000000
2023-10-12 23:43:22,952 ----------------------------------------------------------------------------------------------------
2023-10-12 23:43:22,953 EPOCH 1 done: loss 0.8393 - lr: 0.000150
2023-10-12 23:43:49,764 DEV : loss 0.13014310598373413 - f1-score (micro avg) 0.649
2023-10-12 23:43:49,806 saving best model
2023-10-12 23:43:50,761 ----------------------------------------------------------------------------------------------------
2023-10-12 23:44:50,514 epoch 2 - iter 198/1984 - loss 0.13814942 - time (sec): 59.75 - samples/sec: 281.85 - lr: 0.000148 - momentum: 0.000000
2023-10-12 23:45:48,347 epoch 2 - iter 396/1984 - loss 0.13990687 - time (sec): 117.58 - samples/sec: 281.14 - lr: 0.000147 - momentum: 0.000000
2023-10-12 23:46:40,798 epoch 2 - iter 594/1984 - loss 0.13853170 - time (sec): 170.04 - samples/sec: 290.25 - lr: 0.000145 - momentum: 0.000000
2023-10-12 23:47:34,795 epoch 2 - iter 792/1984 - loss 0.13340291 - time (sec): 224.03 - samples/sec: 295.29 - lr: 0.000143 - momentum: 0.000000
2023-10-12 23:48:26,830 epoch 2 - iter 990/1984 - loss 0.13210679 - time (sec): 276.07 - samples/sec: 297.52 - lr: 0.000142 - momentum: 0.000000
2023-10-12 23:49:17,532 epoch 2 - iter 1188/1984 - loss 0.12961623 - time (sec): 326.77 - samples/sec: 303.19 - lr: 0.000140 - momentum: 0.000000
2023-10-12 23:50:11,071 epoch 2 - iter 1386/1984 - loss 0.12705124 - time (sec): 380.31 - samples/sec: 304.64 - lr: 0.000138 - momentum: 0.000000
2023-10-12 23:51:04,444 epoch 2 - iter 1584/1984 - loss 0.12561623 - time (sec): 433.68 - samples/sec: 303.29 - lr: 0.000137 - momentum: 0.000000
2023-10-12 23:52:01,829 epoch 2 - iter 1782/1984 - loss 0.12330233 - time (sec): 491.07 - samples/sec: 300.47 - lr: 0.000135 - momentum: 0.000000
2023-10-12 23:52:56,391 epoch 2 - iter 1980/1984 - loss 0.12223974 - time (sec): 545.63 - samples/sec: 300.10 - lr: 0.000133 - momentum: 0.000000
2023-10-12 23:52:57,454 ----------------------------------------------------------------------------------------------------
2023-10-12 23:52:57,455 EPOCH 2 done: loss 0.1222 - lr: 0.000133
2023-10-12 23:53:24,424 DEV : loss 0.08898138254880905 - f1-score (micro avg) 0.7624
2023-10-12 23:53:24,471 saving best model
2023-10-12 23:53:27,101 ----------------------------------------------------------------------------------------------------
2023-10-12 23:54:19,550 epoch 3 - iter 198/1984 - loss 0.06987952 - time (sec): 52.44 - samples/sec: 327.18 - lr: 0.000132 - momentum: 0.000000
2023-10-12 23:55:11,958 epoch 3 - iter 396/1984 - loss 0.07535791 - time (sec): 104.85 - samples/sec: 316.39 - lr: 0.000130 - momentum: 0.000000
2023-10-12 23:56:04,659 epoch 3 - iter 594/1984 - loss 0.07396086 - time (sec): 157.55 - samples/sec: 310.41 - lr: 0.000128 - momentum: 0.000000
2023-10-12 23:56:55,853 epoch 3 - iter 792/1984 - loss 0.07220600 - time (sec): 208.75 - samples/sec: 309.88 - lr: 0.000127 - momentum: 0.000000
2023-10-12 23:57:50,121 epoch 3 - iter 990/1984 - loss 0.07348138 - time (sec): 263.02 - samples/sec: 309.17 - lr: 0.000125 - momentum: 0.000000
2023-10-12 23:58:46,335 epoch 3 - iter 1188/1984 - loss 0.07496796 - time (sec): 319.23 - samples/sec: 305.52 - lr: 0.000123 - momentum: 0.000000
2023-10-12 23:59:41,068 epoch 3 - iter 1386/1984 - loss 0.07504507 - time (sec): 373.96 - samples/sec: 303.31 - lr: 0.000122 - momentum: 0.000000
2023-10-13 00:00:34,233 epoch 3 - iter 1584/1984 - loss 0.07483555 - time (sec): 427.13 - samples/sec: 304.06 - lr: 0.000120 - momentum: 0.000000
2023-10-13 00:01:26,703 epoch 3 - iter 1782/1984 - loss 0.07475606 - time (sec): 479.60 - samples/sec: 305.36 - lr: 0.000118 - momentum: 0.000000
2023-10-13 00:02:20,671 epoch 3 - iter 1980/1984 - loss 0.07567377 - time (sec): 533.57 - samples/sec: 306.94 - lr: 0.000117 - momentum: 0.000000
2023-10-13 00:02:21,678 ----------------------------------------------------------------------------------------------------
2023-10-13 00:02:21,678 EPOCH 3 done: loss 0.0756 - lr: 0.000117
2023-10-13 00:02:51,479 DEV : loss 0.09832499176263809 - f1-score (micro avg) 0.7645
2023-10-13 00:02:51,528 saving best model
2023-10-13 00:02:54,218 ----------------------------------------------------------------------------------------------------
2023-10-13 00:03:50,217 epoch 4 - iter 198/1984 - loss 0.05053718 - time (sec): 55.98 - samples/sec: 311.10 - lr: 0.000115 - momentum: 0.000000
2023-10-13 00:04:43,691 epoch 4 - iter 396/1984 - loss 0.04932322 - time (sec): 109.46 - samples/sec: 303.41 - lr: 0.000113 - momentum: 0.000000
2023-10-13 00:05:39,135 epoch 4 - iter 594/1984 - loss 0.05264591 - time (sec): 164.90 - samples/sec: 299.64 - lr: 0.000112 - momentum: 0.000000
2023-10-13 00:06:32,387 epoch 4 - iter 792/1984 - loss 0.05162108 - time (sec): 218.15 - samples/sec: 301.29 - lr: 0.000110 - momentum: 0.000000
2023-10-13 00:07:28,853 epoch 4 - iter 990/1984 - loss 0.05226222 - time (sec): 274.62 - samples/sec: 299.06 - lr: 0.000108 - momentum: 0.000000
2023-10-13 00:08:25,142 epoch 4 - iter 1188/1984 - loss 0.05307070 - time (sec): 330.91 - samples/sec: 296.59 - lr: 0.000107 - momentum: 0.000000
2023-10-13 00:09:20,493 epoch 4 - iter 1386/1984 - loss 0.05516288 - time (sec): 386.26 - samples/sec: 294.40 - lr: 0.000105 - momentum: 0.000000
2023-10-13 00:10:18,458 epoch 4 - iter 1584/1984 - loss 0.05582008 - time (sec): 444.22 - samples/sec: 294.05 - lr: 0.000103 - momentum: 0.000000
2023-10-13 00:11:16,349 epoch 4 - iter 1782/1984 - loss 0.05557762 - time (sec): 502.11 - samples/sec: 292.28 - lr: 0.000102 - momentum: 0.000000
2023-10-13 00:12:16,111 epoch 4 - iter 1980/1984 - loss 0.05644123 - time (sec): 561.88 - samples/sec: 291.31 - lr: 0.000100 - momentum: 0.000000
2023-10-13 00:12:17,223 ----------------------------------------------------------------------------------------------------
2023-10-13 00:12:17,223 EPOCH 4 done: loss 0.0564 - lr: 0.000100
2023-10-13 00:12:44,297 DEV : loss 0.11294437199831009 - f1-score (micro avg) 0.7682
2023-10-13 00:12:44,350 saving best model
2023-10-13 00:12:48,939 ----------------------------------------------------------------------------------------------------
2023-10-13 00:13:46,470 epoch 5 - iter 198/1984 - loss 0.04188904 - time (sec): 57.53 - samples/sec: 269.87 - lr: 0.000098 - momentum: 0.000000
2023-10-13 00:14:39,106 epoch 5 - iter 396/1984 - loss 0.03431986 - time (sec): 110.16 - samples/sec: 289.66 - lr: 0.000097 - momentum: 0.000000
2023-10-13 00:15:33,804 epoch 5 - iter 594/1984 - loss 0.03740529 - time (sec): 164.86 - samples/sec: 293.86 - lr: 0.000095 - momentum: 0.000000
2023-10-13 00:16:25,616 epoch 5 - iter 792/1984 - loss 0.04014471 - time (sec): 216.67 - samples/sec: 293.90 - lr: 0.000093 - momentum: 0.000000
2023-10-13 00:17:16,677 epoch 5 - iter 990/1984 - loss 0.03944146 - time (sec): 267.73 - samples/sec: 296.16 - lr: 0.000092 - momentum: 0.000000
2023-10-13 00:18:15,446 epoch 5 - iter 1188/1984 - loss 0.03862818 - time (sec): 326.50 - samples/sec: 293.74 - lr: 0.000090 - momentum: 0.000000
2023-10-13 00:19:11,566 epoch 5 - iter 1386/1984 - loss 0.03814287 - time (sec): 382.62 - samples/sec: 297.28 - lr: 0.000088 - momentum: 0.000000
2023-10-13 00:20:06,513 epoch 5 - iter 1584/1984 - loss 0.03809265 - time (sec): 437.57 - samples/sec: 297.88 - lr: 0.000087 - momentum: 0.000000
2023-10-13 00:21:02,477 epoch 5 - iter 1782/1984 - loss 0.04025913 - time (sec): 493.53 - samples/sec: 296.90 - lr: 0.000085 - momentum: 0.000000
2023-10-13 00:21:57,034 epoch 5 - iter 1980/1984 - loss 0.04036092 - time (sec): 548.09 - samples/sec: 298.60 - lr: 0.000083 - momentum: 0.000000
2023-10-13 00:21:58,131 ----------------------------------------------------------------------------------------------------
2023-10-13 00:21:58,131 EPOCH 5 done: loss 0.0403 - lr: 0.000083
2023-10-13 00:22:23,594 DEV : loss 0.13933853805065155 - f1-score (micro avg) 0.7693
2023-10-13 00:22:23,638 saving best model
2023-10-13 00:22:26,278 ----------------------------------------------------------------------------------------------------
2023-10-13 00:23:19,874 epoch 6 - iter 198/1984 - loss 0.03230988 - time (sec): 53.59 - samples/sec: 303.24 - lr: 0.000082 - momentum: 0.000000
2023-10-13 00:24:13,483 epoch 6 - iter 396/1984 - loss 0.03111950 - time (sec): 107.20 - samples/sec: 300.50 - lr: 0.000080 - momentum: 0.000000
2023-10-13 00:25:07,921 epoch 6 - iter 594/1984 - loss 0.02996729 - time (sec): 161.64 - samples/sec: 301.94 - lr: 0.000078 - momentum: 0.000000
2023-10-13 00:26:02,393 epoch 6 - iter 792/1984 - loss 0.03034434 - time (sec): 216.11 - samples/sec: 302.94 - lr: 0.000077 - momentum: 0.000000
2023-10-13 00:26:55,905 epoch 6 - iter 990/1984 - loss 0.03106029 - time (sec): 269.62 - samples/sec: 304.59 - lr: 0.000075 - momentum: 0.000000
2023-10-13 00:27:48,126 epoch 6 - iter 1188/1984 - loss 0.03134518 - time (sec): 321.84 - samples/sec: 304.45 - lr: 0.000073 - momentum: 0.000000
2023-10-13 00:28:40,096 epoch 6 - iter 1386/1984 - loss 0.03194870 - time (sec): 373.81 - samples/sec: 308.07 - lr: 0.000072 - momentum: 0.000000
2023-10-13 00:29:32,100 epoch 6 - iter 1584/1984 - loss 0.03096957 - time (sec): 425.82 - samples/sec: 309.33 - lr: 0.000070 - momentum: 0.000000
2023-10-13 00:30:26,165 epoch 6 - iter 1782/1984 - loss 0.03103819 - time (sec): 479.88 - samples/sec: 309.10 - lr: 0.000068 - momentum: 0.000000
2023-10-13 00:31:19,173 epoch 6 - iter 1980/1984 - loss 0.03055222 - time (sec): 532.89 - samples/sec: 307.32 - lr: 0.000067 - momentum: 0.000000
2023-10-13 00:31:20,256 ----------------------------------------------------------------------------------------------------
2023-10-13 00:31:20,257 EPOCH 6 done: loss 0.0305 - lr: 0.000067
2023-10-13 00:31:45,020 DEV : loss 0.17435909807682037 - f1-score (micro avg) 0.7553
2023-10-13 00:31:45,060 ----------------------------------------------------------------------------------------------------
2023-10-13 00:32:41,812 epoch 7 - iter 198/1984 - loss 0.01566928 - time (sec): 56.75 - samples/sec: 284.71 - lr: 0.000065 - momentum: 0.000000
2023-10-13 00:33:35,154 epoch 7 - iter 396/1984 - loss 0.01826435 - time (sec): 110.09 - samples/sec: 299.13 - lr: 0.000063 - momentum: 0.000000
2023-10-13 00:34:27,198 epoch 7 - iter 594/1984 - loss 0.01846479 - time (sec): 162.14 - samples/sec: 301.99 - lr: 0.000062 - momentum: 0.000000
2023-10-13 00:35:20,754 epoch 7 - iter 792/1984 - loss 0.02083283 - time (sec): 215.69 - samples/sec: 303.38 - lr: 0.000060 - momentum: 0.000000
2023-10-13 00:36:13,267 epoch 7 - iter 990/1984 - loss 0.02069876 - time (sec): 268.20 - samples/sec: 303.96 - lr: 0.000058 - momentum: 0.000000
2023-10-13 00:37:09,426 epoch 7 - iter 1188/1984 - loss 0.02203983 - time (sec): 324.36 - samples/sec: 303.26 - lr: 0.000057 - momentum: 0.000000
2023-10-13 00:38:08,612 epoch 7 - iter 1386/1984 - loss 0.02134873 - time (sec): 383.55 - samples/sec: 299.07 - lr: 0.000055 - momentum: 0.000000
2023-10-13 00:39:06,584 epoch 7 - iter 1584/1984 - loss 0.02128909 - time (sec): 441.52 - samples/sec: 297.30 - lr: 0.000053 - momentum: 0.000000
2023-10-13 00:39:59,346 epoch 7 - iter 1782/1984 - loss 0.02121560 - time (sec): 494.28 - samples/sec: 298.00 - lr: 0.000052 - momentum: 0.000000
2023-10-13 00:40:50,705 epoch 7 - iter 1980/1984 - loss 0.02256854 - time (sec): 545.64 - samples/sec: 300.01 - lr: 0.000050 - momentum: 0.000000
2023-10-13 00:40:51,754 ----------------------------------------------------------------------------------------------------
2023-10-13 00:40:51,754 EPOCH 7 done: loss 0.0226 - lr: 0.000050
2023-10-13 00:41:16,847 DEV : loss 0.1826089322566986 - f1-score (micro avg) 0.7686
2023-10-13 00:41:16,888 ----------------------------------------------------------------------------------------------------
2023-10-13 00:42:06,981 epoch 8 - iter 198/1984 - loss 0.01900992 - time (sec): 50.09 - samples/sec: 330.24 - lr: 0.000048 - momentum: 0.000000
2023-10-13 00:42:58,185 epoch 8 - iter 396/1984 - loss 0.01532470 - time (sec): 101.30 - samples/sec: 315.20 - lr: 0.000047 - momentum: 0.000000
2023-10-13 00:43:50,417 epoch 8 - iter 594/1984 - loss 0.01466769 - time (sec): 153.53 - samples/sec: 317.62 - lr: 0.000045 - momentum: 0.000000
2023-10-13 00:44:41,465 epoch 8 - iter 792/1984 - loss 0.01443231 - time (sec): 204.58 - samples/sec: 320.74 - lr: 0.000043 - momentum: 0.000000
2023-10-13 00:45:33,566 epoch 8 - iter 990/1984 - loss 0.01505665 - time (sec): 256.68 - samples/sec: 319.53 - lr: 0.000042 - momentum: 0.000000
2023-10-13 00:46:28,128 epoch 8 - iter 1188/1984 - loss 0.01537213 - time (sec): 311.24 - samples/sec: 315.24 - lr: 0.000040 - momentum: 0.000000
2023-10-13 00:47:23,918 epoch 8 - iter 1386/1984 - loss 0.01485116 - time (sec): 367.03 - samples/sec: 310.98 - lr: 0.000038 - momentum: 0.000000
2023-10-13 00:48:20,545 epoch 8 - iter 1584/1984 - loss 0.01459437 - time (sec): 423.66 - samples/sec: 309.24 - lr: 0.000037 - momentum: 0.000000
2023-10-13 00:49:17,222 epoch 8 - iter 1782/1984 - loss 0.01455471 - time (sec): 480.33 - samples/sec: 305.90 - lr: 0.000035 - momentum: 0.000000
2023-10-13 00:50:14,477 epoch 8 - iter 1980/1984 - loss 0.01511982 - time (sec): 537.59 - samples/sec: 304.37 - lr: 0.000033 - momentum: 0.000000
2023-10-13 00:50:15,631 ----------------------------------------------------------------------------------------------------
2023-10-13 00:50:15,632 EPOCH 8 done: loss 0.0151 - lr: 0.000033
2023-10-13 00:50:43,313 DEV : loss 0.20662663877010345 - f1-score (micro avg) 0.7615
2023-10-13 00:50:43,359 ----------------------------------------------------------------------------------------------------
2023-10-13 00:51:38,783 epoch 9 - iter 198/1984 - loss 0.00837254 - time (sec): 55.42 - samples/sec: 278.95 - lr: 0.000032 - momentum: 0.000000
2023-10-13 00:52:35,124 epoch 9 - iter 396/1984 - loss 0.00689394 - time (sec): 111.76 - samples/sec: 276.25 - lr: 0.000030 - momentum: 0.000000
2023-10-13 00:53:32,764 epoch 9 - iter 594/1984 - loss 0.00893818 - time (sec): 169.40 - samples/sec: 279.02 - lr: 0.000028 - momentum: 0.000000
2023-10-13 00:54:29,865 epoch 9 - iter 792/1984 - loss 0.00991164 - time (sec): 226.50 - samples/sec: 283.87 - lr: 0.000027 - momentum: 0.000000
2023-10-13 00:55:23,498 epoch 9 - iter 990/1984 - loss 0.00966373 - time (sec): 280.14 - samples/sec: 288.71 - lr: 0.000025 - momentum: 0.000000
2023-10-13 00:56:17,879 epoch 9 - iter 1188/1984 - loss 0.01070039 - time (sec): 334.52 - samples/sec: 294.73 - lr: 0.000023 - momentum: 0.000000
2023-10-13 00:57:12,239 epoch 9 - iter 1386/1984 - loss 0.01086709 - time (sec): 388.88 - samples/sec: 296.87 - lr: 0.000022 - momentum: 0.000000
2023-10-13 00:58:09,951 epoch 9 - iter 1584/1984 - loss 0.01130188 - time (sec): 446.59 - samples/sec: 295.83 - lr: 0.000020 - momentum: 0.000000
2023-10-13 00:59:07,871 epoch 9 - iter 1782/1984 - loss 0.01116529 - time (sec): 504.51 - samples/sec: 294.52 - lr: 0.000018 - momentum: 0.000000
2023-10-13 01:00:02,143 epoch 9 - iter 1980/1984 - loss 0.01105769 - time (sec): 558.78 - samples/sec: 292.84 - lr: 0.000017 - momentum: 0.000000
2023-10-13 01:00:03,328 ----------------------------------------------------------------------------------------------------
2023-10-13 01:00:03,329 EPOCH 9 done: loss 0.0110 - lr: 0.000017
2023-10-13 01:00:29,444 DEV : loss 0.21558064222335815 - f1-score (micro avg) 0.7598
2023-10-13 01:00:29,495 ----------------------------------------------------------------------------------------------------
2023-10-13 01:01:24,958 epoch 10 - iter 198/1984 - loss 0.00840029 - time (sec): 55.46 - samples/sec: 297.49 - lr: 0.000015 - momentum: 0.000000
2023-10-13 01:02:18,131 epoch 10 - iter 396/1984 - loss 0.00784138 - time (sec): 108.63 - samples/sec: 305.89 - lr: 0.000013 - momentum: 0.000000
2023-10-13 01:03:11,651 epoch 10 - iter 594/1984 - loss 0.00781192 - time (sec): 162.15 - samples/sec: 309.00 - lr: 0.000012 - momentum: 0.000000
2023-10-13 01:04:04,787 epoch 10 - iter 792/1984 - loss 0.00716612 - time (sec): 215.29 - samples/sec: 307.05 - lr: 0.000010 - momentum: 0.000000
2023-10-13 01:04:57,460 epoch 10 - iter 990/1984 - loss 0.00758087 - time (sec): 267.96 - samples/sec: 307.57 - lr: 0.000008 - momentum: 0.000000
2023-10-13 01:05:50,194 epoch 10 - iter 1188/1984 - loss 0.00804570 - time (sec): 320.70 - samples/sec: 306.54 - lr: 0.000007 - momentum: 0.000000
2023-10-13 01:06:43,467 epoch 10 - iter 1386/1984 - loss 0.00871718 - time (sec): 373.97 - samples/sec: 307.56 - lr: 0.000005 - momentum: 0.000000
2023-10-13 01:07:37,058 epoch 10 - iter 1584/1984 - loss 0.00857980 - time (sec): 427.56 - samples/sec: 306.00 - lr: 0.000003 - momentum: 0.000000
2023-10-13 01:08:29,889 epoch 10 - iter 1782/1984 - loss 0.00842854 - time (sec): 480.39 - samples/sec: 307.67 - lr: 0.000002 - momentum: 0.000000
2023-10-13 01:09:23,621 epoch 10 - iter 1980/1984 - loss 0.00882397 - time (sec): 534.12 - samples/sec: 306.45 - lr: 0.000000 - momentum: 0.000000
2023-10-13 01:09:24,699 ----------------------------------------------------------------------------------------------------
2023-10-13 01:09:24,699 EPOCH 10 done: loss 0.0088 - lr: 0.000000
2023-10-13 01:09:50,892 DEV : loss 0.22039946913719177 - f1-score (micro avg) 0.7608
2023-10-13 01:09:51,900 ----------------------------------------------------------------------------------------------------
2023-10-13 01:09:51,902 Loading model from best epoch ...
2023-10-13 01:09:57,183 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 01:10:22,775 Results:
- F-score (micro) 0.7535
- F-score (macro) 0.6657
- Accuracy 0.6291

By class:
              precision    recall  f1-score   support

         LOC     0.7982    0.8336    0.8155       655
         PER     0.6917    0.7848    0.7353       223
         ORG     0.4696    0.4252    0.4463       127

   micro avg     0.7367    0.7711    0.7535      1005
   macro avg     0.6532    0.6812    0.6657      1005
weighted avg     0.7331    0.7711    0.7511      1005

2023-10-13 01:10:22,775 ----------------------------------------------------------------------------------------------------
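
Note: the following is not part of the log above. It is a minimal sketch of how a comparable Flair fine-tuning run with the logged hyperparameters (learning_rate 0.00015, mini_batch_size 4, max_epochs 10, first-subtoken pooling, last layer, no CRF) might be set up using Flair's standard API. It is not the exact training script behind this log: the log's custom ByT5Embeddings wrapper is replaced here with Flair's TransformerWordEmbeddings, and the checkpoint identifier is inferred from the base path above, so treat both as assumptions.

    # Sketch only: assumes flair and a GPU are available; names marked below are assumptions.
    from flair.datasets import NER_ICDAR_EUROPEANA
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # French ICDAR Europeana NER corpus (7936 train / 992 dev / 992 test sentences, as logged).
    corpus = NER_ICDAR_EUROPEANA(language="fr")
    label_dict = corpus.make_label_dictionary(label_type="ner")

    # Stand-in for the custom ByT5Embeddings in the log: a fine-tuned transformer encoder with
    # first-subtoken pooling over the last layer. The checkpoint name is inferred from the
    # logged base path and may differ from the one actually used.
    embeddings = TransformerWordEmbeddings(
        model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # assumption
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
    )

    # Linear classification head on top of the embeddings, no CRF and no RNN (crfFalse in the path).
    tagger = SequenceTagger(
        hidden_size=256,  # unused without an RNN; required by the constructor
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,
        use_rnn=False,
        reproject_embeddings=False,
    )

    # fine_tune() applies a linear LR schedule with warmup by default, which should roughly
    # correspond to the LinearScheduler (warmup_fraction 0.1) entry in the log.
    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3",
        learning_rate=0.00015,
        mini_batch_size=4,
        max_epochs=10,
    )

After training, the best checkpoint (best-model.pt under the base path) can be reloaded with SequenceTagger.load() for prediction, which is what the "Loading model from best epoch" step in the log does before the final test evaluation.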