2023-10-12 19:00:57,401 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,403 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 19:00:57,404 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,404 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-12 19:00:57,404 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,404 Train:  7936 sentences
2023-10-12 19:00:57,404         (train_with_dev=False, train_with_test=False)
2023-10-12 19:00:57,404 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,404 Training Params:
2023-10-12 19:00:57,404  - learning_rate: "0.00016"
2023-10-12 19:00:57,404  - mini_batch_size: "4"
2023-10-12 19:00:57,404  - max_epochs: "10"
2023-10-12 19:00:57,404  - shuffle: "True"
2023-10-12 19:00:57,405 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,405 Plugins:
2023-10-12 19:00:57,405  - TensorboardLogger
2023-10-12 19:00:57,405  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 19:00:57,405 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,405 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 19:00:57,405  - metric: "('micro avg', 'f1-score')"
2023-10-12 19:00:57,405 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,405 Computation:
2023-10-12 19:00:57,405  - compute on device: cuda:0
2023-10-12 19:00:57,405  - embedding storage: none
2023-10-12 19:00:57,405 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,405 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-12 19:00:57,405 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,405 ----------------------------------------------------------------------------------------------------
2023-10-12 19:00:57,406 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 19:01:53,032 epoch 1 - iter 198/1984 - loss 2.55384183 - time (sec): 55.62 - samples/sec: 295.86 - lr: 0.000016 - momentum: 0.000000
2023-10-12 19:02:44,574 epoch 1 - iter 396/1984 - loss 2.39375294 - time (sec): 107.17 - samples/sec: 291.60 - lr: 0.000032 - momentum: 0.000000
2023-10-12 19:03:40,427 epoch 1 - iter 594/1984 - loss 2.05164311 - time (sec): 163.02 - samples/sec: 294.34 - lr: 0.000048 - momentum: 0.000000
2023-10-12 19:04:37,747 epoch 1 - iter 792/1984 - loss 1.72186761 - time (sec): 220.34 - samples/sec: 289.42 - lr: 0.000064 - momentum: 0.000000
2023-10-12 19:05:34,036 epoch 1 - iter 990/1984 - loss 1.45322435 - time (sec): 276.63 - samples/sec: 290.34 - lr: 0.000080 - momentum: 0.000000
2023-10-12 19:06:27,564 epoch 1 - iter 1188/1984 - loss 1.25262846 - time (sec): 330.16 - samples/sec: 291.81 - lr: 0.000096 - momentum: 0.000000
2023-10-12 19:07:20,760 epoch 1 - iter 1386/1984 - loss 1.10103908 - time (sec): 383.35 - samples/sec: 297.25 - lr: 0.000112 - momentum: 0.000000
2023-10-12 19:08:17,146 epoch 1 - iter 1584/1984 - loss 0.98576021 - time (sec): 439.74 - samples/sec: 297.66 - lr: 0.000128 - momentum: 0.000000
2023-10-12 19:09:09,704 epoch 1 - iter 1782/1984 - loss 0.90059444 - time (sec): 492.30 - samples/sec: 298.53 - lr: 0.000144 - momentum: 0.000000
2023-10-12 19:10:05,390 epoch 1 - iter 1980/1984 - loss 0.82654492 - time (sec): 547.98 - samples/sec: 298.76 - lr: 0.000160 - momentum: 0.000000
2023-10-12 19:10:06,355 ----------------------------------------------------------------------------------------------------
2023-10-12 19:10:06,355 EPOCH 1 done: loss 0.8253 - lr: 0.000160
2023-10-12 19:10:30,967 DEV : loss 0.14265914261341095 - f1-score (micro avg)  0.6151
2023-10-12 19:10:31,007 saving best model
2023-10-12 19:10:32,014 ----------------------------------------------------------------------------------------------------
2023-10-12 19:11:25,460 epoch 2 - iter 198/1984 - loss 0.18754969 - time (sec): 53.44 - samples/sec: 310.59 - lr: 0.000158 - momentum: 0.000000
2023-10-12 19:12:20,355 epoch 2 - iter 396/1984 - loss 0.16726537 - time (sec): 108.34 - samples/sec: 301.31 - lr: 0.000156 - momentum: 0.000000
2023-10-12 19:13:15,411 epoch 2 - iter 594/1984 - loss 0.16088723 - time (sec): 163.39 - samples/sec: 300.64 - lr: 0.000155 - momentum: 0.000000
2023-10-12 19:14:08,749 epoch 2 - iter 792/1984 - loss 0.15531591 - time (sec): 216.73 - samples/sec: 301.48 - lr: 0.000153 - momentum: 0.000000
2023-10-12 19:15:02,399 epoch 2 - iter 990/1984 - loss 0.14638572 - time (sec): 270.38 - samples/sec: 305.32 - lr: 0.000151 - momentum: 0.000000
2023-10-12 19:15:57,968 epoch 2 - iter 1188/1984 - loss 0.14040000 - time (sec): 325.95 - samples/sec: 302.90 - lr: 0.000149 - momentum: 0.000000
2023-10-12 19:16:51,518 epoch 2 - iter 1386/1984 - loss 0.13679138 - time (sec): 379.50 - samples/sec: 301.31 - lr: 0.000148 - momentum: 0.000000
2023-10-12 19:17:50,312 epoch 2 - iter 1584/1984 - loss 0.13171808 - time (sec): 438.30 - samples/sec: 300.88 - lr: 0.000146 - momentum: 0.000000
2023-10-12 19:18:51,244 epoch 2 - iter 1782/1984 - loss 0.12873992 - time (sec): 499.23 - samples/sec: 297.85 - lr: 0.000144 - momentum: 0.000000
2023-10-12 19:19:47,649 epoch 2 - iter 1980/1984 - loss 0.12676673 - time (sec): 555.63 - samples/sec: 294.68 - lr: 0.000142 - momentum: 0.000000
2023-10-12 19:19:48,734 ----------------------------------------------------------------------------------------------------
2023-10-12 19:19:48,735 EPOCH 2 done: loss 0.1267 - lr: 0.000142
2023-10-12 19:20:16,210 DEV : loss 0.08132906258106232 - f1-score (micro avg)  0.7426
2023-10-12 19:20:16,255 saving best model
2023-10-12 19:20:17,361 ----------------------------------------------------------------------------------------------------
2023-10-12 19:21:12,446 epoch 3 - iter 198/1984 - loss 0.07449102 - time (sec): 55.08 - samples/sec: 317.87 - lr: 0.000140 - momentum: 0.000000
2023-10-12 19:22:04,013 epoch 3 - iter 396/1984 - loss 0.07962083 - time (sec): 106.65 - samples/sec: 329.86 - lr: 0.000139 - momentum: 0.000000
2023-10-12 19:22:55,004 epoch 3 - iter 594/1984 - loss 0.08138074 - time (sec): 157.64 - samples/sec: 322.56 - lr: 0.000137 - momentum: 0.000000
2023-10-12 19:23:47,440 epoch 3 - iter 792/1984 - loss 0.08231929 - time (sec): 210.08 - samples/sec: 318.80 - lr: 0.000135 - momentum: 0.000000
2023-10-12 19:24:42,167 epoch 3 - iter 990/1984 - loss 0.08016426 - time (sec): 264.80 - samples/sec: 313.76 - lr: 0.000133 - momentum: 0.000000
2023-10-12 19:25:37,415 epoch 3 - iter 1188/1984 - loss 0.08037812 - time (sec): 320.05 - samples/sec: 308.93 - lr: 0.000132 - momentum: 0.000000
2023-10-12 19:26:32,104 epoch 3 - iter 1386/1984 - loss 0.07980738 - time (sec): 374.74 - samples/sec: 309.02 - lr: 0.000130 - momentum: 0.000000
2023-10-12 19:27:25,825 epoch 3 - iter 1584/1984 - loss 0.07722174 - time (sec): 428.46 - samples/sec: 310.33 - lr: 0.000128 - momentum: 0.000000
2023-10-12 19:28:18,961 epoch 3 - iter 1782/1984 - loss 0.07700187 - time (sec): 481.60 - samples/sec: 308.23 - lr: 0.000126 - momentum: 0.000000
2023-10-12 19:29:11,809 epoch 3 - iter 1980/1984 - loss 0.07753796 - time (sec): 534.45 - samples/sec: 306.20 - lr: 0.000125 - momentum: 0.000000
2023-10-12 19:29:12,872 ----------------------------------------------------------------------------------------------------
2023-10-12 19:29:12,872 EPOCH 3 done: loss 0.0775 - lr: 0.000125
2023-10-12 19:29:40,957 DEV : loss 0.10928809642791748 - f1-score (micro avg)  0.7465
2023-10-12 19:29:41,014 saving best model
2023-10-12 19:29:43,707 ----------------------------------------------------------------------------------------------------
2023-10-12 19:30:37,482 epoch 4 - iter 198/1984 - loss 0.04403286 - time (sec): 53.77 - samples/sec: 305.04 - lr: 0.000123 - momentum: 0.000000
2023-10-12 19:31:29,499 epoch 4 - iter 396/1984 - loss 0.05589690 - time (sec): 105.79 - samples/sec: 312.69 - lr: 0.000121 - momentum: 0.000000
2023-10-12 19:32:23,922 epoch 4 - iter 594/1984 - loss 0.05486852 - time (sec): 160.21 - samples/sec: 312.56 - lr: 0.000119 - momentum: 0.000000
2023-10-12 19:33:18,352 epoch 4 - iter 792/1984 - loss 0.05610260 - time (sec): 214.64 - samples/sec: 306.34 - lr: 0.000117 - momentum: 0.000000
2023-10-12 19:34:18,211 epoch 4 - iter 990/1984 - loss 0.05666238 - time (sec): 274.50 - samples/sec: 303.45 - lr: 0.000116 - momentum: 0.000000
2023-10-12 19:35:15,374 epoch 4 - iter 1188/1984 - loss 0.05516588 - time (sec): 331.66 - samples/sec: 302.13 - lr: 0.000114 - momentum: 0.000000
2023-10-12 19:36:05,202 epoch 4 - iter 1386/1984 - loss 0.05539758 - time (sec): 381.49 - samples/sec: 303.24 - lr: 0.000112 - momentum: 0.000000
2023-10-12 19:36:58,732 epoch 4 - iter 1584/1984 - loss 0.05517488 - time (sec): 435.02 - samples/sec: 304.46 - lr: 0.000110 - momentum: 0.000000
2023-10-12 19:37:55,195 epoch 4 - iter 1782/1984 - loss 0.05601570 - time (sec): 491.48 - samples/sec: 300.64 - lr: 0.000109 - momentum: 0.000000
2023-10-12 19:38:47,751 epoch 4 - iter 1980/1984 - loss 0.05588298 - time (sec): 544.04 - samples/sec: 300.62 - lr: 0.000107 - momentum: 0.000000
2023-10-12 19:38:48,991 ----------------------------------------------------------------------------------------------------
2023-10-12 19:38:48,991 EPOCH 4 done: loss 0.0559 - lr: 0.000107
2023-10-12 19:39:15,189 DEV : loss 0.12565550208091736 - f1-score (micro avg)  0.7575
2023-10-12 19:39:15,241 saving best model
2023-10-12 19:39:17,961 ----------------------------------------------------------------------------------------------------
2023-10-12 19:40:14,338 epoch 5 - iter 198/1984 - loss 0.03647781 - time (sec): 56.37 - samples/sec: 290.30 - lr: 0.000105 - momentum: 0.000000
2023-10-12 19:41:10,853 epoch 5 - iter 396/1984 - loss 0.03380893 - time (sec): 112.89 - samples/sec: 289.98 - lr: 0.000103 - momentum: 0.000000
2023-10-12 19:42:08,462 epoch 5 - iter 594/1984 - loss 0.03645011 - time (sec): 170.50 - samples/sec: 283.83 - lr: 0.000101 - momentum: 0.000000
2023-10-12 19:43:06,796 epoch 5 - iter 792/1984 - loss 0.03894979 - time (sec): 228.83 - samples/sec: 282.80 - lr: 0.000100 - momentum: 0.000000
2023-10-12 19:44:00,564 epoch 5 - iter 990/1984 - loss 0.03738215 - time (sec): 282.60 - samples/sec: 285.92 - lr: 0.000098 - momentum: 0.000000
2023-10-12 19:44:55,340 epoch 5 - iter 1188/1984 - loss 0.03883414 - time (sec): 337.38 - samples/sec: 288.91 - lr: 0.000096 - momentum: 0.000000
2023-10-12 19:45:49,221 epoch 5 - iter 1386/1984 - loss 0.04037811 - time (sec): 391.26 - samples/sec: 292.23 - lr: 0.000094 - momentum: 0.000000
2023-10-12 19:46:43,552 epoch 5 - iter 1584/1984 - loss 0.04090481 - time (sec): 445.59 - samples/sec: 293.44 - lr: 0.000093 - momentum: 0.000000
2023-10-12 19:47:36,831 epoch 5 - iter 1782/1984 - loss 0.04045048 - time (sec): 498.87 - samples/sec: 295.97 - lr: 0.000091 - momentum: 0.000000
2023-10-12 19:48:28,508 epoch 5 - iter 1980/1984 - loss 0.04140069 - time (sec): 550.54 - samples/sec: 297.43 - lr: 0.000089 - momentum: 0.000000
2023-10-12 19:48:29,506 ----------------------------------------------------------------------------------------------------
2023-10-12 19:48:29,506 EPOCH 5 done: loss 0.0413 - lr: 0.000089
2023-10-12 19:48:55,620 DEV : loss 0.1601100116968155 - f1-score (micro avg)  0.7367
2023-10-12 19:48:55,665 ----------------------------------------------------------------------------------------------------
2023-10-12 19:49:47,582 epoch 6 - iter 198/1984 - loss 0.03401114 - time (sec): 51.91 - samples/sec: 318.97 - lr: 0.000087 - momentum: 0.000000
2023-10-12 19:50:43,009 epoch 6 - iter 396/1984 - loss 0.02751547 - time (sec): 107.34 - samples/sec: 304.95 - lr: 0.000085 - momentum: 0.000000
2023-10-12 19:51:37,396 epoch 6 - iter 594/1984 - loss 0.02906753 - time (sec): 161.73 - samples/sec: 303.96 - lr: 0.000084 - momentum: 0.000000
2023-10-12 19:52:30,042 epoch 6 - iter 792/1984 - loss 0.02967165 - time (sec): 214.37 - samples/sec: 306.24 - lr: 0.000082 - momentum: 0.000000
2023-10-12 19:53:24,213 epoch 6 - iter 990/1984 - loss 0.03001808 - time (sec): 268.55 - samples/sec: 306.77 - lr: 0.000080 - momentum: 0.000000
2023-10-12 19:54:17,181 epoch 6 - iter 1188/1984 - loss 0.03015398 - time (sec): 321.51 - samples/sec: 305.20 - lr: 0.000078 - momentum: 0.000000
2023-10-12 19:55:10,010 epoch 6 - iter 1386/1984 - loss 0.03061750 - time (sec): 374.34 - samples/sec: 306.12 - lr: 0.000077 - momentum: 0.000000
2023-10-12 19:56:01,949 epoch 6 - iter 1584/1984 - loss 0.03146939 - time (sec): 426.28 - samples/sec: 307.08 - lr: 0.000075 - momentum: 0.000000
2023-10-12 19:56:53,654 epoch 6 - iter 1782/1984 - loss 0.03226077 - time (sec): 477.99 - samples/sec: 308.02 - lr: 0.000073 - momentum: 0.000000
2023-10-12 19:57:46,674 epoch 6 - iter 1980/1984 - loss 0.03254906 - time (sec): 531.01 - samples/sec: 308.10 - lr: 0.000071 - momentum: 0.000000
2023-10-12 19:57:47,712 ----------------------------------------------------------------------------------------------------
2023-10-12 19:57:47,712 EPOCH 6 done: loss 0.0325 - lr: 0.000071
2023-10-12 19:58:17,266 DEV : loss 0.1930389255285263 - f1-score (micro avg)  0.7462
2023-10-12 19:58:17,311 ----------------------------------------------------------------------------------------------------
2023-10-12 19:59:09,963 epoch 7 - iter 198/1984 - loss 0.01620639 - time (sec): 52.65 - samples/sec: 309.97 - lr: 0.000069 - momentum: 0.000000
2023-10-12 20:00:04,735 epoch 7 - iter 396/1984 - loss 0.02000413 - time (sec): 107.42 - samples/sec: 313.43 - lr: 0.000068 - momentum: 0.000000
2023-10-12 20:00:56,852 epoch 7 - iter 594/1984 - loss 0.01982571 - time (sec): 159.54 - samples/sec: 308.48 - lr: 0.000066 - momentum: 0.000000
2023-10-12 20:01:47,960 epoch 7 - iter 792/1984 - loss 0.01977200 - time (sec): 210.65 - samples/sec: 305.30 - lr: 0.000064 - momentum: 0.000000
2023-10-12 20:02:37,963 epoch 7 - iter 990/1984 - loss 0.02022846 - time (sec): 260.65 - samples/sec: 310.92 - lr: 0.000062 - momentum: 0.000000
2023-10-12 20:03:28,853 epoch 7 - iter 1188/1984 - loss 0.02170143 - time (sec): 311.54 - samples/sec: 312.41 - lr: 0.000061 - momentum: 0.000000
2023-10-12 20:04:19,531 epoch 7 - iter 1386/1984 - loss 0.02149673 - time (sec): 362.22 - samples/sec: 313.68 - lr: 0.000059 - momentum: 0.000000
2023-10-12 20:05:10,665 epoch 7 - iter 1584/1984 - loss 0.02222894 - time (sec): 413.35 - samples/sec: 314.55 - lr: 0.000057 - momentum: 0.000000
2023-10-12 20:06:04,251 epoch 7 - iter 1782/1984 - loss 0.02255477 - time (sec): 466.94 - samples/sec: 315.46 - lr: 0.000055 - momentum: 0.000000
2023-10-12 20:06:56,740 epoch 7 - iter 1980/1984 - loss 0.02284329 - time (sec): 519.43 - samples/sec: 314.80 - lr: 0.000053 - momentum: 0.000000
2023-10-12 20:06:57,943 ----------------------------------------------------------------------------------------------------
2023-10-12 20:06:57,943 EPOCH 7 done: loss 0.0228 - lr: 0.000053
2023-10-12 20:07:27,674 DEV : loss 0.20643900334835052 - f1-score (micro avg)  0.7405
2023-10-12 20:07:27,720 ----------------------------------------------------------------------------------------------------
2023-10-12 20:08:21,358 epoch 8 - iter 198/1984 - loss 0.01332593 - time (sec): 53.64 - samples/sec: 308.79 - lr: 0.000052 - momentum: 0.000000
2023-10-12 20:09:15,528 epoch 8 - iter 396/1984 - loss 0.01653972 - time (sec): 107.81 - samples/sec: 312.45 - lr: 0.000050 - momentum: 0.000000
2023-10-12 20:10:12,173 epoch 8 - iter 594/1984 - loss 0.01633359 - time (sec): 164.45 - samples/sec: 314.20 - lr: 0.000048 - momentum: 0.000000
2023-10-12 20:11:07,386 epoch 8 - iter 792/1984 - loss 0.01568000 - time (sec): 219.66 - samples/sec: 307.02 - lr: 0.000046 - momentum: 0.000000
2023-10-12 20:11:59,803 epoch 8 - iter 990/1984 - loss 0.01523239 - time (sec): 272.08 - samples/sec: 305.62 - lr: 0.000045 - momentum: 0.000000
2023-10-12 20:12:51,617 epoch 8 - iter 1188/1984 - loss 0.01477382 - time (sec): 323.89 - samples/sec: 304.79 - lr: 0.000043 - momentum: 0.000000
2023-10-12 20:13:43,396 epoch 8 - iter 1386/1984 - loss 0.01582612 - time (sec): 375.67 - samples/sec: 304.26 - lr: 0.000041 - momentum: 0.000000
2023-10-12 20:14:35,989 epoch 8 - iter 1584/1984 - loss 0.01504720 - time (sec): 428.27 - samples/sec: 304.56 - lr: 0.000039 - momentum: 0.000000
2023-10-12 20:15:27,097 epoch 8 - iter 1782/1984 - loss 0.01541399 - time (sec): 479.38 - samples/sec: 306.65 - lr: 0.000037 - momentum: 0.000000
2023-10-12 20:16:18,793 epoch 8 - iter 1980/1984 - loss 0.01658921 - time (sec): 531.07 - samples/sec: 307.94 - lr: 0.000036 - momentum: 0.000000
2023-10-12 20:16:19,919 ----------------------------------------------------------------------------------------------------
2023-10-12 20:16:19,920 EPOCH 8 done: loss 0.0165 - lr: 0.000036
2023-10-12 20:16:45,163 DEV : loss 0.2185012847185135 - f1-score (micro avg)  0.739
2023-10-12 20:16:45,210 ----------------------------------------------------------------------------------------------------
2023-10-12 20:17:37,408 epoch 9 - iter 198/1984 - loss 0.01139517 - time (sec): 52.20 - samples/sec: 312.89 - lr: 0.000034 - momentum: 0.000000
2023-10-12 20:18:28,365 epoch 9 - iter 396/1984 - loss 0.01073662 - time (sec): 103.15 - samples/sec: 317.57 - lr: 0.000032 - momentum: 0.000000
2023-10-12 20:19:20,735 epoch 9 - iter 594/1984 - loss 0.01104567 - time (sec): 155.52 - samples/sec: 318.44 - lr: 0.000030 - momentum: 0.000000
2023-10-12 20:20:13,061 epoch 9 - iter 792/1984 - loss 0.01065521 - time (sec): 207.85 - samples/sec: 317.40 - lr: 0.000029 - momentum: 0.000000
2023-10-12 20:21:04,694 epoch 9 - iter 990/1984 - loss 0.00973307 - time (sec): 259.48 - samples/sec: 321.20 - lr: 0.000027 - momentum: 0.000000
2023-10-12 20:21:55,918 epoch 9 - iter 1188/1984 - loss 0.01093970 - time (sec): 310.71 - samples/sec: 321.30 - lr: 0.000025 - momentum: 0.000000
2023-10-12 20:22:52,718 epoch 9 - iter 1386/1984 - loss 0.01065797 - time (sec): 367.51 - samples/sec: 316.15 - lr: 0.000023 - momentum: 0.000000
2023-10-12 20:23:48,032 epoch 9 - iter 1584/1984 - loss 0.01050642 - time (sec): 422.82 - samples/sec: 313.29 - lr: 0.000021 - momentum: 0.000000
2023-10-12 20:24:43,696 epoch 9 - iter 1782/1984 - loss 0.01110868 - time (sec): 478.48 - samples/sec: 309.19 - lr: 0.000020 - momentum: 0.000000
2023-10-12 20:25:37,319 epoch 9 - iter 1980/1984 - loss 0.01154089 - time (sec): 532.11 - samples/sec: 307.32 - lr: 0.000018 - momentum: 0.000000
2023-10-12 20:25:38,506 ----------------------------------------------------------------------------------------------------
2023-10-12 20:25:38,507 EPOCH 9 done: loss 0.0116 - lr: 0.000018
2023-10-12 20:26:03,810 DEV : loss 0.22808362543582916 - f1-score (micro avg)  0.7424
2023-10-12 20:26:03,850 ----------------------------------------------------------------------------------------------------
2023-10-12 20:26:57,205 epoch 10 - iter 198/1984 - loss 0.00807552 - time (sec): 53.35 - samples/sec: 314.51 - lr: 0.000016 - momentum: 0.000000
2023-10-12 20:27:49,711 epoch 10 - iter 396/1984 - loss 0.00665812 - time (sec): 105.86 - samples/sec: 306.10 - lr: 0.000014 - momentum: 0.000000
2023-10-12 20:28:43,374 epoch 10 - iter 594/1984 - loss 0.00732649 - time (sec): 159.52 - samples/sec: 301.75 - lr: 0.000013 - momentum: 0.000000
2023-10-12 20:29:40,407 epoch 10 - iter 792/1984 - loss 0.00720155 - time (sec): 216.55 - samples/sec: 298.64 - lr: 0.000011 - momentum: 0.000000
2023-10-12 20:30:34,085 epoch 10 - iter 990/1984 - loss 0.00834082 - time (sec): 270.23 - samples/sec: 304.15 - lr: 0.000009 - momentum: 0.000000
2023-10-12 20:31:27,736 epoch 10 - iter 1188/1984 - loss 0.00798658 - time (sec): 323.88 - samples/sec: 305.38 - lr: 0.000007 - momentum: 0.000000
2023-10-12 20:32:17,599 epoch 10 - iter 1386/1984 - loss 0.00810315 - time (sec): 373.75 - samples/sec: 307.08 - lr: 0.000005 - momentum: 0.000000
2023-10-12 20:33:10,059 epoch 10 - iter 1584/1984 - loss 0.00817232 - time (sec): 426.21 - samples/sec: 307.02 - lr: 0.000004 - momentum: 0.000000
2023-10-12 20:34:03,245 epoch 10 - iter 1782/1984 - loss 0.00778217 - time (sec): 479.39 - samples/sec: 307.53 - lr: 0.000002 - momentum: 0.000000
2023-10-12 20:34:56,578 epoch 10 - iter 1980/1984 - loss 0.00806001 - time (sec): 532.73 - samples/sec: 307.36 - lr: 0.000000 - momentum: 0.000000
2023-10-12 20:34:57,565 ----------------------------------------------------------------------------------------------------
2023-10-12 20:34:57,566 EPOCH 10 done: loss 0.0081 - lr: 0.000000
2023-10-12 20:35:23,351 DEV : loss 0.23574452102184296 - f1-score (micro avg)  0.7436
2023-10-12 20:35:24,353 ----------------------------------------------------------------------------------------------------
2023-10-12 20:35:24,355 Loading model from best epoch ...
2023-10-12 20:35:27,864 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 20:35:52,038
Results:
- F-score (micro) 0.771
- F-score (macro) 0.6885
- Accuracy 0.6516

By class:
              precision    recall  f1-score   support

         LOC     0.8006    0.8580    0.8283       655
         PER     0.7215    0.7668    0.7435       223
         ORG     0.5370    0.4567    0.4936       127

   micro avg     0.7555    0.7871    0.7710      1005
   macro avg     0.6864    0.6938    0.6885      1005
weighted avg     0.7497    0.7871    0.7672      1005

2023-10-12 20:35:52,038 ----------------------------------------------------------------------------------------------------
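Two of the logged quantities can be sanity-checked with a few lines of pure Python. The warmup/decay formula below is an assumption inferred from the lr column of the log (linear ramp over the first 10% of the 10 x 1984 = 19840 optimizer steps, then linear decay to zero), not a copy of Flair's LinearScheduler implementation; the micro-F1 check simply recomputes F1 from the reported micro-average precision and recall.

```python
def linear_lr_with_warmup(step, total_steps, base_lr, warmup_fraction=0.1):
    # Assumed schedule: linear ramp during warmup, then linear decay to zero.
    # It reproduces the lr values printed in the log.
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

total_steps = 10 * 1984   # 10 epochs x 1984 mini-batches of size 4
base_lr = 0.00016

# Matches the log: lr 0.000016 at epoch 1 / iter 198, and lr 0.000000 at the last step.
assert round(linear_lr_with_warmup(198, total_steps, base_lr), 6) == 0.000016
assert round(linear_lr_with_warmup(total_steps, total_steps, base_lr), 6) == 0.0

# Micro-averaged F1 recomputed from the reported micro precision/recall.
p, r = 0.7555, 0.7871
f1 = 2 * p * r / (p + r)
assert round(f1, 4) == 0.7710   # the final "F-score (micro)" in the results
```

Note that the best dev score (0.7575, epoch 4) is what selects best-model.pt, while the 0.771 micro-F1 is measured on the held-out test split.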