2023-10-12 15:55:24,355 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,358 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 15:55:24,358 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,358 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-12 15:55:24,358 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,358 Train:  7936 sentences
2023-10-12 15:55:24,358         (train_with_dev=False, train_with_test=False)
2023-10-12 15:55:24,358 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,358 Training Params:
2023-10-12 15:55:24,359  - learning_rate: "0.00016"
2023-10-12 15:55:24,359  - mini_batch_size: "8"
2023-10-12 15:55:24,359  - max_epochs: "10"
2023-10-12 15:55:24,359  - shuffle: "True"
2023-10-12 15:55:24,359 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,359 Plugins:
2023-10-12 15:55:24,359  - TensorboardLogger
2023-10-12 15:55:24,359  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 15:55:24,359 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,359 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 15:55:24,359  - metric: "('micro avg', 'f1-score')"
2023-10-12 15:55:24,359 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,359 Computation:
2023-10-12 15:55:24,359  - compute on device: cuda:0
2023-10-12 15:55:24,360  - embedding storage: none
2023-10-12 15:55:24,360 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,360 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-2"
2023-10-12 15:55:24,360 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,360 ----------------------------------------------------------------------------------------------------
2023-10-12 15:55:24,360 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 15:56:14,944 epoch 1 - iter 99/992 - loss 2.56321545 - time (sec): 50.58 - samples/sec: 325.35 - lr: 0.000016 - momentum: 0.000000
2023-10-12 15:57:05,006 epoch 1 - iter 198/992 - loss 2.49795406 - time (sec): 100.64 - samples/sec: 310.50 - lr: 0.000032 - momentum: 0.000000
2023-10-12 15:57:57,915 epoch 1 - iter 297/992 - loss 2.26413215 - time (sec): 153.55 - samples/sec: 312.49 - lr: 0.000048 - momentum: 0.000000
2023-10-12 15:58:48,550 epoch 1 - iter 396/992 - loss 2.00827529 - time (sec): 204.19 - samples/sec: 312.31 - lr: 0.000064 - momentum: 0.000000
2023-10-12 15:59:39,497 epoch 1 - iter 495/992 - loss 1.74353595 - time (sec): 255.14 - samples/sec: 314.80 - lr: 0.000080 - momentum: 0.000000
2023-10-12 16:00:30,417 epoch 1 - iter 594/992 - loss 1.52072631 - time (sec): 306.06 - samples/sec: 314.79 - lr: 0.000096 - momentum: 0.000000
2023-10-12 16:01:20,878 epoch 1 - iter 693/992 - loss 1.34248103 - time (sec): 356.52 - samples/sec: 319.63 - lr: 0.000112 - momentum: 0.000000
2023-10-12 16:02:10,880 epoch 1 - iter 792/992 - loss 1.20099065 - time (sec): 406.52 - samples/sec: 321.98 - lr: 0.000128 - momentum: 0.000000
2023-10-12 16:03:01,218 epoch 1 - iter 891/992 - loss 1.09618671 - time (sec): 456.86 - samples/sec: 321.68 - lr: 0.000144 - momentum: 0.000000
2023-10-12 16:03:51,862 epoch 1 - iter 990/992 - loss 1.00653001 - time (sec): 507.50 - samples/sec: 322.59 - lr: 0.000160 - momentum: 0.000000
2023-10-12 16:03:52,755 ----------------------------------------------------------------------------------------------------
2023-10-12 16:03:52,755 EPOCH 1 done: loss 1.0050 - lr: 0.000160
2023-10-12 16:04:19,057 DEV : loss 0.1908944696187973 - f1-score (micro avg) 0.282
2023-10-12 16:04:19,097 saving best model
2023-10-12 16:04:20,143 ----------------------------------------------------------------------------------------------------
2023-10-12 16:05:12,773 epoch 2 - iter 99/992 - loss 0.24377973 - time (sec): 52.63 - samples/sec: 315.41 - lr: 0.000158 - momentum: 0.000000
2023-10-12 16:06:03,246 epoch 2 - iter 198/992 - loss 0.21924907 - time (sec): 103.10 - samples/sec: 316.62 - lr: 0.000156 - momentum: 0.000000
2023-10-12 16:06:53,505 epoch 2 - iter 297/992 - loss 0.20957254 - time (sec): 153.36 - samples/sec: 320.31 - lr: 0.000155 - momentum: 0.000000
2023-10-12 16:07:44,205 epoch 2 - iter 396/992 - loss 0.19896949 - time (sec): 204.06 - samples/sec: 320.20 - lr: 0.000153 - momentum: 0.000000
2023-10-12 16:08:38,510 epoch 2 - iter 495/992 - loss 0.18668208 - time (sec): 258.36 - samples/sec: 319.53 - lr: 0.000151 - momentum: 0.000000
2023-10-12 16:09:29,498 epoch 2 - iter 594/992 - loss 0.17707615 - time (sec): 309.35 - samples/sec: 319.15 - lr: 0.000149 - momentum: 0.000000
2023-10-12 16:10:21,133 epoch 2 - iter 693/992 - loss 0.17186557 - time (sec): 360.99 - samples/sec: 316.76 - lr: 0.000148 - momentum: 0.000000
2023-10-12 16:11:13,508 epoch 2 - iter 792/992 - loss 0.16395654 - time (sec): 413.36 - samples/sec: 319.03 - lr: 0.000146 - momentum: 0.000000
2023-10-12 16:12:03,023 epoch 2 - iter 891/992 - loss 0.15926080 - time (sec): 462.88 - samples/sec: 321.24 - lr: 0.000144 - momentum: 0.000000
2023-10-12 16:12:52,827 epoch 2 - iter 990/992 - loss 0.15541109 - time (sec): 512.68 - samples/sec: 319.36 - lr: 0.000142 - momentum: 0.000000
2023-10-12 16:12:53,778 ----------------------------------------------------------------------------------------------------
2023-10-12 16:12:53,778 EPOCH 2 done: loss 0.1553 - lr: 0.000142
2023-10-12 16:13:19,717 DEV : loss 0.09468376636505127 - f1-score (micro avg) 0.714
2023-10-12 16:13:19,768 saving best model
2023-10-12 16:13:20,925 ----------------------------------------------------------------------------------------------------
2023-10-12 16:14:14,477 epoch 3 - iter 99/992 - loss 0.09083028 - time (sec): 53.55 - samples/sec: 326.98 - lr: 0.000140 - momentum: 0.000000
2023-10-12 16:15:04,799 epoch 3 - iter 198/992 - loss 0.09556544 - time (sec): 103.87 - samples/sec: 338.68 - lr: 0.000139 - momentum: 0.000000
2023-10-12 16:15:52,155 epoch 3 - iter 297/992 - loss 0.09527326 - time (sec): 151.23 - samples/sec: 336.24 - lr: 0.000137 - momentum: 0.000000
2023-10-12 16:16:40,451 epoch 3 - iter 396/992 - loss 0.09421116 - time (sec): 199.52 - samples/sec: 335.66 - lr: 0.000135 - momentum: 0.000000
2023-10-12 16:17:31,678 epoch 3 - iter 495/992 - loss 0.09244840 - time (sec): 250.75 - samples/sec: 331.34 - lr: 0.000133 - momentum: 0.000000
2023-10-12 16:18:23,335 epoch 3 - iter 594/992 - loss 0.09189897 - time (sec): 302.41 - samples/sec: 326.96 - lr: 0.000132 - momentum: 0.000000
2023-10-12 16:19:12,627 epoch 3 - iter 693/992 - loss 0.09093182 - time (sec): 351.70 - samples/sec: 329.26 - lr: 0.000130 - momentum: 0.000000
2023-10-12 16:20:01,184 epoch 3 - iter 792/992 - loss 0.08804390 - time (sec): 400.26 - samples/sec: 332.20 - lr: 0.000128 - momentum: 0.000000
2023-10-12 16:20:47,966 epoch 3 - iter 891/992 - loss 0.08698329 - time (sec): 447.04 - samples/sec: 332.06 - lr: 0.000126 - momentum: 0.000000
2023-10-12 16:21:34,608 epoch 3 - iter 990/992 - loss 0.08670857 - time (sec): 493.68 - samples/sec: 331.49 - lr: 0.000125 - momentum: 0.000000
2023-10-12 16:21:35,574 ----------------------------------------------------------------------------------------------------
2023-10-12 16:21:35,574 EPOCH 3 done: loss 0.0867 - lr: 0.000125
2023-10-12 16:21:59,714 DEV : loss 0.08734025806188583 - f1-score (micro avg) 0.7556
2023-10-12 16:21:59,757 saving best model
2023-10-12 16:22:02,851 ----------------------------------------------------------------------------------------------------
2023-10-12 16:22:52,065 epoch 4 - iter 99/992 - loss 0.05313606 - time (sec): 49.20 - samples/sec: 333.35 - lr: 0.000123 - momentum: 0.000000
2023-10-12 16:23:42,245 epoch 4 - iter 198/992 - loss 0.06086787 - time (sec): 99.38 - samples/sec: 332.83 - lr: 0.000121 - momentum: 0.000000
2023-10-12 16:24:40,322 epoch 4 - iter 297/992 - loss 0.06014145 - time (sec): 157.46 - samples/sec: 318.02 - lr: 0.000119 - momentum: 0.000000
2023-10-12 16:25:29,505 epoch 4 - iter 396/992 - loss 0.06054673 - time (sec): 206.64 - samples/sec: 318.20 - lr: 0.000117 - momentum: 0.000000
2023-10-12 16:26:21,583 epoch 4 - iter 495/992 - loss 0.06058721 - time (sec): 258.72 - samples/sec: 321.96 - lr: 0.000116 - momentum: 0.000000
2023-10-12 16:27:11,411 epoch 4 - iter 594/992 - loss 0.05857921 - time (sec): 308.55 - samples/sec: 324.76 - lr: 0.000114 - momentum: 0.000000
2023-10-12 16:27:59,855 epoch 4 - iter 693/992 - loss 0.05910932 - time (sec): 356.99 - samples/sec: 324.05 - lr: 0.000112 - momentum: 0.000000
2023-10-12 16:28:48,128 epoch 4 - iter 792/992 - loss 0.05823390 - time (sec): 405.27 - samples/sec: 326.81 - lr: 0.000110 - momentum: 0.000000
2023-10-12 16:29:36,891 epoch 4 - iter 891/992 - loss 0.05861596 - time (sec): 454.03 - samples/sec: 325.44 - lr: 0.000109 - momentum: 0.000000
2023-10-12 16:30:25,751 epoch 4 - iter 990/992 - loss 0.05832304 - time (sec): 502.89 - samples/sec: 325.22 - lr: 0.000107 - momentum: 0.000000
2023-10-12 16:30:26,904 ----------------------------------------------------------------------------------------------------
2023-10-12 16:30:26,904 EPOCH 4 done: loss 0.0582 - lr: 0.000107
2023-10-12 16:30:54,343 DEV : loss 0.10352308303117752 - f1-score (micro avg) 0.7566
2023-10-12 16:30:54,389 saving best model
2023-10-12 16:30:55,507 ----------------------------------------------------------------------------------------------------
2023-10-12 16:31:51,405 epoch 5 - iter 99/992 - loss 0.04078370 - time (sec): 55.90 - samples/sec: 292.78 - lr: 0.000105 - momentum: 0.000000
2023-10-12 16:32:41,627 epoch 5 - iter 198/992 - loss 0.03841056 - time (sec): 106.12 - samples/sec: 308.49 - lr: 0.000103 - momentum: 0.000000
2023-10-12 16:33:34,513 epoch 5 - iter 297/992 - loss 0.04076138 - time (sec): 159.00 - samples/sec: 304.35 - lr: 0.000101 - momentum: 0.000000
2023-10-12 16:34:30,010 epoch 5 - iter 396/992 - loss 0.04187344 - time (sec): 214.50 - samples/sec: 301.70 - lr: 0.000100 - momentum: 0.000000
2023-10-12 16:35:23,397 epoch 5 - iter 495/992 - loss 0.03957576 - time (sec): 267.89 - samples/sec: 301.63 - lr: 0.000098 - momentum: 0.000000
2023-10-12 16:36:18,871 epoch 5 - iter 594/992 - loss 0.04050811 - time (sec): 323.36 - samples/sec: 301.43 - lr: 0.000096 - momentum: 0.000000
2023-10-12 16:37:09,862 epoch 5 - iter 693/992 - loss 0.04141186 - time (sec): 374.35 - samples/sec: 305.42 - lr: 0.000094 - momentum: 0.000000
2023-10-12 16:37:59,391 epoch 5 - iter 792/992 - loss 0.04217448 - time (sec): 423.88 - samples/sec: 308.47 - lr: 0.000093 - momentum: 0.000000
2023-10-12 16:38:49,585 epoch 5 - iter 891/992 - loss 0.04209043 - time (sec): 474.08 - samples/sec: 311.45 - lr: 0.000091 - momentum: 0.000000
2023-10-12 16:39:38,421 epoch 5 - iter 990/992 - loss 0.04289663 - time (sec): 522.91 - samples/sec: 313.15 - lr: 0.000089 - momentum: 0.000000
2023-10-12 16:39:39,318 ----------------------------------------------------------------------------------------------------
2023-10-12 16:39:39,318 EPOCH 5 done: loss 0.0429 - lr: 0.000089
2023-10-12 16:40:05,001 DEV : loss 0.12538166344165802 - f1-score (micro avg) 0.7648
2023-10-12 16:40:05,052 saving best model
2023-10-12 16:40:07,817 ----------------------------------------------------------------------------------------------------
2023-10-12 16:40:59,009 epoch 6 - iter 99/992 - loss 0.03175333 - time (sec): 51.18 - samples/sec: 323.56 - lr: 0.000087 - momentum: 0.000000
2023-10-12 16:41:50,248 epoch 6 - iter 198/992 - loss 0.02609367 - time (sec): 102.42 - samples/sec: 319.62 - lr: 0.000085 - momentum: 0.000000
2023-10-12 16:42:42,250 epoch 6 - iter 297/992 - loss 0.02794266 - time (sec): 154.42 - samples/sec: 318.35 - lr: 0.000084 - momentum: 0.000000
2023-10-12 16:43:34,318 epoch 6 - iter 396/992 - loss 0.02933444 - time (sec): 206.49 - samples/sec: 317.93 - lr: 0.000082 - momentum: 0.000000
2023-10-12 16:44:24,993 epoch 6 - iter 495/992 - loss 0.02970530 - time (sec): 257.16 - samples/sec: 320.35 - lr: 0.000080 - momentum: 0.000000
2023-10-12 16:45:12,611 epoch 6 - iter 594/992 - loss 0.03096784 - time (sec): 304.78 - samples/sec: 321.96 - lr: 0.000078 - momentum: 0.000000
2023-10-12 16:46:02,533 epoch 6 - iter 693/992 - loss 0.03061474 - time (sec): 354.70 - samples/sec: 323.07 - lr: 0.000077 - momentum: 0.000000
2023-10-12 16:46:51,965 epoch 6 - iter 792/992 - loss 0.03261467 - time (sec): 404.13 - samples/sec: 323.91 - lr: 0.000075 - momentum: 0.000000
2023-10-12 16:47:42,401 epoch 6 - iter 891/992 - loss 0.03295279 - time (sec): 454.57 - samples/sec: 323.88 - lr: 0.000073 - momentum: 0.000000
2023-10-12 16:48:32,018 epoch 6 - iter 990/992 - loss 0.03343311 - time (sec): 504.19 - samples/sec: 324.49 - lr: 0.000071 - momentum: 0.000000
2023-10-12 16:48:33,122 ----------------------------------------------------------------------------------------------------
2023-10-12 16:48:33,123 EPOCH 6 done: loss 0.0334 - lr: 0.000071
2023-10-12 16:49:00,241 DEV : loss 0.14363163709640503 - f1-score (micro avg) 0.7525
2023-10-12 16:49:00,282 ----------------------------------------------------------------------------------------------------
2023-10-12 16:49:51,647 epoch 7 - iter 99/992 - loss 0.01782198 - time (sec): 51.36 - samples/sec: 317.74 - lr: 0.000069 - momentum: 0.000000
2023-10-12 16:50:40,898 epoch 7 - iter 198/992 - loss 0.02058422 - time (sec): 100.61 - samples/sec: 334.64 - lr: 0.000068 - momentum: 0.000000
2023-10-12 16:51:28,274 epoch 7 - iter 297/992 - loss 0.02091857 - time (sec): 147.99 - samples/sec: 332.55 - lr: 0.000066 - momentum: 0.000000
2023-10-12 16:52:14,627 epoch 7 - iter 396/992 - loss 0.02090248 - time (sec): 194.34 - samples/sec: 330.91 - lr: 0.000064 - momentum: 0.000000
2023-10-12 16:53:04,583 epoch 7 - iter 495/992 - loss 0.02166585 - time (sec): 244.30 - samples/sec: 331.73 - lr: 0.000062 - momentum: 0.000000
2023-10-12 16:53:55,335 epoch 7 - iter 594/992 - loss 0.02310097 - time (sec): 295.05 - samples/sec: 329.87 - lr: 0.000061 - momentum: 0.000000
2023-10-12 16:54:44,489 epoch 7 - iter 693/992 - loss 0.02339659 - time (sec): 344.20 - samples/sec: 330.10 - lr: 0.000059 - momentum: 0.000000
2023-10-12 16:55:33,197 epoch 7 - iter 792/992 - loss 0.02389100 - time (sec): 392.91 - samples/sec: 330.91 - lr: 0.000057 - momentum: 0.000000
2023-10-12 16:56:22,132 epoch 7 - iter 891/992 - loss 0.02431081 - time (sec): 441.85 - samples/sec: 333.37 - lr: 0.000055 - momentum: 0.000000
2023-10-12 16:57:11,892 epoch 7 - iter 990/992 - loss 0.02459818 - time (sec): 491.61 - samples/sec: 332.62 - lr: 0.000053 - momentum: 0.000000
2023-10-12 16:57:12,934 ----------------------------------------------------------------------------------------------------
2023-10-12 16:57:12,934 EPOCH 7 done: loss 0.0247 - lr: 0.000053
2023-10-12 16:57:39,706 DEV : loss 0.16530723869800568 - f1-score (micro avg) 0.7554
2023-10-12 16:57:39,753 ----------------------------------------------------------------------------------------------------
2023-10-12 16:58:27,524 epoch 8 - iter 99/992 - loss 0.01577233 - time (sec): 47.77 - samples/sec: 346.71 - lr: 0.000052 - momentum: 0.000000
2023-10-12 16:59:16,547 epoch 8 - iter 198/992 - loss 0.02211567 - time (sec): 96.79 - samples/sec: 348.00 - lr: 0.000050 - momentum: 0.000000
2023-10-12 17:00:04,704 epoch 8 - iter 297/992 - loss 0.01995472 - time (sec): 144.95 - samples/sec: 356.47 - lr: 0.000048 - momentum: 0.000000
2023-10-12 17:00:52,512 epoch 8 - iter 396/992 - loss 0.01872474 - time (sec): 192.76 - samples/sec: 349.88 - lr: 0.000046 - momentum: 0.000000
2023-10-12 17:01:41,060 epoch 8 - iter 495/992 - loss 0.01991923 - time (sec): 241.30 - samples/sec: 344.60 - lr: 0.000045 - momentum: 0.000000
2023-10-12 17:02:31,179 epoch 8 - iter 594/992 - loss 0.01958089 - time (sec): 291.42 - samples/sec: 338.75 - lr: 0.000043 - momentum: 0.000000
2023-10-12 17:03:25,072 epoch 8 - iter 693/992 - loss 0.02049768 - time (sec): 345.32 - samples/sec: 331.01 - lr: 0.000041 - momentum: 0.000000
2023-10-12 17:04:19,112 epoch 8 - iter 792/992 - loss 0.01974502 - time (sec): 399.36 - samples/sec: 326.61 - lr: 0.000039 - momentum: 0.000000
2023-10-12 17:05:07,947 epoch 8 - iter 891/992 - loss 0.01997094 - time (sec): 448.19 - samples/sec: 327.99 - lr: 0.000037 - momentum: 0.000000
2023-10-12 17:05:55,400 epoch 8 - iter 990/992 - loss 0.02079325 - time (sec): 495.64 - samples/sec: 329.95 - lr: 0.000036 - momentum: 0.000000
2023-10-12 17:05:56,423 ----------------------------------------------------------------------------------------------------
2023-10-12 17:05:56,424 EPOCH 8 done: loss 0.0208 - lr: 0.000036
2023-10-12 17:06:26,136 DEV : loss 0.1831587851047516 - f1-score (micro avg) 0.7442
2023-10-12 17:06:26,182 ----------------------------------------------------------------------------------------------------
2023-10-12 17:07:18,482 epoch 9 - iter 99/992 - loss 0.01539191 - time (sec): 52.30 - samples/sec: 312.29 - lr: 0.000034 - momentum: 0.000000
2023-10-12 17:08:10,089 epoch 9 - iter 198/992 - loss 0.01781130 - time (sec): 103.91 - samples/sec: 315.27 - lr: 0.000032 - momentum: 0.000000
2023-10-12 17:09:01,393 epoch 9 - iter 297/992 - loss 0.01659587 - time (sec): 155.21 - samples/sec: 319.09 - lr: 0.000030 - momentum: 0.000000
2023-10-12 17:09:49,218 epoch 9 - iter 396/992 - loss 0.01615555 - time (sec): 203.03 - samples/sec: 324.93 - lr: 0.000029 - momentum: 0.000000
2023-10-12 17:10:38,030 epoch 9 - iter 495/992 - loss 0.01445293 - time (sec): 251.85 - samples/sec: 330.94 - lr: 0.000027 - momentum: 0.000000
2023-10-12 17:11:25,245 epoch 9 - iter 594/992 - loss 0.01567842 - time (sec): 299.06 - samples/sec: 333.81 - lr: 0.000025 - momentum: 0.000000
2023-10-12 17:12:12,456 epoch 9 - iter 693/992 - loss 0.01524842 - time (sec): 346.27 - samples/sec: 335.54 - lr: 0.000023 - momentum: 0.000000
2023-10-12 17:13:00,743 epoch 9 - iter 792/992 - loss 0.01532233 - time (sec): 394.56 - samples/sec: 335.73 - lr: 0.000022 - momentum: 0.000000
2023-10-12 17:13:46,816 epoch 9 - iter 891/992 - loss 0.01587203 - time (sec): 440.63 - samples/sec: 335.75 - lr: 0.000020 - momentum: 0.000000
2023-10-12 17:14:34,753 epoch 9 - iter 990/992 - loss 0.01619767 - time (sec): 488.57 - samples/sec: 334.70 - lr: 0.000018 - momentum: 0.000000
2023-10-12 17:14:35,808 ----------------------------------------------------------------------------------------------------
2023-10-12 17:14:35,808 EPOCH 9 done: loss 0.0162 - lr: 0.000018
2023-10-12 17:15:00,598 DEV : loss 0.19426794350147247 - f1-score (micro avg) 0.7581
2023-10-12 17:15:00,648 ----------------------------------------------------------------------------------------------------
2023-10-12 17:15:49,645 epoch 10 - iter 99/992 - loss 0.01232717 - time (sec): 48.99 - samples/sec: 342.48 - lr: 0.000016 - momentum: 0.000000
2023-10-12 17:16:38,438 epoch 10 - iter 198/992 - loss 0.01131991 - time (sec): 97.79 - samples/sec: 331.36 - lr: 0.000014 - momentum: 0.000000
2023-10-12 17:17:29,591 epoch 10 - iter 297/992 - loss 0.01317737 - time (sec): 148.94 - samples/sec: 323.19 - lr: 0.000013 - momentum: 0.000000
2023-10-12 17:18:18,846 epoch 10 - iter 396/992 - loss 0.01238455 - time (sec): 198.20 - samples/sec: 326.30 - lr: 0.000011 - momentum: 0.000000
2023-10-12 17:19:12,751 epoch 10 - iter 495/992 - loss 0.01273722 - time (sec): 252.10 - samples/sec: 326.02 - lr: 0.000009 - momentum: 0.000000
2023-10-12 17:20:02,197 epoch 10 - iter 594/992 - loss 0.01267390 - time (sec): 301.55 - samples/sec: 328.00 - lr: 0.000007 - momentum: 0.000000
2023-10-12 17:20:51,313 epoch 10 - iter 693/992 - loss 0.01351590 - time (sec): 350.66 - samples/sec: 327.29 - lr: 0.000006 - momentum: 0.000000
2023-10-12 17:21:39,506 epoch 10 - iter 792/992 - loss 0.01365947 - time (sec): 398.86 - samples/sec: 328.08 - lr: 0.000004 - momentum: 0.000000
2023-10-12 17:22:28,390 epoch 10 - iter 891/992 - loss 0.01345225 - time (sec): 447.74 - samples/sec: 329.27 - lr: 0.000002 - momentum: 0.000000
2023-10-12 17:23:16,438 epoch 10 - iter 990/992 - loss 0.01388417 - time (sec): 495.79 - samples/sec: 330.26 - lr: 0.000000 - momentum: 0.000000
2023-10-12 17:23:17,412 ----------------------------------------------------------------------------------------------------
2023-10-12 17:23:17,413 EPOCH 10 done: loss 0.0139 - lr: 0.000000
2023-10-12 17:23:43,904 DEV : loss 0.19736182689666748 - f1-score (micro avg) 0.7561
2023-10-12 17:23:44,891 ----------------------------------------------------------------------------------------------------
2023-10-12 17:23:44,893 Loading model from best epoch ...
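The lr values logged above follow the LinearScheduler plugin with warmup_fraction '0.1': the rate climbs linearly from 0 to the peak of 0.00016 over the first 10% of the 9,920 total batches (992 per epoch x 10 epochs), then decays linearly to 0 by the last batch. A minimal sketch of that schedule (the function name and exact formula are assumptions reconstructed from the logged values, not Flair's implementation):

```python
def linear_schedule_lr(step, total_steps, peak_lr=0.00016, warmup_fraction=0.1):
    """Linear warmup to peak_lr over the first warmup_fraction of steps,
    then linear decay to zero by the final step."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 992 * 10  # 992 mini-batches per epoch, 10 epochs
print(f"{linear_schedule_lr(99, total):.6f}")     # epoch 1, iter 99  -> 0.000016
print(f"{linear_schedule_lr(total, total):.6f}")  # final step        -> 0.000000
```

The printed values match the lr column at epoch 1, iter 99 (0.000016) and at the end of epoch 10 (0.000000).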
2023-10-12 17:23:49,370 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 17:24:14,543 
Results:
- F-score (micro) 0.7481
- F-score (macro) 0.6677
- Accuracy 0.6274

By class:
              precision    recall  f1-score   support

         LOC     0.7947    0.8275    0.8108       655
         PER     0.6617    0.7892    0.7198       223
         ORG     0.4724    0.4724    0.4724       127

   micro avg     0.7237    0.7741    0.7481      1005
   macro avg     0.6429    0.6964    0.6677      1005
weighted avg     0.7245    0.7741    0.7478      1005

2023-10-12 17:24:14,543 ----------------------------------------------------------------------------------------------------
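The summary F-scores above can be cross-checked from the per-class table: each f1 is the harmonic mean of precision and recall, the macro F1 is the unweighted mean of the three class f1-scores, and the micro F1 is the harmonic mean of the micro-avg precision and recall. A small sketch reproducing them from the table's numbers:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Per-class f1-scores from the evaluation table above.
per_class = {"LOC": 0.8108, "PER": 0.7198, "ORG": 0.4724}

macro_f1 = sum(per_class.values()) / len(per_class)
micro_f1 = f1(0.7237, 0.7741)  # micro-avg precision and recall from the table

print(round(macro_f1, 4))  # 0.6677
print(round(micro_f1, 4))  # 0.7481
```

Both values agree with the logged "F-score (macro) 0.6677" and "F-score (micro) 0.7481", which confirms the table and summary are internally consistent.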