2023-10-13 04:14:08,603 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,605 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 04:14:08,605 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,605 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 04:14:08,605 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,606 Train:  7936 sentences
2023-10-13 04:14:08,606         (train_with_dev=False, train_with_test=False)
2023-10-13 04:14:08,606 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,606 Training Params:
2023-10-13 04:14:08,606  - learning_rate: "0.00016"
2023-10-13 04:14:08,606  - mini_batch_size: "8"
2023-10-13 04:14:08,606  - max_epochs: "10"
2023-10-13 04:14:08,606  - shuffle: "True"
2023-10-13 04:14:08,606 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,606 Plugins:
2023-10-13 04:14:08,606  - TensorboardLogger
2023-10-13 04:14:08,606  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 04:14:08,606 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,606 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 04:14:08,606  - metric: "('micro avg', 'f1-score')"
2023-10-13 04:14:08,607 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,607 Computation:
2023-10-13 04:14:08,607  - compute on device: cuda:0
2023-10-13 04:14:08,607  - embedding storage: none
2023-10-13 04:14:08,607 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,607 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-13 04:14:08,607 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,607 ----------------------------------------------------------------------------------------------------
2023-10-13 04:14:08,607 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 04:14:58,156 epoch 1 - iter 99/992 - loss 2.56865486 - time (sec): 49.55 - samples/sec: 329.22 - lr: 0.000016 - momentum: 0.000000
2023-10-13 04:15:48,966 epoch 1 - iter 198/992 - loss 2.46172433 - time (sec): 100.36 - samples/sec: 332.22 - lr: 0.000032 - momentum: 0.000000
2023-10-13 04:16:38,547 epoch 1 - iter 297/992 - loss 2.23758535 - time (sec): 149.94 - samples/sec: 334.59 - lr: 0.000048 - momentum: 0.000000
2023-10-13 04:17:31,347 epoch 1 - iter 396/992 - loss 1.99413491 - time (sec): 202.74 - samples/sec: 326.53 - lr: 0.000064 - momentum: 0.000000
2023-10-13 04:18:23,851 epoch 1 - iter 495/992 - loss 1.74861311 - time (sec): 255.24 - samples/sec: 323.61 - lr: 0.000080 - momentum: 0.000000
2023-10-13 04:19:13,456 epoch 1 - iter 594/992 - loss 1.53688472 - time (sec): 304.85 - samples/sec: 324.25 - lr: 0.000096 - momentum: 0.000000
2023-10-13 04:20:06,337 epoch 1 - iter 693/992 - loss 1.36474780 - time (sec): 357.73 - samples/sec: 320.74 - lr: 0.000112 - momentum: 0.000000
2023-10-13 04:20:57,051 epoch 1 - iter 792/992 - loss 1.22668635 - time (sec): 408.44 - samples/sec: 321.52 - lr: 0.000128 - momentum: 0.000000
2023-10-13 04:21:50,239 epoch 1 - iter 891/992 - loss 1.11265498 - time (sec): 461.63 - samples/sec: 320.15 - lr: 0.000144 - momentum: 0.000000
2023-10-13 04:22:41,004 epoch 1 - iter 990/992 - loss 1.02671952 - time (sec): 512.40 - samples/sec: 319.15 - lr: 0.000160 - momentum: 0.000000
2023-10-13 04:22:42,121 ----------------------------------------------------------------------------------------------------
2023-10-13 04:22:42,121 EPOCH 1 done: loss 1.0251 - lr: 0.000160
2023-10-13 04:23:06,824 DEV : loss 0.18082141876220703 - f1-score (micro avg) 0.2928
2023-10-13 04:23:06,864 saving best model
2023-10-13 04:23:07,860 ----------------------------------------------------------------------------------------------------
2023-10-13 04:23:57,355 epoch 2 - iter 99/992 - loss 0.18788274 - time (sec): 49.49 - samples/sec: 333.87 - lr: 0.000158 - momentum: 0.000000
2023-10-13 04:24:46,978 epoch 2 - iter 198/992 - loss 0.17968338 - time (sec): 99.12 - samples/sec: 331.40 - lr: 0.000156 - momentum: 0.000000
2023-10-13 04:25:38,207 epoch 2 - iter 297/992 - loss 0.17120997 - time (sec): 150.34 - samples/sec: 326.42 - lr: 0.000155 - momentum: 0.000000
2023-10-13 04:26:29,906 epoch 2 - iter 396/992 - loss 0.15979528 - time (sec): 202.04 - samples/sec: 318.73 - lr: 0.000153 - momentum: 0.000000
2023-10-13 04:27:21,456 epoch 2 - iter 495/992 - loss 0.15367467 - time (sec): 253.59 - samples/sec: 322.04 - lr: 0.000151 - momentum: 0.000000
2023-10-13 04:28:12,010 epoch 2 - iter 594/992 - loss 0.15176305 - time (sec): 304.15 - samples/sec: 322.04 - lr: 0.000149 - momentum: 0.000000
2023-10-13 04:29:05,836 epoch 2 - iter 693/992 - loss 0.14750412 - time (sec): 357.97 - samples/sec: 320.03 - lr: 0.000148 - momentum: 0.000000
2023-10-13 04:29:57,368 epoch 2 - iter 792/992 - loss 0.14466303 - time (sec): 409.51 - samples/sec: 319.44 - lr: 0.000146 - momentum: 0.000000
2023-10-13 04:30:49,745 epoch 2 - iter 891/992 - loss 0.14068498 - time (sec): 461.88 - samples/sec: 318.94 - lr: 0.000144 - momentum: 0.000000
2023-10-13 04:31:40,698 epoch 2 - iter 990/992 - loss 0.13772983 - time (sec): 512.84 - samples/sec: 319.25 - lr: 0.000142 - momentum: 0.000000
2023-10-13 04:31:41,679 ----------------------------------------------------------------------------------------------------
2023-10-13 04:31:41,679 EPOCH 2 done: loss 0.1377 - lr: 0.000142
2023-10-13 04:32:07,854 DEV : loss 0.08747222274541855 - f1-score (micro avg) 0.7239
2023-10-13 04:32:07,899 saving best model
2023-10-13 04:32:10,631 ----------------------------------------------------------------------------------------------------
2023-10-13 04:33:02,357 epoch 3 - iter 99/992 - loss 0.07819207 - time (sec): 51.72 - samples/sec: 331.55 - lr: 0.000140 - momentum: 0.000000
2023-10-13 04:33:54,269 epoch 3 - iter 198/992 - loss 0.07894572 - time (sec): 103.63 - samples/sec: 320.36 - lr: 0.000139 - momentum: 0.000000
2023-10-13 04:34:44,807 epoch 3 - iter 297/992 - loss 0.08415351 - time (sec): 154.17 - samples/sec: 320.73 - lr: 0.000137 - momentum: 0.000000
2023-10-13 04:35:36,635 epoch 3 - iter 396/992 - loss 0.07922860 - time (sec): 206.00 - samples/sec: 320.18 - lr: 0.000135 - momentum: 0.000000
2023-10-13 04:36:26,532 epoch 3 - iter 495/992 - loss 0.08091910 - time (sec): 255.90 - samples/sec: 318.24 - lr: 0.000133 - momentum: 0.000000
2023-10-13 04:37:17,629 epoch 3 - iter 594/992 - loss 0.07980487 - time (sec): 306.99 - samples/sec: 317.70 - lr: 0.000132 - momentum: 0.000000
2023-10-13 04:38:09,443 epoch 3 - iter 693/992 - loss 0.07710158 - time (sec): 358.81 - samples/sec: 317.23 - lr: 0.000130 - momentum: 0.000000
2023-10-13 04:39:01,711 epoch 3 - iter 792/992 - loss 0.07628307 - time (sec): 411.07 - samples/sec: 317.08 - lr: 0.000128 - momentum: 0.000000
2023-10-13 04:39:53,854 epoch 3 - iter 891/992 - loss 0.07624698 - time (sec): 463.22 - samples/sec: 318.58 - lr: 0.000126 - momentum: 0.000000
2023-10-13 04:40:43,741 epoch 3 - iter 990/992 - loss 0.07598069 - time (sec): 513.11 - samples/sec: 318.76 - lr: 0.000125 - momentum: 0.000000
2023-10-13 04:40:44,816 ----------------------------------------------------------------------------------------------------
2023-10-13 04:40:44,816 EPOCH 3 done: loss 0.0759 - lr: 0.000125
2023-10-13 04:41:10,775 DEV : loss 0.09128082543611526 - f1-score (micro avg) 0.748
2023-10-13 04:41:10,816 saving best model
2023-10-13 04:41:13,542 ----------------------------------------------------------------------------------------------------
2023-10-13 04:42:02,884 epoch 4 - iter 99/992 - loss 0.06228453 - time (sec): 49.34 - samples/sec: 337.88 - lr: 0.000123 - momentum: 0.000000
2023-10-13 04:42:50,682 epoch 4 - iter 198/992 - loss 0.05538262 - time (sec): 97.14 - samples/sec: 335.17 - lr: 0.000121 - momentum: 0.000000
2023-10-13 04:43:40,785 epoch 4 - iter 297/992 - loss 0.05689109 - time (sec): 147.24 - samples/sec: 343.11 - lr: 0.000119 - momentum: 0.000000
2023-10-13 04:44:30,311 epoch 4 - iter 396/992 - loss 0.05535815 - time (sec): 196.77 - samples/sec: 338.80 - lr: 0.000117 - momentum: 0.000000
2023-10-13 04:45:19,963 epoch 4 - iter 495/992 - loss 0.05540351 - time (sec): 246.42 - samples/sec: 337.69 - lr: 0.000116 - momentum: 0.000000
2023-10-13 04:46:09,565 epoch 4 - iter 594/992 - loss 0.05400214 - time (sec): 296.02 - samples/sec: 335.96 - lr: 0.000114 - momentum: 0.000000
2023-10-13 04:46:58,413 epoch 4 - iter 693/992 - loss 0.05211183 - time (sec): 344.87 - samples/sec: 335.72 - lr: 0.000112 - momentum: 0.000000
2023-10-13 04:47:46,017 epoch 4 - iter 792/992 - loss 0.05257322 - time (sec): 392.47 - samples/sec: 334.60 - lr: 0.000110 - momentum: 0.000000
2023-10-13 04:48:34,254 epoch 4 - iter 891/992 - loss 0.05154104 - time (sec): 440.71 - samples/sec: 336.01 - lr: 0.000109 - momentum: 0.000000
2023-10-13 04:49:23,732 epoch 4 - iter 990/992 - loss 0.05162959 - time (sec): 490.19 - samples/sec: 334.08 - lr: 0.000107 - momentum: 0.000000
2023-10-13 04:49:24,636 ----------------------------------------------------------------------------------------------------
2023-10-13 04:49:24,637 EPOCH 4 done: loss 0.0518 - lr: 0.000107
2023-10-13 04:49:49,901 DEV : loss 0.10848435014486313 - f1-score (micro avg) 0.7553
2023-10-13 04:49:49,940 saving best model
2023-10-13 04:49:52,495 ----------------------------------------------------------------------------------------------------
2023-10-13 04:50:40,029 epoch 5 - iter 99/992 - loss 0.03342675 - time (sec): 47.53 - samples/sec: 341.47 - lr: 0.000105 - momentum: 0.000000
2023-10-13 04:51:28,326 epoch 5 - iter 198/992 - loss 0.03413630 - time (sec): 95.83 - samples/sec: 328.51 - lr: 0.000103 - momentum: 0.000000
2023-10-13 04:52:17,369 epoch 5 - iter 297/992 - loss 0.03855701 - time (sec): 144.87 - samples/sec: 333.04 - lr: 0.000101 - momentum: 0.000000
2023-10-13 04:53:05,620 epoch 5 - iter 396/992 - loss 0.03711810 - time (sec): 193.12 - samples/sec: 334.09 - lr: 0.000100 - momentum: 0.000000
2023-10-13 04:53:55,821 epoch 5 - iter 495/992 - loss 0.03541820 - time (sec): 243.32 - samples/sec: 339.75 - lr: 0.000098 - momentum: 0.000000
2023-10-13 04:54:43,494 epoch 5 - iter 594/992 - loss 0.03625552 - time (sec): 290.99 - samples/sec: 338.13 - lr: 0.000096 - momentum: 0.000000
2023-10-13 04:55:31,701 epoch 5 - iter 693/992 - loss 0.03814903 - time (sec): 339.20 - samples/sec: 336.88 - lr: 0.000094 - momentum: 0.000000
2023-10-13 04:56:20,260 epoch 5 - iter 792/992 - loss 0.03895430 - time (sec): 387.76 - samples/sec: 334.46 - lr: 0.000093 - momentum: 0.000000
2023-10-13 04:57:10,432 epoch 5 - iter 891/992 - loss 0.03771584 - time (sec): 437.93 - samples/sec: 333.54 - lr: 0.000091 - momentum: 0.000000
2023-10-13 04:58:00,780 epoch 5 - iter 990/992 - loss 0.03802463 - time (sec): 488.28 - samples/sec: 335.13 - lr: 0.000089 - momentum: 0.000000
2023-10-13 04:58:01,809 ----------------------------------------------------------------------------------------------------
2023-10-13 04:58:01,809 EPOCH 5 done: loss 0.0380 - lr: 0.000089
2023-10-13 04:58:28,182 DEV : loss 0.13769647479057312 - f1-score (micro avg) 0.762
2023-10-13 04:58:28,228 saving best model
2023-10-13 04:58:30,918 ----------------------------------------------------------------------------------------------------
2023-10-13 04:59:18,556 epoch 6 - iter 99/992 - loss 0.02656880 - time (sec): 47.63 - samples/sec: 323.66 - lr: 0.000087 - momentum: 0.000000
2023-10-13 05:00:11,683 epoch 6 - iter 198/992 - loss 0.02980051 - time (sec): 100.76 - samples/sec: 312.78 - lr: 0.000085 - momentum: 0.000000
2023-10-13 05:01:02,123 epoch 6 - iter 297/992 - loss 0.03304588 - time (sec): 151.20 - samples/sec: 315.57 - lr: 0.000084 - momentum: 0.000000
2023-10-13 05:01:52,312 epoch 6 - iter 396/992 - loss 0.03179357 - time (sec): 201.39 - samples/sec: 320.01 - lr: 0.000082 - momentum: 0.000000
2023-10-13 05:02:43,007 epoch 6 - iter 495/992 - loss 0.03052012 - time (sec): 252.08 - samples/sec: 324.24 - lr: 0.000080 - momentum: 0.000000
2023-10-13 05:03:34,090 epoch 6 - iter 594/992 - loss 0.03071116 - time (sec): 303.17 - samples/sec: 325.20 - lr: 0.000078 - momentum: 0.000000
2023-10-13 05:04:25,137 epoch 6 - iter 693/992 - loss 0.02923643 - time (sec): 354.21 - samples/sec: 324.07 - lr: 0.000077 - momentum: 0.000000
2023-10-13 05:05:13,854 epoch 6 - iter 792/992 - loss 0.02978942 - time (sec): 402.93 - samples/sec: 323.30 - lr: 0.000075 - momentum: 0.000000
2023-10-13 05:06:04,571 epoch 6 - iter 891/992 - loss 0.02981938 - time (sec): 453.65 - samples/sec: 324.88 - lr: 0.000073 - momentum: 0.000000
2023-10-13 05:06:54,557 epoch 6 - iter 990/992 - loss 0.03049851 - time (sec): 503.63 - samples/sec: 324.83 - lr: 0.000071 - momentum: 0.000000
2023-10-13 05:06:55,656 ----------------------------------------------------------------------------------------------------
2023-10-13 05:06:55,656 EPOCH 6 done: loss 0.0304 - lr: 0.000071
2023-10-13 05:07:20,858 DEV : loss 0.14694465696811676 - f1-score (micro avg) 0.7653
2023-10-13 05:07:20,897 saving best model
2023-10-13 05:07:23,677 ----------------------------------------------------------------------------------------------------
2023-10-13 05:08:11,423 epoch 7 - iter 99/992 - loss 0.01687958 - time (sec): 47.74 - samples/sec: 327.70 - lr: 0.000069 - momentum: 0.000000
2023-10-13 05:09:01,475 epoch 7 - iter 198/992 - loss 0.02285926 - time (sec): 97.79 - samples/sec: 324.27 - lr: 0.000068 - momentum: 0.000000
2023-10-13 05:09:53,728 epoch 7 - iter 297/992 - loss 0.02270010 - time (sec): 150.05 - samples/sec: 321.95 - lr: 0.000066 - momentum: 0.000000
2023-10-13 05:10:44,367 epoch 7 - iter 396/992 - loss 0.02204410 - time (sec): 200.69 - samples/sec: 320.36 - lr: 0.000064 - momentum: 0.000000
2023-10-13 05:11:34,979 epoch 7 - iter 495/992 - loss 0.02154389 - time (sec): 251.30 - samples/sec: 320.44 - lr: 0.000062 - momentum: 0.000000
2023-10-13 05:12:25,964 epoch 7 - iter 594/992 - loss 0.02205100 - time (sec): 302.28 - samples/sec: 321.31 - lr: 0.000061 - momentum: 0.000000
2023-10-13 05:13:16,543 epoch 7 - iter 693/992 - loss 0.02259172 - time (sec): 352.86 - samples/sec: 323.01 - lr: 0.000059 - momentum: 0.000000
2023-10-13 05:14:06,889 epoch 7 - iter 792/992 - loss 0.02154014 - time (sec): 403.21 - samples/sec: 322.82 - lr: 0.000057 - momentum: 0.000000
2023-10-13 05:14:57,138 epoch 7 - iter 891/992 - loss 0.02206872 - time (sec): 453.46 - samples/sec: 322.11 - lr: 0.000055 - momentum: 0.000000
2023-10-13 05:15:48,478 epoch 7 - iter 990/992 - loss 0.02278503 - time (sec): 504.80 - samples/sec: 324.26 - lr: 0.000053 - momentum: 0.000000
2023-10-13 05:15:49,462 ----------------------------------------------------------------------------------------------------
2023-10-13 05:15:49,463 EPOCH 7 done: loss 0.0227 - lr: 0.000053
2023-10-13 05:16:14,662 DEV : loss 0.16488906741142273 - f1-score (micro avg) 0.7615
2023-10-13 05:16:14,706 ----------------------------------------------------------------------------------------------------
2023-10-13 05:17:03,564 epoch 8 - iter 99/992 - loss 0.01278100 - time (sec): 48.86 - samples/sec: 325.49 - lr: 0.000052 - momentum: 0.000000
2023-10-13 05:17:52,849 epoch 8 - iter 198/992 - loss 0.01520807 - time (sec): 98.14 - samples/sec: 326.35 - lr: 0.000050 - momentum: 0.000000
2023-10-13 05:18:40,529 epoch 8 - iter 297/992 - loss 0.01548855 - time (sec): 145.82 - samples/sec: 325.67 - lr: 0.000048 - momentum: 0.000000
2023-10-13 05:19:28,623 epoch 8 - iter 396/992 - loss 0.01657565 - time (sec): 193.91 - samples/sec: 329.78 - lr: 0.000046 - momentum: 0.000000
2023-10-13 05:20:16,896 epoch 8 - iter 495/992 - loss 0.01599928 - time (sec): 242.19 - samples/sec: 333.06 - lr: 0.000045 - momentum: 0.000000
2023-10-13 05:21:05,274 epoch 8 - iter 594/992 - loss 0.01625231 - time (sec): 290.57 - samples/sec: 335.43 - lr: 0.000043 - momentum: 0.000000
2023-10-13 05:21:52,103 epoch 8 - iter 693/992 - loss 0.01587633 - time (sec): 337.39 - samples/sec: 336.05 - lr: 0.000041 - momentum: 0.000000
2023-10-13 05:22:38,673 epoch 8 - iter 792/992 - loss 0.01633644 - time (sec): 383.96 - samples/sec: 338.92 - lr: 0.000039 - momentum: 0.000000
2023-10-13 05:23:29,329 epoch 8 - iter 891/992 - loss 0.01708680 - time (sec): 434.62 - samples/sec: 339.54 - lr: 0.000037 - momentum: 0.000000
2023-10-13 05:24:19,926 epoch 8 - iter 990/992 - loss 0.01845692 - time (sec): 485.22 - samples/sec: 337.18 - lr: 0.000036 - momentum: 0.000000
2023-10-13 05:24:20,971 ----------------------------------------------------------------------------------------------------
2023-10-13 05:24:20,971 EPOCH 8 done: loss 0.0185 - lr: 0.000036
2023-10-13 05:24:48,709 DEV : loss 0.19012384116649628 - f1-score (micro avg) 0.7536
2023-10-13 05:24:48,766 ----------------------------------------------------------------------------------------------------
2023-10-13 05:25:43,409 epoch 9 - iter 99/992 - loss 0.00849732 - time (sec): 54.64 - samples/sec: 305.07 - lr: 0.000034 - momentum: 0.000000
2023-10-13 05:26:36,879 epoch 9 - iter 198/992 - loss 0.01199921 - time (sec): 108.11 - samples/sec: 314.39 - lr: 0.000032 - momentum: 0.000000
2023-10-13 05:27:27,759 epoch 9 - iter 297/992 - loss 0.01302819 - time (sec): 158.99 - samples/sec: 315.69 - lr: 0.000030 - momentum: 0.000000
2023-10-13 05:28:18,351 epoch 9 - iter 396/992 - loss 0.01357798 - time (sec): 209.58 - samples/sec: 316.71 - lr: 0.000029 - momentum: 0.000000
2023-10-13 05:29:08,611 epoch 9 - iter 495/992 - loss 0.01316842 - time (sec): 259.84 - samples/sec: 317.98 - lr: 0.000027 - momentum: 0.000000
2023-10-13 05:29:57,979 epoch 9 - iter 594/992 - loss 0.01356466 - time (sec): 309.21 - samples/sec: 315.00 - lr: 0.000025 - momentum: 0.000000
2023-10-13 05:30:46,652 epoch 9 - iter 693/992 - loss 0.01357842 - time (sec): 357.88 - samples/sec: 316.23 - lr: 0.000023 - momentum: 0.000000
2023-10-13 05:31:36,085 epoch 9 - iter 792/992 - loss 0.01294268 - time (sec): 407.32 - samples/sec: 318.35 - lr: 0.000022 - momentum: 0.000000
2023-10-13 05:32:25,535 epoch 9 - iter 891/992 - loss 0.01346019 - time (sec): 456.77 - samples/sec: 320.71 - lr: 0.000020 - momentum: 0.000000
2023-10-13 05:33:13,846 epoch 9 - iter 990/992 - loss 0.01302599 - time (sec): 505.08 - samples/sec: 324.10 - lr: 0.000018 - momentum: 0.000000
2023-10-13 05:33:14,774 ----------------------------------------------------------------------------------------------------
2023-10-13 05:33:14,774 EPOCH 9 done: loss 0.0131 - lr: 0.000018
2023-10-13 05:33:40,987 DEV : loss 0.20783358812332153 - f1-score (micro avg) 0.7588
2023-10-13 05:33:41,029 ----------------------------------------------------------------------------------------------------
2023-10-13 05:34:30,096 epoch 10 - iter 99/992 - loss 0.00987424 - time (sec): 49.06 - samples/sec: 336.37 - lr: 0.000016 - momentum: 0.000000
2023-10-13 05:35:18,799 epoch 10 - iter 198/992 - loss 0.00848148 - time (sec): 97.77 - samples/sec: 337.62 - lr: 0.000014 - momentum: 0.000000
2023-10-13 05:36:08,050 epoch 10 - iter 297/992 - loss 0.01140717 - time (sec): 147.02 - samples/sec: 335.55 - lr: 0.000013 - momentum: 0.000000
2023-10-13 05:36:59,124 epoch 10 - iter 396/992 - loss 0.01094629 - time (sec): 198.09 - samples/sec: 335.31 - lr: 0.000011 - momentum: 0.000000
2023-10-13 05:37:48,683 epoch 10 - iter 495/992 - loss 0.01077877 - time (sec): 247.65 - samples/sec: 332.77 - lr: 0.000009 - momentum: 0.000000
2023-10-13 05:38:38,278 epoch 10 - iter 594/992 - loss 0.01207366 - time (sec): 297.25 - samples/sec: 331.25 - lr: 0.000007 - momentum: 0.000000
2023-10-13 05:39:28,713 epoch 10 - iter 693/992 - loss 0.01123905 - time (sec): 347.68 - samples/sec: 332.87 - lr: 0.000006 - momentum: 0.000000
2023-10-13 05:40:19,034 epoch 10 - iter 792/992 - loss 0.01091248 - time (sec): 398.00 - samples/sec: 332.68 - lr: 0.000004 - momentum: 0.000000
2023-10-13 05:41:08,246 epoch 10 - iter 891/992 - loss 0.01117674 - time (sec): 447.21 - samples/sec: 330.47 - lr: 0.000002 - momentum: 0.000000
2023-10-13 05:41:57,225 epoch 10 - iter 990/992 - loss 0.01063125 - time (sec): 496.19 - samples/sec: 329.74 - lr: 0.000000 - momentum: 0.000000
2023-10-13 05:41:58,298 ----------------------------------------------------------------------------------------------------
2023-10-13 05:41:58,299 EPOCH 10 done: loss 0.0107 - lr: 0.000000
2023-10-13 05:42:23,348 DEV : loss 0.20967479050159454 - f1-score (micro avg) 0.7581
2023-10-13 05:42:24,324 ----------------------------------------------------------------------------------------------------
2023-10-13 05:42:24,326 Loading model from best epoch ...
2023-10-13 05:42:29,849 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 05:42:52,953 
Results:
- F-score (micro) 0.7523
- F-score (macro) 0.664
- Accuracy 0.629

By class:
              precision    recall  f1-score   support

         LOC     0.7945    0.8382    0.8158       655
         PER     0.6570    0.8161    0.7280       223
         ORG     0.4737    0.4252    0.4481       127

   micro avg     0.7255    0.7811    0.7523      1005
   macro avg     0.6417    0.6932    0.6640      1005
weighted avg     0.7235    0.7811    0.7498      1005

2023-10-13 05:42:52,953 ----------------------------------------------------------------------------------------------------
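Note on the lr column: with the LinearScheduler plugin (warmup_fraction '0.1'), the learning rate ramps linearly over the first 10% of the 9,920 total steps (10 epochs x 992 batches) up to the peak 0.00016, then decays linearly to zero. The sketch below is a plain-Python reconstruction of that shape, not Flair's internal API; the function name and defaults are illustrative, but its values match the logged lr column (e.g. epoch 1 iter 99 -> 0.000016, epoch 1 iter 990 -> 0.000160, epoch 2 iter 99 -> 0.000158).

```python
def linear_schedule_lr(step: int, total_steps: int = 9920,
                       peak_lr: float = 0.00016,
                       warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 (hypothetical helper)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 992 steps = 1 epoch here
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

With these parameters the warmup phase spans exactly epoch 1, which is why the lr peaks at its final iteration and only decays from epoch 2 onward.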
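The 13-tag dictionary above is a BIOES scheme: S- marks single-token entities, B-/I-/E- mark the begin, inside, and end of multi-token ones. A minimal decoder sketch (not Flair's span extraction; assumes well-formed sequences where every B- is eventually closed by a matching E-):

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (label, start, end) spans, end exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            start, label = None, None
        elif tag.startswith("S-"):          # single-token entity
            spans.append((tag[2:], i, i + 1))
            start, label = None, None
        elif tag.startswith("B-"):          # open a multi-token entity
            start, label = i, tag[2:]
        elif tag.startswith("E-") and label == tag[2:]:
            spans.append((label, start, i + 1))
            start, label = None, None
        # I- tags simply continue the currently open span
    return spans

# ["B-PER", "I-PER", "E-PER", "O", "S-LOC"] -> [("PER", 0, 3), ("LOC", 4, 5)]
```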
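Reading the final table: the micro average pools true/false positives across all classes before computing precision and recall, so its F1 is the harmonic mean of the micro precision and recall, while the macro average is the unweighted mean of the per-class F1 scores. A quick consistency check against the logged values (pure Python; numbers copied from the table above):

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# micro avg row: precision 0.7255, recall 0.7811 -> ~0.7523
micro_f1 = f1(0.7255, 0.7811)

# macro avg: unweighted mean of per-class f1 (LOC, PER, ORG) -> ~0.6640
macro_f1 = (0.8158 + 0.7280 + 0.4481) / 3
```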