2023-10-12 14:23:06,472 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,474 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 14:23:06,474 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,475 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-12 14:23:06,475 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,475 Train: 7936 sentences
2023-10-12 14:23:06,475 (train_with_dev=False, train_with_test=False)
2023-10-12 14:23:06,475 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,475 Training Params:
2023-10-12 14:23:06,475  - learning_rate: "0.00015"
2023-10-12 14:23:06,475  - mini_batch_size: "8"
2023-10-12 14:23:06,475  - max_epochs: "10"
2023-10-12 14:23:06,475  - shuffle: "True"
2023-10-12 14:23:06,475 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,475 Plugins:
2023-10-12 14:23:06,476  - TensorboardLogger
2023-10-12 14:23:06,476  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 14:23:06,476 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,476 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 14:23:06,476  - metric: "('micro avg', 'f1-score')"
2023-10-12 14:23:06,476
----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,476 Computation:
2023-10-12 14:23:06,476  - compute on device: cuda:0
2023-10-12 14:23:06,476  - embedding storage: none
2023-10-12 14:23:06,476 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,476 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-12 14:23:06,476 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,476 ----------------------------------------------------------------------------------------------------
2023-10-12 14:23:06,477 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 14:23:58,839 epoch 1 - iter 99/992 - loss 2.56405093 - time (sec): 52.36 - samples/sec: 314.30 - lr: 0.000015 - momentum: 0.000000
2023-10-12 14:24:52,509 epoch 1 - iter 198/992 - loss 2.50730143 - time (sec): 106.03 - samples/sec: 294.73 - lr: 0.000030 - momentum: 0.000000
2023-10-12 14:25:42,625 epoch 1 - iter 297/992 - loss 2.28636494 - time (sec): 156.15 - samples/sec: 307.30 - lr: 0.000045 - momentum: 0.000000
2023-10-12 14:26:32,830 epoch 1 - iter 396/992 - loss 2.03933405 - time (sec): 206.35 - samples/sec: 309.04 - lr: 0.000060 - momentum: 0.000000
2023-10-12 14:27:23,165 epoch 1 - iter 495/992 - loss 1.78071138 - time (sec): 256.69 - samples/sec: 312.90 - lr: 0.000075 - momentum: 0.000000
2023-10-12 14:28:14,409 epoch 1 - iter 594/992 - loss 1.55630172 - time (sec): 307.93 - samples/sec: 312.87 - lr: 0.000090 - momentum: 0.000000
2023-10-12 14:29:08,992 epoch 1 - iter 693/992 - loss 1.37772125 - time (sec): 362.51 - samples/sec: 314.34 - lr: 0.000105 - momentum: 0.000000
2023-10-12 14:30:00,550 epoch 1 - iter 792/992 - loss 1.23276797 - time (sec): 414.07 - samples/sec: 316.11 - lr: 0.000120 - momentum: 0.000000
2023-10-12 14:30:50,923 epoch 1 - iter 891/992 - loss 1.12489881 - time (sec): 464.44 - samples/sec: 316.43 - lr: 0.000135 - momentum: 0.000000
2023-10-12 14:31:43,425 epoch 1 - iter 990/992 - loss 1.03255365 - time (sec): 516.95 - samples/sec: 316.69 - lr: 0.000150 - momentum: 0.000000
2023-10-12 14:31:44,449 ----------------------------------------------------------------------------------------------------
2023-10-12 14:31:44,450 EPOCH 1 done: loss 1.0310 - lr: 0.000150
2023-10-12 14:32:10,627 DEV : loss 0.19246874749660492 - f1-score (micro avg) 0.2781
2023-10-12 14:32:10,679 saving best model
2023-10-12 14:32:11,770 ----------------------------------------------------------------------------------------------------
2023-10-12 14:33:04,449 epoch 2 - iter 99/992 - loss 0.24533994 - time (sec): 52.68 - samples/sec: 315.11 - lr: 0.000148 - momentum: 0.000000
2023-10-12 14:33:55,648 epoch 2 - iter 198/992 - loss 0.22080732 - time (sec): 103.88 - samples/sec: 314.26 - lr: 0.000147 - momentum: 0.000000
2023-10-12 14:34:45,992 epoch 2 - iter 297/992 - loss 0.21179962 - time (sec): 154.22 - samples/sec: 318.53 - lr: 0.000145 - momentum: 0.000000
2023-10-12 14:35:37,964 epoch 2 - iter 396/992 - loss 0.20145417 - time (sec): 206.19 - samples/sec: 316.89 - lr: 0.000143 - momentum: 0.000000
2023-10-12 14:36:30,830 epoch 2 - iter 495/992 - loss 0.18919186 - time (sec): 259.06 - samples/sec: 318.67 - lr: 0.000142 - momentum: 0.000000
2023-10-12 14:37:24,926 epoch 2 - iter 594/992 - loss 0.17943115 - time (sec): 313.15 - samples/sec: 315.28 - lr: 0.000140 - momentum: 0.000000
2023-10-12 14:38:15,255 epoch 2 - iter 693/992 - loss 0.17373392 - time (sec): 363.48 - samples/sec: 314.59 - lr: 0.000138 - momentum: 0.000000
2023-10-12 14:39:05,080 epoch 2 - iter 792/992 - loss 0.16569248 - time (sec): 413.31 - samples/sec: 319.07 - lr: 0.000137 - momentum: 0.000000
2023-10-12 14:39:56,229 epoch 2 - iter 891/992 - loss 0.16103171 - time (sec): 464.46 - samples/sec: 320.15 - lr: 0.000135 - momentum: 0.000000
2023-10-12 14:40:45,143 epoch 2 - iter 990/992 - loss 0.15713380 - time (sec): 513.37 - samples/sec: 318.94 - lr: 0.000133 - momentum: 0.000000
2023-10-12 14:40:46,074 ----------------------------------------------------------------------------------------------------
2023-10-12 14:40:46,074 EPOCH 2 done: loss 0.1571 - lr: 0.000133
2023-10-12 14:41:12,040 DEV : loss 0.0952228307723999 - f1-score (micro avg) 0.7205
2023-10-12 14:41:12,083 saving best model
2023-10-12 14:41:15,387 ----------------------------------------------------------------------------------------------------
2023-10-12 14:42:08,824 epoch 3 - iter 99/992 - loss 0.09328576 - time (sec): 53.43 - samples/sec: 327.69 - lr: 0.000132 - momentum: 0.000000
2023-10-12 14:43:00,456 epoch 3 - iter 198/992 - loss 0.09734946 - time (sec): 105.06 - samples/sec: 334.83 - lr: 0.000130 - momentum: 0.000000
2023-10-12 14:43:50,921 epoch 3 - iter 297/992 - loss 0.09691089 - time (sec): 155.53 - samples/sec: 326.94 - lr: 0.000128 - momentum: 0.000000
2023-10-12 14:44:41,687 epoch 3 - iter 396/992 - loss 0.09478854 - time (sec): 206.29 - samples/sec: 324.64 - lr: 0.000127 - momentum: 0.000000
2023-10-12 14:45:32,602 epoch 3 - iter 495/992 - loss 0.09276414 - time (sec): 257.21 - samples/sec: 323.02 - lr: 0.000125 - momentum: 0.000000
2023-10-12 14:46:22,342 epoch 3 - iter 594/992 - loss 0.09202694 - time (sec): 306.95 - samples/sec: 322.12 - lr: 0.000123 - momentum: 0.000000
2023-10-12 14:47:16,727 epoch 3 - iter 693/992 - loss 0.09075095 - time (sec): 361.34 - samples/sec: 320.48 - lr: 0.000122 - momentum: 0.000000
2023-10-12 14:48:09,526 epoch 3 - iter 792/992 - loss 0.08787323 - time (sec): 414.13 - samples/sec: 321.07 - lr: 0.000120 - momentum: 0.000000
2023-10-12 14:48:59,976 epoch 3 - iter 891/992 - loss 0.08683027 - time (sec): 464.58 - samples/sec: 319.52 - lr: 0.000118 - momentum: 0.000000
2023-10-12 14:49:50,156 epoch 3 - iter 990/992 - loss 0.08650476 - time (sec): 514.76 - samples/sec: 317.91 - lr: 0.000117 - momentum: 0.000000
2023-10-12 14:49:51,134 ----------------------------------------------------------------------------------------------------
2023-10-12 14:49:51,134 EPOCH 3 done: loss 0.0865 - lr: 0.000117
2023-10-12 14:50:17,516 DEV : loss 0.08749563992023468 - f1-score (micro avg) 0.7543
2023-10-12 14:50:17,567 saving best model
2023-10-12 14:50:20,249 ----------------------------------------------------------------------------------------------------
2023-10-12 14:51:11,890 epoch 4 - iter 99/992 - loss 0.05261384 - time (sec): 51.64 - samples/sec: 317.64 - lr: 0.000115 - momentum: 0.000000
2023-10-12 14:52:05,395 epoch 4 - iter 198/992 - loss 0.06054848 - time (sec): 105.14 - samples/sec: 314.61 - lr: 0.000113 - momentum: 0.000000
2023-10-12 14:52:57,640 epoch 4 - iter 297/992 - loss 0.05924426 - time (sec): 157.39 - samples/sec: 318.17 - lr: 0.000112 - momentum: 0.000000
2023-10-12 14:53:49,815 epoch 4 - iter 396/992 - loss 0.06057469 - time (sec): 209.56 - samples/sec: 313.76 - lr: 0.000110 - momentum: 0.000000
2023-10-12 14:54:44,198 epoch 4 - iter 495/992 - loss 0.06133964 - time (sec): 263.94 - samples/sec: 315.59 - lr: 0.000108 - momentum: 0.000000
2023-10-12 14:55:35,564 epoch 4 - iter 594/992 - loss 0.05937963 - time (sec): 315.31 - samples/sec: 317.80 - lr: 0.000107 - momentum: 0.000000
2023-10-12 14:56:26,130 epoch 4 - iter 693/992 - loss 0.05960603 - time (sec): 365.88 - samples/sec: 316.18 - lr: 0.000105 - momentum: 0.000000
2023-10-12 14:57:16,475 epoch 4 - iter 792/992 - loss 0.05898238 - time (sec): 416.22 - samples/sec: 318.21 - lr: 0.000103 - momentum: 0.000000
2023-10-12 14:58:06,362 epoch 4 - iter 891/992 - loss 0.05953463 - time (sec): 466.11 - samples/sec: 317.00 - lr: 0.000102 - momentum: 0.000000
2023-10-12 14:58:57,040 epoch 4 - iter 990/992 - loss 0.05885008 - time (sec): 516.79 - samples/sec: 316.47 - lr: 0.000100 - momentum: 0.000000
2023-10-12 14:58:58,144 ----------------------------------------------------------------------------------------------------
2023-10-12 14:58:58,145 EPOCH 4 done: loss 0.0587 - lr: 0.000100
2023-10-12 14:59:25,025 DEV : loss 0.10407452285289764 - f1-score (micro avg) 0.7654
2023-10-12 14:59:25,073 saving best model
2023-10-12 14:59:27,827 ----------------------------------------------------------------------------------------------------
2023-10-12 15:00:19,545 epoch 5 - iter 99/992 - loss 0.03889567 - time (sec): 51.71 - samples/sec: 316.45 - lr: 0.000098 - momentum: 0.000000
2023-10-12 15:01:11,397 epoch 5 - iter 198/992 - loss 0.03662648 - time (sec): 103.57 - samples/sec: 316.09 - lr: 0.000097 - momentum: 0.000000
2023-10-12 15:02:04,971 epoch 5 - iter 297/992 - loss 0.03960967 - time (sec): 157.14 - samples/sec: 307.96 - lr: 0.000095 - momentum: 0.000000
2023-10-12 15:02:56,683 epoch 5 - iter 396/992 - loss 0.04167317 - time (sec): 208.85 - samples/sec: 309.86 - lr: 0.000093 - momentum: 0.000000
2023-10-12 15:03:47,080 epoch 5 - iter 495/992 - loss 0.03965957 - time (sec): 259.25 - samples/sec: 311.68 - lr: 0.000092 - momentum: 0.000000
2023-10-12 15:04:37,206 epoch 5 - iter 594/992 - loss 0.04015022 - time (sec): 309.37 - samples/sec: 315.05 - lr: 0.000090 - momentum: 0.000000
2023-10-12 15:05:29,421 epoch 5 - iter 693/992 - loss 0.04115793 - time (sec): 361.59 - samples/sec: 316.20 - lr: 0.000088 - momentum: 0.000000
2023-10-12 15:06:23,442 epoch 5 - iter 792/992 - loss 0.04198768 - time (sec): 415.61 - samples/sec: 314.61 - lr: 0.000087 - momentum: 0.000000
2023-10-12 15:07:16,479 epoch 5 - iter 891/992 - loss 0.04165477 - time (sec): 468.65 - samples/sec: 315.06 - lr: 0.000085 - momentum: 0.000000
2023-10-12 15:08:08,791 epoch 5 - iter 990/992 - loss 0.04262827 - time (sec): 520.96 - samples/sec: 314.32 - lr: 0.000083 - momentum: 0.000000
2023-10-12 15:08:09,812 ----------------------------------------------------------------------------------------------------
2023-10-12 15:08:09,812 EPOCH 5 done: loss 0.0426 - lr: 0.000083
2023-10-12 15:08:36,976 DEV : loss 0.12188898026943207 - f1-score (micro avg) 0.7521
2023-10-12 15:08:37,018 ----------------------------------------------------------------------------------------------------
2023-10-12 15:09:29,469 epoch 6 - iter 99/992 - loss 0.03537558 - time (sec): 52.45 - samples/sec: 315.72 - lr: 0.000082 - momentum: 0.000000
2023-10-12 15:10:19,535 epoch 6 - iter 198/992 - loss 0.03029097 - time (sec): 102.51 - samples/sec: 319.31 - lr: 0.000080 - momentum: 0.000000
2023-10-12 15:11:10,109 epoch 6 - iter 297/992 - loss 0.03030686 - time (sec): 153.09 - samples/sec: 321.12 - lr: 0.000078 - momentum: 0.000000
2023-10-12 15:12:02,565 epoch 6 - iter 396/992 - loss 0.03037817 - time (sec): 205.54 - samples/sec: 319.39 - lr: 0.000077 - momentum: 0.000000
2023-10-12 15:12:55,302 epoch 6 - iter 495/992 - loss 0.03045274 - time (sec): 258.28 - samples/sec: 318.96 - lr: 0.000075 - momentum: 0.000000
2023-10-12 15:13:47,002 epoch 6 - iter 594/992 - loss 0.03066209 - time (sec): 309.98 - samples/sec: 316.55 - lr: 0.000073 - momentum: 0.000000
2023-10-12 15:14:40,417 epoch 6 - iter 693/992 - loss 0.03071125 - time (sec): 363.40 - samples/sec: 315.34 - lr: 0.000072 - momentum: 0.000000
2023-10-12 15:15:33,010 epoch 6 - iter 792/992 - loss 0.03242123 - time (sec): 415.99 - samples/sec: 314.67 - lr: 0.000070 - momentum: 0.000000
2023-10-12 15:16:24,343 epoch 6 - iter 891/992 - loss 0.03309232 - time (sec): 467.32 - samples/sec: 315.05 - lr: 0.000068 - momentum: 0.000000
2023-10-12 15:17:16,780 epoch 6 - iter 990/992 - loss 0.03304484 - time (sec): 519.76 - samples/sec: 314.77 - lr: 0.000067 - momentum: 0.000000
2023-10-12 15:17:17,894 ----------------------------------------------------------------------------------------------------
2023-10-12 15:17:17,894 EPOCH 6 done: loss 0.0330 - lr: 0.000067
2023-10-12 15:17:45,916 DEV : loss 0.14993008971214294 - f1-score (micro avg) 0.7522
2023-10-12 15:17:45,966 ----------------------------------------------------------------------------------------------------
2023-10-12 15:18:37,095 epoch 7 - iter 99/992 - loss 0.02030191 - time (sec): 51.13 - samples/sec: 319.21 - lr: 0.000065 - momentum: 0.000000
2023-10-12 15:19:28,370 epoch 7 - iter 198/992 - loss 0.02119849 - time (sec): 102.40 - samples/sec: 328.80 - lr: 0.000063 - momentum: 0.000000
2023-10-12 15:20:16,280 epoch 7 - iter 297/992 - loss 0.02201885 - time (sec): 150.31 - samples/sec: 327.41 - lr: 0.000062 - momentum: 0.000000
2023-10-12 15:21:05,894 epoch 7 - iter 396/992 - loss 0.02176445 - time (sec): 199.93 - samples/sec: 321.67 - lr: 0.000060 - momentum: 0.000000
2023-10-12 15:21:55,409 epoch 7 - iter 495/992 - loss 0.02256629 - time (sec): 249.44 - samples/sec: 324.90 - lr: 0.000058 - momentum: 0.000000
2023-10-12 15:22:45,213 epoch 7 - iter 594/992 - loss 0.02351557 - time (sec): 299.24 - samples/sec: 325.25 - lr: 0.000057 - momentum: 0.000000
2023-10-12 15:23:36,665 epoch 7 - iter 693/992 - loss 0.02415674 - time (sec): 350.70 - samples/sec: 323.99 - lr: 0.000055 - momentum: 0.000000
2023-10-12 15:24:26,633 epoch 7 - iter 792/992 - loss 0.02429200 - time (sec): 400.66 - samples/sec: 324.51 - lr: 0.000053 - momentum: 0.000000
2023-10-12 15:25:19,041 epoch 7 - iter 891/992 - loss 0.02486713 - time (sec): 453.07 - samples/sec: 325.11 - lr: 0.000052 - momentum: 0.000000
2023-10-12 15:26:12,734 epoch 7 - iter 990/992 - loss 0.02500123 - time (sec): 506.77 - samples/sec: 322.67 - lr: 0.000050 - momentum: 0.000000
2023-10-12 15:26:13,792 ----------------------------------------------------------------------------------------------------
2023-10-12 15:26:13,793 EPOCH 7 done: loss 0.0251 - lr: 0.000050
2023-10-12 15:26:41,404 DEV : loss 0.1722940057516098 - f1-score (micro avg) 0.752
2023-10-12 15:26:41,448
----------------------------------------------------------------------------------------------------
2023-10-12 15:27:31,951 epoch 8 - iter 99/992 - loss 0.01555518 - time (sec): 50.50 - samples/sec: 327.95 - lr: 0.000048 - momentum: 0.000000
2023-10-12 15:28:26,288 epoch 8 - iter 198/992 - loss 0.02216317 - time (sec): 104.84 - samples/sec: 321.29 - lr: 0.000047 - momentum: 0.000000
2023-10-12 15:29:18,981 epoch 8 - iter 297/992 - loss 0.02003190 - time (sec): 157.53 - samples/sec: 328.00 - lr: 0.000045 - momentum: 0.000000
2023-10-12 15:30:10,588 epoch 8 - iter 396/992 - loss 0.01909816 - time (sec): 209.14 - samples/sec: 322.47 - lr: 0.000043 - momentum: 0.000000
2023-10-12 15:31:03,797 epoch 8 - iter 495/992 - loss 0.02034891 - time (sec): 262.35 - samples/sec: 316.96 - lr: 0.000042 - momentum: 0.000000
2023-10-12 15:31:57,610 epoch 8 - iter 594/992 - loss 0.02013990 - time (sec): 316.16 - samples/sec: 312.25 - lr: 0.000040 - momentum: 0.000000
2023-10-12 15:32:47,508 epoch 8 - iter 693/992 - loss 0.02122882 - time (sec): 366.06 - samples/sec: 312.26 - lr: 0.000038 - momentum: 0.000000
2023-10-12 15:33:40,380 epoch 8 - iter 792/992 - loss 0.02006401 - time (sec): 418.93 - samples/sec: 311.35 - lr: 0.000037 - momentum: 0.000000
2023-10-12 15:34:33,647 epoch 8 - iter 891/992 - loss 0.02041260 - time (sec): 472.20 - samples/sec: 311.31 - lr: 0.000035 - momentum: 0.000000
2023-10-12 15:35:27,453 epoch 8 - iter 990/992 - loss 0.02100296 - time (sec): 526.00 - samples/sec: 310.91 - lr: 0.000033 - momentum: 0.000000
2023-10-12 15:35:28,539 ----------------------------------------------------------------------------------------------------
2023-10-12 15:35:28,539 EPOCH 8 done: loss 0.0210 - lr: 0.000033
2023-10-12 15:35:56,007 DEV : loss 0.17951937019824982 - f1-score (micro avg) 0.7559
2023-10-12 15:35:56,059 ----------------------------------------------------------------------------------------------------
2023-10-12 15:36:48,755 epoch 9 - iter 99/992 - loss 0.01547043 - time (sec): 52.69 - samples/sec: 309.94 - lr: 0.000032 - momentum: 0.000000
2023-10-12 15:37:40,250 epoch 9 - iter 198/992 - loss 0.01685308 - time (sec): 104.19 - samples/sec: 314.41 - lr: 0.000030 - momentum: 0.000000
2023-10-12 15:38:30,587 epoch 9 - iter 297/992 - loss 0.01642764 - time (sec): 154.53 - samples/sec: 320.50 - lr: 0.000028 - momentum: 0.000000
2023-10-12 15:39:21,030 epoch 9 - iter 396/992 - loss 0.01591446 - time (sec): 204.97 - samples/sec: 321.86 - lr: 0.000027 - momentum: 0.000000
2023-10-12 15:40:12,894 epoch 9 - iter 495/992 - loss 0.01409155 - time (sec): 256.83 - samples/sec: 324.51 - lr: 0.000025 - momentum: 0.000000
2023-10-12 15:41:03,867 epoch 9 - iter 594/992 - loss 0.01540055 - time (sec): 307.81 - samples/sec: 324.32 - lr: 0.000023 - momentum: 0.000000
2023-10-12 15:42:00,077 epoch 9 - iter 693/992 - loss 0.01495291 - time (sec): 364.02 - samples/sec: 319.18 - lr: 0.000022 - momentum: 0.000000
2023-10-12 15:42:51,237 epoch 9 - iter 792/992 - loss 0.01449754 - time (sec): 415.18 - samples/sec: 319.06 - lr: 0.000020 - momentum: 0.000000
2023-10-12 15:43:43,244 epoch 9 - iter 891/992 - loss 0.01514467 - time (sec): 467.18 - samples/sec: 316.67 - lr: 0.000018 - momentum: 0.000000
2023-10-12 15:44:35,473 epoch 9 - iter 990/992 - loss 0.01574246 - time (sec): 519.41 - samples/sec: 314.83 - lr: 0.000017 - momentum: 0.000000
2023-10-12 15:44:36,623 ----------------------------------------------------------------------------------------------------
2023-10-12 15:44:36,624 EPOCH 9 done: loss 0.0157 - lr: 0.000017
2023-10-12 15:45:04,356 DEV : loss 0.19226497411727905 - f1-score (micro avg) 0.7543
2023-10-12 15:45:04,400 ----------------------------------------------------------------------------------------------------
2023-10-12 15:45:55,867 epoch 10 - iter 99/992 - loss 0.01391727 - time (sec): 51.46 - samples/sec: 326.05 - lr: 0.000015 - momentum: 0.000000
2023-10-12 15:46:47,501 epoch 10 - iter 198/992 - loss 0.01230319 - time (sec): 103.10 - samples/sec: 314.29 - lr: 0.000013 - momentum: 0.000000
2023-10-12 15:47:40,050 epoch 10 - iter 297/992 - loss 0.01342322 - time (sec): 155.65 - samples/sec: 309.26 - lr: 0.000012 - momentum: 0.000000
2023-10-12 15:48:32,916 epoch 10 - iter 396/992 - loss 0.01245713 - time (sec): 208.51 - samples/sec: 310.16 - lr: 0.000010 - momentum: 0.000000
2023-10-12 15:49:25,069 epoch 10 - iter 495/992 - loss 0.01322784 - time (sec): 260.67 - samples/sec: 315.31 - lr: 0.000008 - momentum: 0.000000
2023-10-12 15:50:14,857 epoch 10 - iter 594/992 - loss 0.01328043 - time (sec): 310.46 - samples/sec: 318.59 - lr: 0.000007 - momentum: 0.000000
2023-10-12 15:51:05,997 epoch 10 - iter 693/992 - loss 0.01377376 - time (sec): 361.60 - samples/sec: 317.40 - lr: 0.000005 - momentum: 0.000000
2023-10-12 15:51:58,248 epoch 10 - iter 792/992 - loss 0.01349323 - time (sec): 413.85 - samples/sec: 316.19 - lr: 0.000004 - momentum: 0.000000
2023-10-12 15:52:49,151 epoch 10 - iter 891/992 - loss 0.01312375 - time (sec): 464.75 - samples/sec: 317.22 - lr: 0.000002 - momentum: 0.000000
2023-10-12 15:53:40,355 epoch 10 - iter 990/992 - loss 0.01366969 - time (sec): 515.95 - samples/sec: 317.35 - lr: 0.000000 - momentum: 0.000000
2023-10-12 15:53:41,377 ----------------------------------------------------------------------------------------------------
2023-10-12 15:53:41,377 EPOCH 10 done: loss 0.0137 - lr: 0.000000
2023-10-12 15:54:08,192 DEV : loss 0.1952781081199646 - f1-score (micro avg) 0.7563
2023-10-12 15:54:09,311 ----------------------------------------------------------------------------------------------------
2023-10-12 15:54:09,313 Loading model from best epoch ...
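The lr column in the iteration logs above follows the LinearScheduler plugin with warmup_fraction '0.1': the learning rate climbs linearly from 0 to the peak 0.00015 over the first 10% of the 9,920 total steps (10 epochs × 992 batches), then decays linearly back to 0. A minimal plain-Python sketch of that schedule (an illustration, not Flair's implementation; the function name `linear_schedule_lr` is hypothetical):

```python
def linear_schedule_lr(step, total_steps=9920, peak_lr=0.00015, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to 0 (per-step schedule)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 992 steps in this run
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Reproduces the logged values:
print(round(linear_schedule_lr(99), 6))        # epoch 1, iter 99  -> 0.000015
print(round(linear_schedule_lr(992), 6))       # end of warmup     -> 0.00015
print(round(linear_schedule_lr(992 + 99), 6))  # epoch 2, iter 99  -> 0.000148
```

This also explains why the peak lr 0.000150 is reached exactly at the end of epoch 1 (iter 990/992) and why the final logged lr is 0.000000.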
2023-10-12 15:54:14,157 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 15:54:39,374
Results:
- F-score (micro) 0.7583
- F-score (macro) 0.6798
- Accuracy 0.6354

By class:
              precision    recall  f1-score   support

         LOC     0.7939    0.8412    0.8169       655
         PER     0.6731    0.7848    0.7246       223
         ORG     0.5263    0.4724    0.4979       127

   micro avg     0.7360    0.7821    0.7583      1005
   macro avg     0.6644    0.6995    0.6798      1005
weighted avg     0.7333    0.7821    0.7561      1005

2023-10-12 15:54:39,375 ----------------------------------------------------------------------------------------------------
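The micro-averaged test scores can be re-derived from the per-class rows: micro averaging pools true positives, predicted spans, and gold spans across LOC, PER and ORG before computing precision, recall and F1, so the large LOC class dominates. A small self-contained check (per-class precision, recall and support are taken from the table; the per-class tp and prediction counts are reconstructed from them by rounding):

```python
# (precision, recall, support) per class, from the evaluation table above.
classes = {
    "LOC": (0.7939, 0.8412, 655),
    "PER": (0.6731, 0.7848, 223),
    "ORG": (0.5263, 0.4724, 127),
}

tp = pred = gold = 0
for precision, recall, support in classes.values():
    class_tp = round(recall * support)   # true positives for this class
    tp += class_tp
    pred += round(class_tp / precision)  # number of predicted spans
    gold += support                      # number of gold spans

micro_p = tp / pred
micro_r = tp / gold
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
print(round(micro_p, 4), round(micro_r, 4), round(micro_f1, 4))
# -> 0.736 0.7821 0.7583
```

The pooled counts (786 true positives over 1068 predicted and 1005 gold spans) reproduce the reported micro avg row, including the best-model F-score (micro) of 0.7583.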