2023-10-12 15:15:19,418 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,421 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 15:15:19,422 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,422 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 15:15:19,422 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,422 Train: 5777 sentences
2023-10-12 15:15:19,422 (train_with_dev=False, train_with_test=False)
2023-10-12 15:15:19,423 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,423 Training Params:
2023-10-12 15:15:19,423 - learning_rate: "0.00015"
2023-10-12 15:15:19,423 - mini_batch_size: "4"
2023-10-12 15:15:19,423 - max_epochs: "10"
2023-10-12 15:15:19,423 - shuffle: "True"
2023-10-12 15:15:19,423 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,423 Plugins:
2023-10-12 15:15:19,423 - TensorboardLogger
2023-10-12 15:15:19,423 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 15:15:19,424 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,424 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 15:15:19,424 - metric: "('micro avg', 'f1-score')"
2023-10-12 15:15:19,424 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,424 Computation:
2023-10-12 15:15:19,424 - compute on device: cuda:0
2023-10-12 15:15:19,424 - embedding storage: none
2023-10-12 15:15:19,424 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,424 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-12 15:15:19,424 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,425 ----------------------------------------------------------------------------------------------------
2023-10-12 15:15:19,425 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 15:16:02,886 epoch 1 - iter 144/1445 - loss 2.56446737 - time (sec): 43.46 - samples/sec: 411.04 - lr: 0.000015 - momentum: 0.000000
2023-10-12 15:16:45,068 epoch 1 - iter 288/1445 - loss 2.39900332 - time (sec): 85.64 - samples/sec: 421.44 - lr: 0.000030 - momentum: 0.000000
2023-10-12 15:17:28,035 epoch 1 - iter 432/1445 - loss 2.15786011 - time (sec): 128.61 - samples/sec: 412.92 - lr: 0.000045 - momentum: 0.000000
2023-10-12 15:18:11,700 epoch 1 - iter 576/1445 - loss 1.87557502 - time (sec): 172.27 - samples/sec: 410.48 - lr: 0.000060 - momentum: 0.000000
2023-10-12 15:18:57,452 epoch 1 - iter 720/1445 - loss 1.59781747 - time (sec): 218.02 - samples/sec: 405.29 - lr: 0.000075 - momentum: 0.000000
2023-10-12 15:19:42,320 epoch 1 - iter 864/1445 - loss 1.38700860 - time (sec): 262.89 - samples/sec: 400.19 - lr: 0.000090 - momentum: 0.000000
2023-10-12 15:20:24,732 epoch 1 - iter 1008/1445 - loss 1.22534881 - time (sec): 305.30 - samples/sec: 400.81 - lr: 0.000105 - momentum: 0.000000
2023-10-12 15:21:08,949 epoch 1 - iter 1152/1445 - loss 1.09139103 - time (sec): 349.52 - samples/sec: 402.01 - lr: 0.000119 - momentum: 0.000000
2023-10-12 15:21:53,126 epoch 1 - iter 1296/1445 - loss 0.98625744 - time (sec): 393.70 - samples/sec: 403.55 - lr: 0.000134 - momentum: 0.000000
2023-10-12 15:22:34,918 epoch 1 - iter 1440/1445 - loss 0.90858934 - time (sec): 435.49 - samples/sec: 403.38 - lr: 0.000149 - momentum: 0.000000
2023-10-12 15:22:36,189 ----------------------------------------------------------------------------------------------------
2023-10-12 15:22:36,189 EPOCH 1 done: loss 0.9062 - lr: 0.000149
2023-10-12 15:22:56,319 DEV : loss 0.19574084877967834 - f1-score (micro avg) 0.2988
2023-10-12 15:22:56,356 saving best model
2023-10-12 15:22:57,292 ----------------------------------------------------------------------------------------------------
2023-10-12 15:23:39,944 epoch 2 - iter 144/1445 - loss 0.15105853 - time (sec): 42.65 - samples/sec: 415.87 - lr: 0.000148 - momentum: 0.000000
2023-10-12 15:24:21,844 epoch 2 - iter 288/1445 - loss 0.15207024 - time (sec): 84.55 - samples/sec: 412.28 - lr: 0.000147 - momentum: 0.000000
2023-10-12 15:25:04,075 epoch 2 - iter 432/1445 - loss 0.14842272 - time (sec): 126.78 - samples/sec: 411.77 - lr: 0.000145 - momentum: 0.000000
2023-10-12 15:25:45,753 epoch 2 - iter 576/1445 - loss 0.14024884 - time (sec): 168.46 - samples/sec: 412.50 - lr: 0.000143 - momentum: 0.000000
2023-10-12 15:26:27,565 epoch 2 - iter 720/1445 - loss 0.13333017 - time (sec): 210.27 - samples/sec: 408.23 - lr: 0.000142 - momentum: 0.000000
2023-10-12 15:27:10,323 epoch 2 - iter 864/1445 - loss 0.12971991 - time (sec): 253.03 - samples/sec: 409.42 - lr: 0.000140 - momentum: 0.000000
2023-10-12 15:27:53,609 epoch 2 - iter 1008/1445 - loss 0.12903687 - time (sec): 296.31 - samples/sec: 411.13 - lr: 0.000138 - momentum: 0.000000
2023-10-12 15:28:36,814 epoch 2 - iter 1152/1445 - loss 0.12595690 - time (sec): 339.52 - samples/sec: 412.11 - lr: 0.000137 - momentum: 0.000000
2023-10-12 15:29:19,645 epoch 2 - iter 1296/1445 - loss 0.12100448 - time (sec): 382.35 - samples/sec: 412.95 - lr: 0.000135 - momentum: 0.000000
2023-10-12 15:30:02,315 epoch 2 - iter 1440/1445 - loss 0.11964859 - time (sec): 425.02 - samples/sec: 412.87 - lr: 0.000133 - momentum: 0.000000
2023-10-12 15:30:03,861 ----------------------------------------------------------------------------------------------------
2023-10-12 15:30:03,862 EPOCH 2 done: loss 0.1194 - lr: 0.000133
2023-10-12 15:30:25,196 DEV : loss 0.1005769670009613 - f1-score (micro avg) 0.7922
2023-10-12 15:30:25,230 saving best model
2023-10-12 15:30:31,425 ----------------------------------------------------------------------------------------------------
2023-10-12 15:31:14,363 epoch 3 - iter 144/1445 - loss 0.08897709 - time (sec): 42.93 - samples/sec: 392.18 - lr: 0.000132 - momentum: 0.000000
2023-10-12 15:31:57,296 epoch 3 - iter 288/1445 - loss 0.08129413 - time (sec): 85.86 - samples/sec: 403.56 - lr: 0.000130 - momentum: 0.000000
2023-10-12 15:32:40,432 epoch 3 - iter 432/1445 - loss 0.08153104 - time (sec): 129.00 - samples/sec: 401.16 - lr: 0.000128 - momentum: 0.000000
2023-10-12 15:33:23,664 epoch 3 - iter 576/1445 - loss 0.07648521 - time (sec): 172.23 - samples/sec: 400.46 - lr: 0.000127 - momentum: 0.000000
2023-10-12 15:34:05,862 epoch 3 - iter 720/1445 - loss 0.07573910 - time (sec): 214.43 - samples/sec: 403.23 - lr: 0.000125 - momentum: 0.000000
2023-10-12 15:34:49,951 epoch 3 - iter 864/1445 - loss 0.07533949 - time (sec): 258.52 - samples/sec: 408.20 - lr: 0.000123 - momentum: 0.000000
2023-10-12 15:35:33,852 epoch 3 - iter 1008/1445 - loss 0.07273397 - time (sec): 302.42 - samples/sec: 408.15 - lr: 0.000122 - momentum: 0.000000
2023-10-12 15:36:17,762 epoch 3 - iter 1152/1445 - loss 0.07163768 - time (sec): 346.33 - samples/sec: 406.03 - lr: 0.000120 - momentum: 0.000000
2023-10-12 15:37:01,614 epoch 3 - iter 1296/1445 - loss 0.07004036 - time (sec): 390.18 - samples/sec: 404.50 - lr: 0.000118 - momentum: 0.000000
2023-10-12 15:37:44,837 epoch 3 - iter 1440/1445 - loss 0.06899109 - time (sec): 433.41 - samples/sec: 405.35 - lr: 0.000117 - momentum: 0.000000
2023-10-12 15:37:46,088 ----------------------------------------------------------------------------------------------------
2023-10-12 15:37:46,089 EPOCH 3 done: loss 0.0691 - lr: 0.000117
2023-10-12 15:38:08,289 DEV : loss 0.07717280089855194 - f1-score (micro avg) 0.8457
2023-10-12 15:38:08,321 saving best model
2023-10-12 15:38:13,197 ----------------------------------------------------------------------------------------------------
2023-10-12 15:38:56,183 epoch 4 - iter 144/1445 - loss 0.03499334 - time (sec): 42.98 - samples/sec: 439.77 - lr: 0.000115 - momentum: 0.000000
2023-10-12 15:39:39,297 epoch 4 - iter 288/1445 - loss 0.03816870 - time (sec): 86.09 - samples/sec: 406.89 - lr: 0.000113 - momentum: 0.000000
2023-10-12 15:40:24,715 epoch 4 - iter 432/1445 - loss 0.04038503 - time (sec): 131.51 - samples/sec: 392.97 - lr: 0.000112 - momentum: 0.000000
2023-10-12 15:41:11,609 epoch 4 - iter 576/1445 - loss 0.04477078 - time (sec): 178.41 - samples/sec: 386.49 - lr: 0.000110 - momentum: 0.000000
2023-10-12 15:41:54,797 epoch 4 - iter 720/1445 - loss 0.04478685 - time (sec): 221.59 - samples/sec: 387.98 - lr: 0.000108 - momentum: 0.000000
2023-10-12 15:42:37,893 epoch 4 - iter 864/1445 - loss 0.04738041 - time (sec): 264.69 - samples/sec: 393.16 - lr: 0.000107 - momentum: 0.000000
2023-10-12 15:43:23,732 epoch 4 - iter 1008/1445 - loss 0.04742958 - time (sec): 310.53 - samples/sec: 396.47 - lr: 0.000105 - momentum: 0.000000
2023-10-12 15:44:05,901 epoch 4 - iter 1152/1445 - loss 0.04752337 - time (sec): 352.70 - samples/sec: 397.46 - lr: 0.000103 - momentum: 0.000000
2023-10-12 15:44:50,488 epoch 4 - iter 1296/1445 - loss 0.04765215 - time (sec): 397.29 - samples/sec: 396.89 - lr: 0.000102 - momentum: 0.000000
2023-10-12 15:45:35,566 epoch 4 - iter 1440/1445 - loss 0.04648865 - time (sec): 442.36 - samples/sec: 397.29 - lr: 0.000100 - momentum: 0.000000
2023-10-12 15:45:36,797 ----------------------------------------------------------------------------------------------------
2023-10-12 15:45:36,797 EPOCH 4 done: loss 0.0466 - lr: 0.000100
2023-10-12 15:46:00,987 DEV : loss 0.0849224328994751 - f1-score (micro avg) 0.8446
2023-10-12 15:46:01,018 ----------------------------------------------------------------------------------------------------
2023-10-12 15:46:46,302 epoch 5 - iter 144/1445 - loss 0.02132865 - time (sec): 45.28 - samples/sec: 374.86 - lr: 0.000098 - momentum: 0.000000
2023-10-12 15:47:30,766 epoch 5 - iter 288/1445 - loss 0.03061339 - time (sec): 89.75 - samples/sec: 383.21 - lr: 0.000097 - momentum: 0.000000
2023-10-12 15:48:15,104 epoch 5 - iter 432/1445 - loss 0.03028162 - time (sec): 134.08 - samples/sec: 388.72 - lr: 0.000095 - momentum: 0.000000
2023-10-12 15:48:58,590 epoch 5 - iter 576/1445 - loss 0.03308569 - time (sec): 177.57 - samples/sec: 397.30 - lr: 0.000093 - momentum: 0.000000
2023-10-12 15:49:42,126 epoch 5 - iter 720/1445 - loss 0.03362962 - time (sec): 221.11 - samples/sec: 398.44 - lr: 0.000092 - momentum: 0.000000
2023-10-12 15:50:28,807 epoch 5 - iter 864/1445 - loss 0.03345350 - time (sec): 267.79 - samples/sec: 394.57 - lr: 0.000090 - momentum: 0.000000
2023-10-12 15:51:13,940 epoch 5 - iter 1008/1445 - loss 0.03227287 - time (sec): 312.92 - samples/sec: 390.31 - lr: 0.000088 - momentum: 0.000000
2023-10-12 15:51:59,616 epoch 5 - iter 1152/1445 - loss 0.03194495 - time (sec): 358.59 - samples/sec: 390.16 - lr: 0.000087 - momentum: 0.000000
2023-10-12 15:52:44,300 epoch 5 - iter 1296/1445 - loss 0.03222406 - time (sec): 403.28 - samples/sec: 390.65 - lr: 0.000085 - momentum: 0.000000
2023-10-12 15:53:29,788 epoch 5 - iter 1440/1445 - loss 0.03299335 - time (sec): 448.77 - samples/sec: 391.50 - lr: 0.000083 - momentum: 0.000000
2023-10-12 15:53:31,184 ----------------------------------------------------------------------------------------------------
2023-10-12 15:53:31,185 EPOCH 5 done: loss 0.0329 - lr: 0.000083
2023-10-12 15:53:55,028 DEV : loss 0.10914205759763718 - f1-score (micro avg) 0.8311
2023-10-12 15:53:55,065 ----------------------------------------------------------------------------------------------------
2023-10-12 15:54:40,846 epoch 6 - iter 144/1445 - loss 0.03882506 - time (sec): 45.78 - samples/sec: 392.65 - lr: 0.000082 - momentum: 0.000000
2023-10-12 15:55:26,557 epoch 6 - iter 288/1445 - loss 0.03084229 - time (sec): 91.49 - samples/sec: 382.31 - lr: 0.000080 - momentum: 0.000000
2023-10-12 15:56:12,752 epoch 6 - iter 432/1445 - loss 0.03041994 - time (sec): 137.68 - samples/sec: 384.24 - lr: 0.000078 - momentum: 0.000000
2023-10-12 15:56:58,895 epoch 6 - iter 576/1445 - loss 0.02734547 - time (sec): 183.83 - samples/sec: 390.14 - lr: 0.000077 - momentum: 0.000000
2023-10-12 15:57:43,578 epoch 6 - iter 720/1445 - loss 0.02604150 - time (sec): 228.51 - samples/sec: 382.61 - lr: 0.000075 - momentum: 0.000000
2023-10-12 15:58:30,579 epoch 6 - iter 864/1445 - loss 0.02540022 - time (sec): 275.51 - samples/sec: 387.66 - lr: 0.000073 - momentum: 0.000000
2023-10-12 15:59:15,141 epoch 6 - iter 1008/1445 - loss 0.02451479 - time (sec): 320.07 - samples/sec: 386.20 - lr: 0.000072 - momentum: 0.000000
2023-10-12 16:00:01,259 epoch 6 - iter 1152/1445 - loss 0.02500212 - time (sec): 366.19 - samples/sec: 385.70 - lr: 0.000070 - momentum: 0.000000
2023-10-12 16:00:45,834 epoch 6 - iter 1296/1445 - loss 0.02489064 - time (sec): 410.77 - samples/sec: 386.49 - lr: 0.000068 - momentum: 0.000000
2023-10-12 16:01:29,226 epoch 6 - iter 1440/1445 - loss 0.02535863 - time (sec): 454.16 - samples/sec: 386.75 - lr: 0.000067 - momentum: 0.000000
2023-10-12 16:01:30,490 ----------------------------------------------------------------------------------------------------
2023-10-12 16:01:30,490 EPOCH 6 done: loss 0.0253 - lr: 0.000067
2023-10-12 16:01:51,988 DEV : loss 0.11875587701797485 - f1-score (micro avg) 0.8482
2023-10-12 16:01:52,036 saving best model
2023-10-12 16:01:54,852 ----------------------------------------------------------------------------------------------------
2023-10-12 16:02:39,542 epoch 7 - iter 144/1445 - loss 0.03038530 - time (sec): 44.68 - samples/sec: 374.03 - lr: 0.000065 - momentum: 0.000000
2023-10-12 16:03:25,324 epoch 7 - iter 288/1445 - loss 0.02530002 - time (sec): 90.46 - samples/sec: 395.77 - lr: 0.000063 - momentum: 0.000000
2023-10-12 16:04:11,294 epoch 7 - iter 432/1445 - loss 0.02518149 - time (sec): 136.43 - samples/sec: 392.86 - lr: 0.000062 - momentum: 0.000000
2023-10-12 16:04:58,370 epoch 7 - iter 576/1445 - loss 0.02168850 - time (sec): 183.51 - samples/sec: 387.18 - lr: 0.000060 - momentum: 0.000000
2023-10-12 16:05:44,422 epoch 7 - iter 720/1445 - loss 0.02150655 - time (sec): 229.56 - samples/sec: 386.56 - lr: 0.000058 - momentum: 0.000000
2023-10-12 16:06:29,472 epoch 7 - iter 864/1445 - loss 0.02045801 - time (sec): 274.61 - samples/sec: 389.41 - lr: 0.000057 - momentum: 0.000000
2023-10-12 16:07:15,503 epoch 7 - iter 1008/1445 - loss 0.02069689 - time (sec): 320.64 - samples/sec: 390.53 - lr: 0.000055 - momentum: 0.000000
2023-10-12 16:08:01,109 epoch 7 - iter 1152/1445 - loss 0.01975274 - time (sec): 366.25 - samples/sec: 389.30 - lr: 0.000053 - momentum: 0.000000
2023-10-12 16:08:46,100 epoch 7 - iter 1296/1445 - loss 0.01973590 - time (sec): 411.24 - samples/sec: 386.30 - lr: 0.000052 - momentum: 0.000000
2023-10-12 16:09:29,847 epoch 7 - iter 1440/1445 - loss 0.01942173 - time (sec): 454.99 - samples/sec: 386.35 - lr: 0.000050 - momentum: 0.000000
2023-10-12 16:09:31,022 ----------------------------------------------------------------------------------------------------
2023-10-12 16:09:31,023 EPOCH 7 done: loss 0.0194 - lr: 0.000050
2023-10-12 16:09:54,065 DEV : loss 0.13309906423091888 - f1-score (micro avg) 0.8351
2023-10-12 16:09:54,105 ----------------------------------------------------------------------------------------------------
2023-10-12 16:10:40,701 epoch 8 - iter 144/1445 - loss 0.00940699 - time (sec): 46.59 - samples/sec: 383.79 - lr: 0.000048 - momentum: 0.000000
2023-10-12 16:11:25,684 epoch 8 - iter 288/1445 - loss 0.01102595 - time (sec): 91.58 - samples/sec: 392.52 - lr: 0.000047 - momentum: 0.000000
2023-10-12 16:12:08,448 epoch 8 - iter 432/1445 - loss 0.01075043 - time (sec): 134.34 - samples/sec: 398.69 - lr: 0.000045 - momentum: 0.000000
2023-10-12 16:12:52,601 epoch 8 - iter 576/1445 - loss 0.01027819 - time (sec): 178.49 - samples/sec: 404.66 - lr: 0.000043 - momentum: 0.000000
2023-10-12 16:13:34,007 epoch 8 - iter 720/1445 - loss 0.01204139 - time (sec): 219.90 - samples/sec: 400.57 - lr: 0.000042 - momentum: 0.000000
2023-10-12 16:14:14,827 epoch 8 - iter 864/1445 - loss 0.01227007 - time (sec): 260.72 - samples/sec: 403.42 - lr: 0.000040 - momentum: 0.000000
2023-10-12 16:14:57,919 epoch 8 - iter 1008/1445 - loss 0.01336577 - time (sec): 303.81 - samples/sec: 403.21 - lr: 0.000038 - momentum: 0.000000
2023-10-12 16:15:41,052 epoch 8 - iter 1152/1445 - loss 0.01337781 - time (sec): 346.94 - samples/sec: 404.27 - lr: 0.000037 - momentum: 0.000000
2023-10-12 16:16:23,042 epoch 8 - iter 1296/1445 - loss 0.01298325 - time (sec): 388.94 - samples/sec: 405.93 - lr: 0.000035 - momentum: 0.000000
2023-10-12 16:17:05,402 epoch 8 - iter 1440/1445 - loss 0.01447451 - time (sec): 431.29 - samples/sec: 407.60 - lr: 0.000033 - momentum: 0.000000
2023-10-12 16:17:06,599 ----------------------------------------------------------------------------------------------------
2023-10-12 16:17:06,599 EPOCH 8 done: loss 0.0147 - lr: 0.000033
2023-10-12 16:17:28,670 DEV : loss 0.1283789426088333 - f1-score (micro avg) 0.8438
2023-10-12 16:17:28,706 ----------------------------------------------------------------------------------------------------
2023-10-12 16:18:11,562 epoch 9 - iter 144/1445 - loss 0.01151446 - time (sec): 42.85 - samples/sec: 428.92 - lr: 0.000032 - momentum: 0.000000
2023-10-12 16:18:53,072 epoch 9 - iter 288/1445 - loss 0.01030809 - time (sec): 84.36 - samples/sec: 417.05 - lr: 0.000030 - momentum: 0.000000
2023-10-12 16:19:32,814 epoch 9 - iter 432/1445 - loss 0.01163292 - time (sec): 124.11 - samples/sec: 420.79 - lr: 0.000028 - momentum: 0.000000
2023-10-12 16:20:13,365 epoch 9 - iter 576/1445 - loss 0.01139738 - time (sec): 164.66 - samples/sec: 422.89 - lr: 0.000027 - momentum: 0.000000
2023-10-12 16:20:54,872 epoch 9 - iter 720/1445 - loss 0.01028555 - time (sec): 206.16 - samples/sec: 426.66 - lr: 0.000025 - momentum: 0.000000
2023-10-12 16:21:36,008 epoch 9 - iter 864/1445 - loss 0.00974631 - time (sec): 247.30 - samples/sec: 426.81 - lr: 0.000023 - momentum: 0.000000
2023-10-12 16:22:18,902 epoch 9 - iter 1008/1445 - loss 0.00936650 - time (sec): 290.19 - samples/sec: 426.90 - lr: 0.000022 - momentum: 0.000000
2023-10-12 16:23:01,348 epoch 9 - iter 1152/1445 - loss 0.00977478 - time (sec): 332.64 - samples/sec: 424.00 - lr: 0.000020 - momentum: 0.000000
2023-10-12 16:23:43,120 epoch 9 - iter 1296/1445 - loss 0.01096799 - time (sec): 374.41 - samples/sec: 420.60 - lr: 0.000018 - momentum: 0.000000
2023-10-12 16:24:25,684 epoch 9 - iter 1440/1445 - loss 0.01070611 - time (sec): 416.98 - samples/sec: 419.98 - lr: 0.000017 - momentum: 0.000000
2023-10-12 16:24:27,598 ----------------------------------------------------------------------------------------------------
2023-10-12 16:24:27,598 EPOCH 9 done: loss 0.0116 - lr: 0.000017
2023-10-12 16:24:49,183 DEV : loss 0.14139343798160553 - f1-score (micro avg) 0.846
2023-10-12 16:24:49,214 ----------------------------------------------------------------------------------------------------
2023-10-12 16:25:31,908 epoch 10 - iter 144/1445 - loss 0.01156662 - time (sec): 42.69 - samples/sec: 441.37 - lr: 0.000015 - momentum: 0.000000
2023-10-12 16:26:13,828 epoch 10 - iter 288/1445 - loss 0.01021350 - time (sec): 84.61 - samples/sec: 430.66 - lr: 0.000013 - momentum: 0.000000
2023-10-12 16:26:54,402 epoch 10 - iter 432/1445 - loss 0.00845506 - time (sec): 125.19 - samples/sec: 427.89 - lr: 0.000012 - momentum: 0.000000
2023-10-12 16:27:35,989 epoch 10 - iter 576/1445 - loss 0.00796673 - time (sec): 166.77 - samples/sec: 428.08 - lr: 0.000010 - momentum: 0.000000
2023-10-12 16:28:18,022 epoch 10 - iter 720/1445 - loss 0.00817834 - time (sec): 208.81 - samples/sec: 427.42 - lr: 0.000008 - momentum: 0.000000
2023-10-12 16:29:01,491 epoch 10 - iter 864/1445 - loss 0.00788023 - time (sec): 252.28 - samples/sec: 427.77 - lr: 0.000007 - momentum: 0.000000
2023-10-12 16:29:42,731 epoch 10 - iter 1008/1445 - loss 0.00780104 - time (sec): 293.51 - samples/sec: 421.83 - lr: 0.000005 - momentum: 0.000000
2023-10-12 16:30:25,041 epoch 10 - iter 1152/1445 - loss 0.00747945 - time (sec): 335.83 - samples/sec: 423.67 - lr: 0.000003 - momentum: 0.000000
2023-10-12 16:31:05,760 epoch 10 - iter 1296/1445 - loss 0.00744856 - time (sec): 376.54 - samples/sec: 421.77 - lr: 0.000002 - momentum: 0.000000
2023-10-12 16:31:47,064 epoch 10 - iter 1440/1445 - loss 0.00752602 - time (sec): 417.85 - samples/sec: 420.71 - lr: 0.000000 - momentum: 0.000000
2023-10-12 16:31:48,246 ----------------------------------------------------------------------------------------------------
2023-10-12 16:31:48,246 EPOCH 10 done: loss 0.0075 - lr: 0.000000
2023-10-12 16:32:09,184 DEV : loss 0.15216395258903503 - f1-score (micro avg) 0.8448
2023-10-12 16:32:10,120 ----------------------------------------------------------------------------------------------------
2023-10-12 16:32:10,122 Loading model from best epoch ...
2023-10-12 16:32:14,356 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 16:32:34,775 Results:
- F-score (micro) 0.818
- F-score (macro) 0.7285
- Accuracy 0.7047

By class:
              precision    recall  f1-score   support

         PER     0.8403    0.8299    0.8351       482
         LOC     0.9093    0.8100    0.8568       458
         ORG     0.4471    0.5507    0.4935        69

   micro avg     0.8349    0.8018    0.8180      1009
   macro avg     0.7322    0.7302    0.7285      1009
weighted avg     0.8448    0.8018    0.8216      1009

2023-10-12 16:32:34,776 ----------------------------------------------------------------------------------------------------
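
Reproduction note: the hyperparameters logged above (learning_rate 0.00015, mini_batch_size 4, max_epochs 10, linear schedule with warmup_fraction 0.1, no CRF) map onto Flair's standard fine-tuning API. The following is a minimal sketch, not the original hmBench training script: the NER_ICDAR_EUROPEANA loader name follows the dataset path in the log, TransformerWordEmbeddings stands in for the custom ByT5Embeddings wrapper shown in the model summary, and the Hugging Face model id is an assumption inferred from the training base path.

from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Corpus from the log (cached under ~/.flair/datasets/ner_icdar_europeana/nl).
corpus = NER_ICDAR_EUROPEANA(language="nl")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Assumed checkpoint id; the log itself only shows the training base path.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",                # "layers-1" in the base path
    subtoken_pooling="first",   # "poolingfirst" in the base path
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,            # unused when use_rnn=False; required by the constructor
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,              # "crfFalse" in the base path
    use_rnn=False,
    reproject_embeddings=False,
)

# fine_tune() trains with AdamW and a linear LR schedule with warmup,
# consistent with the "LinearScheduler | warmup_fraction: '0.1'" plugin above.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4",
    learning_rate=0.00015,
    mini_batch_size=4,
    max_epochs=10,
)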
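
Usage note: the final evaluation above is run on the checkpoint saved as best-model.pt under the training base path. A minimal sketch for loading it and tagging text with Flair's public API (the Dutch example sentence is illustrative only):

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the best checkpoint written during training.
tagger = SequenceTagger.load(
    "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4/best-model.pt"
)

# The 13-tag dictionary logged above is BIOES-style: O plus S-/B-/E-/I-
# variants of LOC, PER and ORG; predicted spans are decoded from it.
sentence = Sentence("De vergadering vond plaats in Amsterdam .")
tagger.predict(sentence)

for entity in sentence.get_spans("ner"):
    print(entity)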