2023-10-12 10:39:56,361 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,363 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 10:39:56,363 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,363 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-12 10:39:56,363 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,363 Train: 20847 sentences
2023-10-12 10:39:56,364 (train_with_dev=False, train_with_test=False)
2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,364 Training Params:
2023-10-12 10:39:56,364  - learning_rate: "0.00015"
2023-10-12 10:39:56,364  - mini_batch_size: "8"
2023-10-12 10:39:56,364  - max_epochs: "10"
2023-10-12 10:39:56,364  - shuffle: "True"
2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,364 Plugins:
2023-10-12 10:39:56,364  - TensorboardLogger
2023-10-12 10:39:56,364  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,364 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 10:39:56,364  - metric: "('micro avg', 'f1-score')"
2023-10-12 10:39:56,364 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,365 Computation:
2023-10-12 10:39:56,365  - compute on device: cuda:0
2023-10-12 10:39:56,365  - embedding storage: none
2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,365 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,365 ----------------------------------------------------------------------------------------------------
2023-10-12 10:39:56,365 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 10:42:18,682 epoch 1 - iter 260/2606 - loss 2.79098046 - time (sec): 142.31 - samples/sec: 285.86 - lr: 0.000015 - momentum: 0.000000
2023-10-12 10:44:40,875 epoch 1 - iter 520/2606 - loss 2.53978742 - time (sec): 284.51 - samples/sec: 278.51 - lr: 0.000030 - momentum: 0.000000
2023-10-12 10:47:00,532 epoch 1 - iter 780/2606 - loss 2.17133359 - time (sec): 424.16 - samples/sec: 271.17 - lr: 0.000045 - momentum: 0.000000
2023-10-12 10:49:20,483 epoch 1 - iter 1040/2606 - loss 1.79656729 - time (sec): 564.12 - samples/sec: 269.30 - lr: 0.000060 - momentum: 0.000000
2023-10-12 10:51:42,179 epoch 1 - iter 1300/2606 - loss 1.53837733 - time (sec): 705.81 - samples/sec: 267.96 - lr: 0.000075 - momentum: 0.000000
2023-10-12 10:54:02,563 epoch 1 - iter 1560/2606 - loss 1.35780764 - time (sec): 846.20 - samples/sec: 268.62 - lr: 0.000090 - momentum: 0.000000
2023-10-12 10:56:22,180 epoch 1 - iter 1820/2606 - loss 1.21922934 - time (sec): 985.81 - samples/sec: 266.92 - lr: 0.000105 - momentum: 0.000000
2023-10-12 10:58:41,031 epoch 1 - iter 2080/2606 - loss 1.10847958 - time (sec): 1124.66 - samples/sec: 266.18 - lr: 0.000120 - momentum: 0.000000
2023-10-12 11:00:56,908 epoch 1 - iter 2340/2606 - loss 1.02422584 - time (sec): 1260.54 - samples/sec: 264.41 - lr: 0.000135 - momentum: 0.000000
2023-10-12 11:03:12,052 epoch 1 - iter 2600/2606 - loss 0.95341170 - time (sec): 1395.68 - samples/sec: 262.62 - lr: 0.000150 - momentum: 0.000000
2023-10-12 11:03:15,142 ----------------------------------------------------------------------------------------------------
2023-10-12 11:03:15,143 EPOCH 1 done: loss 0.9515 - lr: 0.000150
2023-10-12 11:03:52,364 DEV : loss 0.12193801254034042 - f1-score (micro avg)  0.3298
2023-10-12 11:03:52,416 saving best model
2023-10-12 11:03:53,327 ----------------------------------------------------------------------------------------------------
2023-10-12 11:06:10,460 epoch 2 - iter 260/2606 - loss 0.22817029 - time (sec): 137.13 - samples/sec: 260.71 - lr: 0.000148 - momentum: 0.000000
2023-10-12 11:08:28,429 epoch 2 - iter 520/2606 - loss 0.19785334 - time (sec): 275.10 - samples/sec: 261.21 - lr: 0.000147 - momentum: 0.000000
2023-10-12 11:10:44,641 epoch 2 - iter 780/2606 - loss 0.18391103 - time (sec): 411.31 - samples/sec: 259.17 - lr: 0.000145 - momentum: 0.000000
2023-10-12 11:13:03,481 epoch 2 - iter 1040/2606 - loss 0.17657181 - time (sec): 550.15 - samples/sec: 260.46 - lr: 0.000143 - momentum: 0.000000
2023-10-12 11:15:22,641 epoch 2 - iter 1300/2606 - loss 0.17249729 - time (sec): 689.31 - samples/sec: 260.29 - lr: 0.000142 - momentum: 0.000000
2023-10-12 11:17:42,766 epoch 2 - iter 1560/2606 - loss 0.16463122 - time (sec): 829.44 - samples/sec: 262.66 - lr: 0.000140 - momentum: 0.000000
2023-10-12 11:20:00,409 epoch 2 - iter 1820/2606 - loss 0.16141368 - time (sec): 967.08 - samples/sec: 261.73 - lr: 0.000138 - momentum: 0.000000
2023-10-12 11:22:18,980 epoch 2 - iter 2080/2606 - loss 0.15898957 - time (sec): 1105.65 - samples/sec: 263.51 - lr: 0.000137 - momentum: 0.000000
2023-10-12 11:24:40,875 epoch 2 - iter 2340/2606 - loss 0.15595656 - time (sec): 1247.55 - samples/sec: 263.93 - lr: 0.000135 - momentum: 0.000000
2023-10-12 11:26:59,696 epoch 2 - iter 2600/2606 - loss 0.15306281 - time (sec): 1386.37 - samples/sec: 264.42 - lr: 0.000133 - momentum: 0.000000
2023-10-12 11:27:02,847 ----------------------------------------------------------------------------------------------------
2023-10-12 11:27:02,847 EPOCH 2 done: loss 0.1530 - lr: 0.000133
2023-10-12 11:27:44,631 DEV : loss 0.14174272119998932 - f1-score (micro avg)  0.3787
2023-10-12 11:27:44,688 saving best model
2023-10-12 11:27:47,305 ----------------------------------------------------------------------------------------------------
2023-10-12 11:30:06,778 epoch 3 - iter 260/2606 - loss 0.09674644 - time (sec): 139.47 - samples/sec: 254.60 - lr: 0.000132 - momentum: 0.000000
2023-10-12 11:32:22,055 epoch 3 - iter 520/2606 - loss 0.10014325 - time (sec): 274.75 - samples/sec: 250.67 - lr: 0.000130 - momentum: 0.000000
2023-10-12 11:34:42,409 epoch 3 - iter 780/2606 - loss 0.09513807 - time (sec): 415.10 - samples/sec: 256.98 - lr: 0.000128 - momentum: 0.000000
2023-10-12 11:37:01,896 epoch 3 - iter 1040/2606 - loss 0.09185215 - time (sec): 554.59 - samples/sec: 257.11 - lr: 0.000127 - momentum: 0.000000
2023-10-12 11:39:21,533 epoch 3 - iter 1300/2606 - loss 0.09008465 - time (sec): 694.22 - samples/sec: 258.02 - lr: 0.000125 - momentum: 0.000000
2023-10-12 11:41:41,861 epoch 3 - iter 1560/2606 - loss 0.09055602 - time (sec): 834.55 - samples/sec: 259.20 - lr: 0.000123 - momentum: 0.000000
2023-10-12 11:44:01,170 epoch 3 - iter 1820/2606 - loss 0.09064034 - time (sec): 973.86 - samples/sec: 258.25 - lr: 0.000122 - momentum: 0.000000
2023-10-12 11:46:21,153 epoch 3 - iter 2080/2606 - loss 0.09076692 - time (sec): 1113.84 - samples/sec: 259.66 - lr: 0.000120 - momentum: 0.000000
2023-10-12 11:48:43,071 epoch 3 - iter 2340/2606 - loss 0.08970600 - time (sec): 1255.76 - samples/sec: 262.05 - lr: 0.000118 - momentum: 0.000000
2023-10-12 11:51:01,574 epoch 3 - iter 2600/2606 - loss 0.09003826 - time (sec): 1394.26 - samples/sec: 262.94 - lr: 0.000117 - momentum: 0.000000
2023-10-12 11:51:04,717 ----------------------------------------------------------------------------------------------------
2023-10-12 11:51:04,718 EPOCH 3 done: loss 0.0910 - lr: 0.000117
2023-10-12 11:51:46,787 DEV : loss 0.1857159584760666 - f1-score (micro avg)  0.3808
2023-10-12 11:51:46,841 saving best model
2023-10-12 11:51:49,486 ----------------------------------------------------------------------------------------------------
2023-10-12 11:54:08,085 epoch 4 - iter 260/2606 - loss 0.06864050 - time (sec): 138.59 - samples/sec: 257.51 - lr: 0.000115 - momentum: 0.000000
2023-10-12 11:56:28,898 epoch 4 - iter 520/2606 - loss 0.06148994 - time (sec): 279.41 - samples/sec: 261.33 - lr: 0.000113 - momentum: 0.000000
2023-10-12 11:58:48,559 epoch 4 - iter 780/2606 - loss 0.06145061 - time (sec): 419.07 - samples/sec: 260.81 - lr: 0.000112 - momentum: 0.000000
2023-10-12 12:01:11,693 epoch 4 - iter 1040/2606 - loss 0.06341903 - time (sec): 562.20 - samples/sec: 256.24 - lr: 0.000110 - momentum: 0.000000
2023-10-12 12:03:37,108 epoch 4 - iter 1300/2606 - loss 0.06522316 - time (sec): 707.62 - samples/sec: 257.77 - lr: 0.000108 - momentum: 0.000000
2023-10-12 12:06:02,487 epoch 4 - iter 1560/2606 - loss 0.06572690 - time (sec): 853.00 - samples/sec: 259.00 - lr: 0.000107 - momentum: 0.000000
2023-10-12 12:08:21,818 epoch 4 - iter 1820/2606 - loss 0.06365649 - time (sec): 992.33 - samples/sec: 259.68 - lr: 0.000105 - momentum: 0.000000
2023-10-12 12:10:38,937 epoch 4 - iter 2080/2606 - loss 0.06307986 - time (sec): 1129.45 - samples/sec: 260.42 - lr: 0.000103 - momentum: 0.000000
2023-10-12 12:12:58,069 epoch 4 - iter 2340/2606 - loss 0.06305110 - time (sec): 1268.58 - samples/sec: 260.22 - lr: 0.000102 - momentum: 0.000000
2023-10-12 12:15:16,561 epoch 4 - iter 2600/2606 - loss 0.06362841 - time (sec): 1407.07 - samples/sec: 260.73 - lr: 0.000100 - momentum: 0.000000
2023-10-12 12:15:19,400 ----------------------------------------------------------------------------------------------------
2023-10-12 12:15:19,401 EPOCH 4 done: loss 0.0637 - lr: 0.000100
2023-10-12 12:16:00,766 DEV : loss 0.22897003591060638 - f1-score (micro avg)  0.3706
2023-10-12 12:16:00,819 ----------------------------------------------------------------------------------------------------
2023-10-12 12:18:16,863 epoch 5 - iter 260/2606 - loss 0.04354472 - time (sec): 136.04 - samples/sec: 264.82 - lr: 0.000098 - momentum: 0.000000
2023-10-12 12:20:33,081 epoch 5 - iter 520/2606 - loss 0.04345957 - time (sec): 272.26 - samples/sec: 257.57 - lr: 0.000097 - momentum: 0.000000
2023-10-12 12:22:52,495 epoch 5 - iter 780/2606 - loss 0.04448711 - time (sec): 411.67 - samples/sec: 261.23 - lr: 0.000095 - momentum: 0.000000
2023-10-12 12:25:11,546 epoch 5 - iter 1040/2606 - loss 0.04417456 - time (sec): 550.72 - samples/sec: 260.87 - lr: 0.000093 - momentum: 0.000000
2023-10-12 12:27:30,883 epoch 5 - iter 1300/2606 - loss 0.04446586 - time (sec): 690.06 - samples/sec: 262.18 - lr: 0.000092 - momentum: 0.000000
2023-10-12 12:29:53,071 epoch 5 - iter 1560/2606 - loss 0.04447562 - time (sec): 832.25 - samples/sec: 265.01 - lr: 0.000090 - momentum: 0.000000
2023-10-12 12:32:13,835 epoch 5 - iter 1820/2606 - loss 0.04488614 - time (sec): 973.01 - samples/sec: 266.42 - lr: 0.000088 - momentum: 0.000000
2023-10-12 12:34:27,520 epoch 5 - iter 2080/2606 - loss 0.04528989 - time (sec): 1106.70 - samples/sec: 264.22 - lr: 0.000087 - momentum: 0.000000
2023-10-12 12:36:51,484 epoch 5 - iter 2340/2606 - loss 0.04482478 - time (sec): 1250.66 - samples/sec: 263.88 - lr: 0.000085 - momentum: 0.000000
2023-10-12 12:39:13,945 epoch 5 - iter 2600/2606 - loss 0.04472883 - time (sec): 1393.12 - samples/sec: 263.19 - lr: 0.000083 - momentum: 0.000000
2023-10-12 12:39:17,034 ----------------------------------------------------------------------------------------------------
2023-10-12 12:39:17,034 EPOCH 5 done: loss 0.0447 - lr: 0.000083
2023-10-12 12:39:59,133 DEV : loss 0.2849178612232208 - f1-score (micro avg)  0.3723
2023-10-12 12:39:59,190 ----------------------------------------------------------------------------------------------------
2023-10-12 12:42:16,257 epoch 6 - iter 260/2606 - loss 0.03205628 - time (sec): 137.07 - samples/sec: 257.42 - lr: 0.000082 - momentum: 0.000000
2023-10-12 12:44:40,129 epoch 6 - iter 520/2606 - loss 0.03060171 - time (sec): 280.94 - samples/sec: 264.85 - lr: 0.000080 - momentum: 0.000000
2023-10-12 12:46:59,629 epoch 6 - iter 780/2606 - loss 0.02986452 - time (sec): 420.44 - samples/sec: 265.75 - lr: 0.000078 - momentum: 0.000000
2023-10-12 12:49:19,956 epoch 6 - iter 1040/2606 - loss 0.03282256 - time (sec): 560.76 - samples/sec: 262.17 - lr: 0.000077 - momentum: 0.000000
2023-10-12 12:51:37,247 epoch 6 - iter 1300/2606 - loss 0.03337104 - time (sec): 698.06 - samples/sec: 260.05 - lr: 0.000075 - momentum: 0.000000
2023-10-12 12:53:58,164 epoch 6 - iter 1560/2606 - loss 0.03383846 - time (sec): 838.97 - samples/sec: 262.62 - lr: 0.000073 - momentum: 0.000000
2023-10-12 12:56:20,297 epoch 6 - iter 1820/2606 - loss 0.03513634 - time (sec): 981.11 - samples/sec: 263.52 - lr: 0.000072 - momentum: 0.000000
2023-10-12 12:58:37,971 epoch 6 - iter 2080/2606 - loss 0.03457993 - time (sec): 1118.78 - samples/sec: 261.28 - lr: 0.000070 - momentum: 0.000000
2023-10-12 13:00:55,395 epoch 6 - iter 2340/2606 - loss 0.03450918 - time (sec): 1256.20 - samples/sec: 261.02 - lr: 0.000068 - momentum: 0.000000
2023-10-12 13:03:15,842 epoch 6 - iter 2600/2606 - loss 0.03375577 - time (sec): 1396.65 - samples/sec: 262.22 - lr: 0.000067 - momentum: 0.000000
2023-10-12 13:03:19,438 ----------------------------------------------------------------------------------------------------
2023-10-12 13:03:19,439 EPOCH 6 done: loss 0.0337 - lr: 0.000067
2023-10-12 13:04:01,257 DEV : loss 0.35980162024497986 - f1-score (micro avg)  0.3871
2023-10-12 13:04:01,331 saving best model
2023-10-12 13:04:04,002 ----------------------------------------------------------------------------------------------------
2023-10-12 13:06:24,564 epoch 7 - iter 260/2606 - loss 0.02056707 - time (sec): 140.56 - samples/sec: 262.29 - lr: 0.000065 - momentum: 0.000000
2023-10-12 13:08:44,030 epoch 7 - iter 520/2606 - loss 0.02211430 - time (sec): 280.02 - samples/sec: 259.85 - lr: 0.000063 - momentum: 0.000000
2023-10-12 13:11:03,498 epoch 7 - iter 780/2606 - loss 0.02238197 - time (sec): 419.49 - samples/sec: 261.02 - lr: 0.000062 - momentum: 0.000000
2023-10-12 13:13:22,390 epoch 7 - iter 1040/2606 - loss 0.02341070 - time (sec): 558.38 - samples/sec: 262.32 - lr: 0.000060 - momentum: 0.000000
2023-10-12 13:15:40,113 epoch 7 - iter 1300/2606 - loss 0.02460152 - time (sec): 696.11 - samples/sec: 264.02 - lr: 0.000058 - momentum: 0.000000
2023-10-12 13:18:05,020 epoch 7 - iter 1560/2606 - loss 0.02429302 - time (sec): 841.01 - samples/sec: 268.16 - lr: 0.000057 - momentum: 0.000000
2023-10-12 13:20:23,524 epoch 7 - iter 1820/2606 - loss 0.02376757 - time (sec): 979.52 - samples/sec: 266.99 - lr: 0.000055 - momentum: 0.000000
2023-10-12 13:22:39,436 epoch 7 - iter 2080/2606 - loss 0.02507358 - time (sec): 1115.43 - samples/sec: 264.44 - lr: 0.000053 - momentum: 0.000000
2023-10-12 13:24:57,530 epoch 7 - iter 2340/2606 - loss 0.02604305 - time (sec): 1253.52 - samples/sec: 263.11 - lr: 0.000052 - momentum: 0.000000
2023-10-12 13:27:18,710 epoch 7 - iter 2600/2606 - loss 0.02570231 - time (sec): 1394.70 - samples/sec: 262.85 - lr: 0.000050 - momentum: 0.000000
2023-10-12 13:27:21,846 ----------------------------------------------------------------------------------------------------
2023-10-12 13:27:21,846 EPOCH 7 done: loss 0.0257 - lr: 0.000050
2023-10-12 13:28:03,580 DEV : loss 0.40007448196411133 - f1-score (micro avg)  0.3978
2023-10-12 13:28:03,637 saving best model
2023-10-12 13:28:06,266 ----------------------------------------------------------------------------------------------------
2023-10-12 13:30:27,485 epoch 8 - iter 260/2606 - loss 0.01482737 - time (sec): 141.21 - samples/sec: 266.04 - lr: 0.000048 - momentum: 0.000000
2023-10-12 13:32:45,532 epoch 8 - iter 520/2606 - loss 0.01696598 - time (sec): 279.26 - samples/sec: 261.91 - lr: 0.000047 - momentum: 0.000000
2023-10-12 13:35:06,637 epoch 8 - iter 780/2606 - loss 0.01817455 - time (sec): 420.37 - samples/sec: 262.67 - lr: 0.000045 - momentum: 0.000000
2023-10-12 13:37:25,483 epoch 8 - iter 1040/2606 - loss 0.01837600 - time (sec): 559.21 - samples/sec: 264.78 - lr: 0.000043 - momentum: 0.000000
2023-10-12 13:39:41,894 epoch 8 - iter 1300/2606 - loss 0.01883484 - time (sec): 695.62 - samples/sec: 261.77 - lr: 0.000042 - momentum: 0.000000
2023-10-12 13:41:59,734 epoch 8 - iter 1560/2606 - loss 0.01783081 - time (sec): 833.46 - samples/sec: 262.78 - lr: 0.000040 - momentum: 0.000000
2023-10-12 13:44:16,165 epoch 8 - iter 1820/2606 - loss 0.01833982 - time (sec): 969.89 - samples/sec: 261.10 - lr: 0.000038 - momentum: 0.000000
2023-10-12 13:46:35,668 epoch 8 - iter 2080/2606 - loss 0.01813588 - time (sec): 1109.40 - samples/sec: 260.69 - lr: 0.000037 - momentum: 0.000000
2023-10-12 13:48:57,758 epoch 8 - iter 2340/2606 - loss 0.01823581 - time (sec): 1251.49 - samples/sec: 263.09 - lr: 0.000035 - momentum: 0.000000
2023-10-12 13:51:20,031 epoch 8 - iter 2600/2606 - loss 0.01778836 - time (sec): 1393.76 - samples/sec: 262.81 - lr: 0.000033 - momentum: 0.000000
2023-10-12 13:51:23,576 ----------------------------------------------------------------------------------------------------
2023-10-12 13:51:23,576 EPOCH 8 done: loss 0.0177 - lr: 0.000033
2023-10-12 13:52:07,288 DEV : loss 0.4319295883178711 - f1-score (micro avg)  0.4038
2023-10-12 13:52:07,356 saving best model
2023-10-12 13:52:09,992 ----------------------------------------------------------------------------------------------------
2023-10-12 13:54:30,936 epoch 9 - iter 260/2606 - loss 0.01164345 - time (sec): 140.94 - samples/sec: 265.00 - lr: 0.000032 - momentum: 0.000000
2023-10-12 13:56:44,434 epoch 9 - iter 520/2606 - loss 0.01383209 - time (sec): 274.44 - samples/sec: 255.72 - lr: 0.000030 - momentum: 0.000000
2023-10-12 13:59:02,722 epoch 9 - iter 780/2606 - loss 0.01292411 - time (sec): 412.73 - samples/sec: 260.02 - lr: 0.000028 - momentum: 0.000000
2023-10-12 14:01:23,730 epoch 9 - iter 1040/2606 - loss 0.01215414 - time (sec): 553.73 - samples/sec: 260.86 - lr: 0.000027 - momentum: 0.000000
2023-10-12 14:03:45,315 epoch 9 - iter 1300/2606 - loss 0.01270955 - time (sec): 695.32 - samples/sec: 263.37 - lr: 0.000025 - momentum: 0.000000
2023-10-12 14:06:04,587 epoch 9 - iter 1560/2606 - loss 0.01279780 - time (sec): 834.59 - samples/sec: 262.06 - lr: 0.000023 - momentum: 0.000000
2023-10-12 14:08:30,951 epoch 9 - iter 1820/2606 - loss 0.01225063 - time (sec): 980.95 - samples/sec: 261.88 - lr: 0.000022 - momentum: 0.000000
2023-10-12 14:10:55,018 epoch 9 - iter 2080/2606 - loss 0.01260601 - time (sec): 1125.02 - samples/sec: 261.01 - lr: 0.000020 - momentum: 0.000000
2023-10-12 14:13:17,567 epoch 9 - iter 2340/2606 - loss 0.01255088 - time (sec): 1267.57 - samples/sec: 260.39 - lr: 0.000018 - momentum: 0.000000
2023-10-12 14:15:34,884 epoch 9 - iter 2600/2606 - loss 0.01226504 - time (sec): 1404.89 - samples/sec: 261.09 - lr: 0.000017 - momentum: 0.000000
2023-10-12 14:15:37,671 ----------------------------------------------------------------------------------------------------
2023-10-12 14:15:37,671 EPOCH 9 done: loss 0.0122 - lr: 0.000017
2023-10-12 14:16:18,359 DEV : loss 0.43767231702804565 - f1-score (micro avg)  0.41
2023-10-12 14:16:18,411 saving best model
2023-10-12 14:16:20,975 ----------------------------------------------------------------------------------------------------
2023-10-12 14:18:40,818 epoch 10 - iter 260/2606 - loss 0.00960636 - time (sec): 139.84 - samples/sec: 261.49 - lr: 0.000015 - momentum: 0.000000
2023-10-12 14:20:58,673 epoch 10 - iter 520/2606 - loss 0.01062704 - time (sec): 277.69 - samples/sec: 259.88 - lr: 0.000013 - momentum: 0.000000
2023-10-12 14:23:18,738 epoch 10 - iter 780/2606 - loss 0.01059562 - time (sec): 417.76 - samples/sec: 262.80 - lr: 0.000012 - momentum: 0.000000
2023-10-12 14:25:36,355 epoch 10 - iter 1040/2606 - loss 0.00990986 - time (sec): 555.38 - samples/sec: 260.97 - lr: 0.000010 - momentum: 0.000000
2023-10-12 14:27:56,432 epoch 10 - iter 1300/2606 - loss 0.00958578 - time (sec): 695.45 - samples/sec: 261.48 - lr: 0.000008 - momentum: 0.000000
2023-10-12 14:30:16,906 epoch 10 - iter 1560/2606 - loss 0.00964887 - time (sec): 835.93 - samples/sec: 261.59 - lr: 0.000007 - momentum: 0.000000
2023-10-12 14:32:34,264 epoch 10 - iter 1820/2606 - loss 0.00928930 - time (sec): 973.28 - samples/sec: 260.18 - lr: 0.000005 - momentum: 0.000000
2023-10-12 14:34:54,973 epoch 10 - iter 2080/2606 - loss 0.00963159 - time (sec): 1113.99 - samples/sec: 261.10 - lr: 0.000003 - momentum: 0.000000
2023-10-12 14:37:15,806 epoch 10 - iter 2340/2606 - loss 0.00957205 - time (sec): 1254.83 - samples/sec: 261.18 - lr: 0.000002 - momentum: 0.000000
2023-10-12 14:39:35,671 epoch 10 - iter 2600/2606 - loss 0.00934680 - time (sec): 1394.69 - samples/sec: 262.81 - lr: 0.000000 - momentum: 0.000000
2023-10-12 14:39:38,739 ----------------------------------------------------------------------------------------------------
2023-10-12 14:39:38,739 EPOCH 10 done: loss 0.0093 - lr: 0.000000
2023-10-12 14:40:22,122 DEV : loss 0.45118266344070435 - f1-score (micro avg)  0.4146
2023-10-12 14:40:22,181 saving best model
2023-10-12 14:40:24,269 ----------------------------------------------------------------------------------------------------
2023-10-12 14:40:24,271 Loading model from best epoch ...
2023-10-12 14:40:28,579 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-12 14:42:13,016 Results:
- F-score (micro) 0.4344
- F-score (macro) 0.3036
- Accuracy 0.2816

By class:
              precision    recall  f1-score   support

         LOC     0.4256    0.4992    0.4594      1214
         PER     0.4038    0.5272    0.4573       808
         ORG     0.2895    0.3059    0.2975       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.3987    0.4770    0.4344      2390
   macro avg     0.2797    0.3331    0.3036      2390
weighted avg     0.3954    0.4770    0.4319      2390

2023-10-12 14:42:13,016 ----------------------------------------------------------------------------------------------------
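The lr column in the log is consistent with the `LinearScheduler | warmup_fraction: '0.1'` plugin: the learning rate rises linearly to the 0.00015 peak over the first 10% of all batch steps (2,606 of the 26,060 total, i.e. exactly epoch 1) and then decays linearly to zero. A minimal sketch of that schedule (the function name and the closed-form expression are my own reconstruction from the logged values, not Flair's actual implementation):

```python
def linear_schedule_lr(step, total_steps, peak_lr=1.5e-4, warmup_fraction=0.1):
    """Linear warmup to peak_lr over warmup_fraction of training,
    then linear decay to zero, as suggested by the lr values above."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step <= warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)


# 10 epochs x 2606 iterations:
TOTAL_STEPS = 26060

# Epoch 1, iter 260 -> logged lr 0.000015; epoch 2, iter 260 -> logged lr 0.000148.
warmup_lr = linear_schedule_lr(260, TOTAL_STEPS)
decay_lr = linear_schedule_lr(2606 + 260, TOTAL_STEPS)
```

Rounded to six decimals, `warmup_lr` and `decay_lr` reproduce the logged 0.000015 and 0.000148, which is what motivates the reconstruction.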
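The aggregate scores in the final table can be sanity-checked from the per-class rows: micro F1 is the harmonic mean of the micro-averaged precision and recall, while macro F1 is the unweighted mean of the four per-class F1 scores (which is why the unsupported HumanProd class, with 15 test mentions and an F1 of 0, drags macro well below micro). A quick check using the printed values:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Micro-averaged precision/recall from the final test evaluation above.
micro_f1 = f1(0.3987, 0.4770)

# Macro F1: unweighted mean of the per-class F1 scores (LOC, PER, ORG, HumanProd).
macro_f1 = (0.4594 + 0.4573 + 0.2975 + 0.0000) / 4
```

Both recomputed values agree with the logged 0.4344 (micro) and 0.3036 (macro) up to rounding of the printed four-decimal inputs.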
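The per-epoch `DEV :` lines are regular enough to scrape when comparing runs, e.g. to plot dev micro-F1 against the dev loss (which keeps rising after epoch 1 even as F1 improves). A small sketch with Python's `re` module, shown on two lines copied verbatim from the log (the regex and variable names are illustrative, not part of Flair):

```python
import re

# First- and last-epoch DEV lines from the log above.
LOG_EXCERPT = """\
2023-10-12 11:03:52,364 DEV : loss 0.12193801254034042 - f1-score (micro avg)  0.3298
2023-10-12 14:40:22,122 DEV : loss 0.45118266344070435 - f1-score (micro avg)  0.4146
"""

DEV_RE = re.compile(r"DEV : loss ([0-9.]+) - f1-score \(micro avg\)\s+([0-9.]+)")

# One (dev_loss, dev_f1) pair per epoch, in log order.
dev_scores = [(float(m.group(1)), float(m.group(2)))
              for m in DEV_RE.finditer(LOG_EXCERPT)]
```

Applied to the full log, this yields one pair per epoch; on the excerpt it recovers (0.1219…, 0.3298) and (0.4512…, 0.4146).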