2023-10-13 02:45:50,977 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,979 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 02:45:50,979 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,979 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 02:45:50,979 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,979 Train:  7936 sentences
2023-10-13 02:45:50,979         (train_with_dev=False, train_with_test=False)
2023-10-13 02:45:50,979 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,980 Training Params:
2023-10-13 02:45:50,980  - learning_rate: "0.00015"
2023-10-13 02:45:50,980  - mini_batch_size: "8"
2023-10-13 02:45:50,980  - max_epochs: "10"
2023-10-13 02:45:50,980  - shuffle: "True"
2023-10-13 02:45:50,980 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,980 Plugins:
2023-10-13 02:45:50,980  - TensorboardLogger
2023-10-13 02:45:50,980  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 02:45:50,980 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,980 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 02:45:50,980  - metric: "('micro avg', 'f1-score')"
2023-10-13 02:45:50,980
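The `LinearScheduler | warmup_fraction: '0.1'` plugin explains the lr column in the epoch logs below: lr climbs to the peak 0.00015 over the first epoch and then decays linearly to zero. A minimal sketch of that schedule in plain Python — the step count (992 iterations × 10 epochs = 9,920 batch steps) and the exact warmup/decay formula are assumptions inferred from the logged lr values, not Flair's own implementation:

```python
def linear_lr(step: int, peak_lr: float = 0.00015,
              total_steps: int = 9920, warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr over the first warmup_fraction of steps,
    then linear decay to 0 over the remaining steps."""
    warmup_steps = int(total_steps * warmup_fraction)  # 992 here = one epoch
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(linear_lr(99))        # epoch 1, iter 99  -> ~0.000015, as logged
print(linear_lr(990))       # epoch 1, iter 990 -> ~0.000150, as logged
print(linear_lr(992 + 99))  # epoch 2, iter 99  -> ~0.000148, as logged
```

The ratios check out against the log (e.g. iter 99 of 992 warmup steps gives 99/992 ≈ 10% of the peak lr).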
2023-10-13 02:45:50,980 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,980 Computation:
2023-10-13 02:45:50,980  - compute on device: cuda:0
2023-10-13 02:45:50,981  - embedding storage: none
2023-10-13 02:45:50,981 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,981 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-13 02:45:50,981 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,981 ----------------------------------------------------------------------------------------------------
2023-10-13 02:45:50,981 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 02:46:39,523 epoch 1 - iter 99/992 - loss 2.56961245 - time (sec): 48.54 - samples/sec: 336.05 - lr: 0.000015 - momentum: 0.000000
2023-10-13 02:47:30,976 epoch 1 - iter 198/992 - loss 2.47403734 - time (sec): 99.99 - samples/sec: 333.43 - lr: 0.000030 - momentum: 0.000000
2023-10-13 02:48:21,685 epoch 1 - iter 297/992 - loss 2.26079180 - time (sec): 150.70 - samples/sec: 332.90 - lr: 0.000045 - momentum: 0.000000
2023-10-13 02:49:10,498 epoch 1 - iter 396/992 - loss 2.02581573 - time (sec): 199.52 - samples/sec: 331.80 - lr: 0.000060 - momentum: 0.000000
2023-10-13 02:50:00,059 epoch 1 - iter 495/992 - loss 1.78525056 - time (sec): 249.08 - samples/sec: 331.63 - lr: 0.000075 - momentum: 0.000000
2023-10-13 02:50:47,547 epoch 1 - iter 594/992 - loss 1.57150807 - time (sec): 296.56 - samples/sec: 333.31 - lr: 0.000090 - momentum: 0.000000
2023-10-13 02:51:36,101 epoch 1 - iter 693/992 - loss 1.39756059 - time (sec): 345.12 - samples/sec: 332.46 - lr: 0.000105 - momentum: 0.000000
2023-10-13 02:52:25,502 epoch 1 - iter 792/992 - loss 1.25753654 - time (sec): 394.52 - samples/sec: 332.87 - lr: 0.000120 - momentum: 0.000000
2023-10-13 02:53:14,808 epoch 1 - iter 891/992 - loss 1.14076192 - time (sec): 443.83 - samples/sec: 333.00 - lr: 0.000135 - momentum: 0.000000
2023-10-13 02:54:03,885 epoch 1 - iter 990/992 - loss 1.05285946 - time (sec): 492.90 - samples/sec: 331.78 - lr: 0.000150 - momentum: 0.000000
2023-10-13 02:54:04,971 ----------------------------------------------------------------------------------------------------
2023-10-13 02:54:04,971 EPOCH 1 done: loss 1.0512 - lr: 0.000150
2023-10-13 02:54:29,856 DEV : loss 0.189751997590065 - f1-score (micro avg) 0.2835
2023-10-13 02:54:29,901 saving best model
2023-10-13 02:54:30,774 ----------------------------------------------------------------------------------------------------
2023-10-13 02:55:19,892 epoch 2 - iter 99/992 - loss 0.19743591 - time (sec): 49.12 - samples/sec: 336.43 - lr: 0.000148 - momentum: 0.000000
2023-10-13 02:56:07,700 epoch 2 - iter 198/992 - loss 0.19064895 - time (sec): 96.92 - samples/sec: 338.90 - lr: 0.000147 - momentum: 0.000000
2023-10-13 02:56:55,366 epoch 2 - iter 297/992 - loss 0.18203876 - time (sec): 144.59 - samples/sec: 339.41 - lr: 0.000145 - momentum: 0.000000
2023-10-13 02:57:41,992 epoch 2 - iter 396/992 - loss 0.16968369 - time (sec): 191.22 - samples/sec: 336.78 - lr: 0.000143 - momentum: 0.000000
2023-10-13 02:58:30,387 epoch 2 - iter 495/992 - loss 0.16294565 - time (sec): 239.61 - samples/sec: 340.83 - lr: 0.000142 - momentum: 0.000000
2023-10-13 02:59:18,718 epoch 2 - iter 594/992 - loss 0.16098764 - time (sec): 287.94 - samples/sec: 340.16 - lr: 0.000140 - momentum: 0.000000
2023-10-13 03:00:07,562 epoch 2 - iter 693/992 - loss 0.15638470 - time (sec): 336.79 - samples/sec: 340.16 - lr: 0.000138 - momentum: 0.000000
2023-10-13 03:00:56,552 epoch 2 - iter 792/992 - loss 0.15321316 - time (sec): 385.78 - samples/sec: 339.09 - lr: 0.000137 - momentum: 0.000000
2023-10-13 03:01:45,805 epoch 2 - iter 891/992 - loss 0.14849866 - time (sec): 435.03 - samples/sec: 338.63 - lr: 0.000135 - momentum: 0.000000
2023-10-13 03:02:35,005 epoch 2 - iter 990/992 - loss 0.14530402 - time (sec): 484.23 - samples/sec: 338.11 - lr: 0.000133 - momentum: 0.000000
2023-10-13 03:02:35,969 ----------------------------------------------------------------------------------------------------
2023-10-13 03:02:35,969 EPOCH 2 done: loss 0.1452 - lr: 0.000133
2023-10-13 03:03:03,068 DEV : loss 0.0899583250284195 - f1-score (micro avg) 0.703
2023-10-13 03:03:03,110 saving best model
2023-10-13 03:03:05,696 ----------------------------------------------------------------------------------------------------
2023-10-13 03:03:54,978 epoch 3 - iter 99/992 - loss 0.08222575 - time (sec): 49.28 - samples/sec: 347.99 - lr: 0.000132 - momentum: 0.000000
2023-10-13 03:04:44,316 epoch 3 - iter 198/992 - loss 0.08512774 - time (sec): 98.62 - samples/sec: 336.66 - lr: 0.000130 - momentum: 0.000000
2023-10-13 03:05:34,818 epoch 3 - iter 297/992 - loss 0.08873913 - time (sec): 149.12 - samples/sec: 331.60 - lr: 0.000128 - momentum: 0.000000
2023-10-13 03:06:24,586 epoch 3 - iter 396/992 - loss 0.08353607 - time (sec): 198.89 - samples/sec: 331.63 - lr: 0.000127 - momentum: 0.000000
2023-10-13 03:07:14,249 epoch 3 - iter 495/992 - loss 0.08572589 - time (sec): 248.55 - samples/sec: 327.65 - lr: 0.000125 - momentum: 0.000000
2023-10-13 03:08:03,123 epoch 3 - iter 594/992 - loss 0.08458075 - time (sec): 297.42 - samples/sec: 327.92 - lr: 0.000123 - momentum: 0.000000
2023-10-13 03:08:52,086 epoch 3 - iter 693/992 - loss 0.08163811 - time (sec): 346.39 - samples/sec: 328.61 - lr: 0.000122 - momentum: 0.000000
2023-10-13 03:09:41,562 epoch 3 - iter 792/992 - loss 0.08057501 - time (sec): 395.86 - samples/sec: 329.26 - lr: 0.000120 - momentum: 0.000000
2023-10-13 03:10:30,177 epoch 3 - iter 891/992 - loss 0.08056998 - time (sec): 444.48 - samples/sec: 332.01 - lr: 0.000118 - momentum: 0.000000
2023-10-13 03:11:18,397 epoch 3 - iter 990/992 - loss 0.08028046 - time (sec): 492.70 - samples/sec: 331.96 - lr: 0.000117 - momentum: 0.000000
2023-10-13 03:11:19,410 ----------------------------------------------------------------------------------------------------
2023-10-13 03:11:19,410 EPOCH 3 done: loss 0.0801 - lr: 0.000117
2023-10-13 03:11:44,118 DEV : loss 0.0877280905842781 - f1-score (micro avg) 0.7395
2023-10-13 03:11:44,162 saving best model
2023-10-13 03:11:46,764 ----------------------------------------------------------------------------------------------------
2023-10-13 03:12:34,766 epoch 4 - iter 99/992 - loss 0.06780364 - time (sec): 48.00 - samples/sec: 347.33 - lr: 0.000115 - momentum: 0.000000
2023-10-13 03:13:22,482 epoch 4 - iter 198/992 - loss 0.05882071 - time (sec): 95.71 - samples/sec: 340.16 - lr: 0.000113 - momentum: 0.000000
2023-10-13 03:14:12,450 epoch 4 - iter 297/992 - loss 0.05949767 - time (sec): 145.68 - samples/sec: 346.78 - lr: 0.000112 - momentum: 0.000000
2023-10-13 03:15:00,665 epoch 4 - iter 396/992 - loss 0.05855466 - time (sec): 193.90 - samples/sec: 343.81 - lr: 0.000110 - momentum: 0.000000
2023-10-13 03:15:49,149 epoch 4 - iter 495/992 - loss 0.05887271 - time (sec): 242.38 - samples/sec: 343.32 - lr: 0.000108 - momentum: 0.000000
2023-10-13 03:16:38,172 epoch 4 - iter 594/992 - loss 0.05722243 - time (sec): 291.40 - samples/sec: 341.28 - lr: 0.000107 - momentum: 0.000000
2023-10-13 03:17:27,473 epoch 4 - iter 693/992 - loss 0.05557787 - time (sec): 340.70 - samples/sec: 339.82 - lr: 0.000105 - momentum: 0.000000
2023-10-13 03:18:15,423 epoch 4 - iter 792/992 - loss 0.05593158 - time (sec): 388.65 - samples/sec: 337.89 - lr: 0.000103 - momentum: 0.000000
2023-10-13 03:19:04,752 epoch 4 - iter 891/992 - loss 0.05487909 - time (sec): 437.98 - samples/sec: 338.10 - lr: 0.000102 - momentum: 0.000000
2023-10-13 03:19:53,220 epoch 4 - iter 990/992 - loss 0.05483706 - time (sec): 486.45 - samples/sec: 336.65 - lr: 0.000100 - momentum: 0.000000
2023-10-13 03:19:54,152 ----------------------------------------------------------------------------------------------------
2023-10-13 03:19:54,152 EPOCH 4 done: loss 0.0550 - lr: 0.000100
2023-10-13 03:20:19,612 DEV : loss 0.1051759347319603 - f1-score (micro avg) 0.7595
2023-10-13 03:20:19,651 saving best model
2023-10-13 03:20:22,244 ----------------------------------------------------------------------------------------------------
2023-10-13 03:21:09,770 epoch 5 - iter 99/992 - loss 0.03948375 - time (sec): 47.52 - samples/sec: 341.53 - lr: 0.000098 - momentum: 0.000000
2023-10-13 03:21:56,783 epoch 5 - iter 198/992 - loss 0.03766651 - time (sec): 94.53 - samples/sec: 333.00 - lr: 0.000097 - momentum: 0.000000
2023-10-13 03:22:44,329 epoch 5 - iter 297/992 - loss 0.04259334 - time (sec): 142.08 - samples/sec: 339.58 - lr: 0.000095 - momentum: 0.000000
2023-10-13 03:23:32,128 epoch 5 - iter 396/992 - loss 0.03986359 - time (sec): 189.88 - samples/sec: 339.80 - lr: 0.000093 - momentum: 0.000000
2023-10-13 03:24:22,045 epoch 5 - iter 495/992 - loss 0.03872665 - time (sec): 239.80 - samples/sec: 344.75 - lr: 0.000092 - momentum: 0.000000
2023-10-13 03:25:10,868 epoch 5 - iter 594/992 - loss 0.03987916 - time (sec): 288.62 - samples/sec: 340.91 - lr: 0.000090 - momentum: 0.000000
2023-10-13 03:25:59,871 epoch 5 - iter 693/992 - loss 0.04095702 - time (sec): 337.62 - samples/sec: 338.46 - lr: 0.000088 - momentum: 0.000000
2023-10-13 03:26:48,443 epoch 5 - iter 792/992 - loss 0.04161114 - time (sec): 386.19 - samples/sec: 335.81 - lr: 0.000087 - momentum: 0.000000
2023-10-13 03:27:36,684 epoch 5 - iter 891/992 - loss 0.04037073 - time (sec): 434.44 - samples/sec: 336.23 - lr: 0.000085 - momentum: 0.000000
2023-10-13 03:28:26,132 epoch 5 - iter 990/992 - loss 0.04078977 - time (sec): 483.88 - samples/sec: 338.17 - lr: 0.000083 - momentum: 0.000000
2023-10-13 03:28:27,099 ----------------------------------------------------------------------------------------------------
2023-10-13 03:28:27,099 EPOCH 5 done: loss 0.0408 - lr: 0.000083
2023-10-13 03:28:51,984 DEV : loss 0.1180945560336113 - f1-score (micro avg) 0.7646
2023-10-13 03:28:52,025 saving best model
2023-10-13 03:28:54,670 ----------------------------------------------------------------------------------------------------
2023-10-13 03:29:42,915 epoch 6 - iter 99/992 - loss 0.02582456 - time (sec): 48.24 - samples/sec: 319.59 - lr: 0.000082 - momentum: 0.000000
2023-10-13 03:30:31,344 epoch 6 - iter 198/992 - loss 0.03059858 - time (sec): 96.67 - samples/sec: 326.02 - lr: 0.000080 - momentum: 0.000000
2023-10-13 03:31:20,262 epoch 6 - iter 297/992 - loss 0.03357393 - time (sec): 145.59 - samples/sec: 327.73 - lr: 0.000078 - momentum: 0.000000
2023-10-13 03:32:10,654 epoch 6 - iter 396/992 - loss 0.03228723 - time (sec): 195.98 - samples/sec: 328.84 - lr: 0.000077 - momentum: 0.000000
2023-10-13 03:33:00,849 epoch 6 - iter 495/992 - loss 0.03214346 - time (sec): 246.17 - samples/sec: 332.03 - lr: 0.000075 - momentum: 0.000000
2023-10-13 03:33:50,905 epoch 6 - iter 594/992 - loss 0.03248643 - time (sec): 296.23 - samples/sec: 332.81 - lr: 0.000073 - momentum: 0.000000
2023-10-13 03:34:40,007 epoch 6 - iter 693/992 - loss 0.03072732 - time (sec): 345.33 - samples/sec: 332.40 - lr: 0.000072 - momentum: 0.000000
2023-10-13 03:35:28,410 epoch 6 - iter 792/992 - loss 0.03159034 - time (sec): 393.73 - samples/sec: 330.85 - lr: 0.000070 - momentum: 0.000000
2023-10-13 03:36:17,973 epoch 6 - iter 891/992 - loss 0.03165963 - time (sec): 443.30 - samples/sec: 332.46 - lr: 0.000068 - momentum: 0.000000
2023-10-13 03:37:05,872 epoch 6 - iter 990/992 - loss 0.03176266 - time (sec): 491.20 - samples/sec: 333.06 - lr: 0.000067 - momentum: 0.000000
2023-10-13 03:37:06,914 ----------------------------------------------------------------------------------------------------
2023-10-13 03:37:06,914 EPOCH 6 done: loss 0.0317 - lr: 0.000067
2023-10-13 03:37:32,957 DEV : loss 0.14087608456611633 - f1-score (micro avg) 0.7729
2023-10-13 03:37:33,004 saving best model
2023-10-13 03:37:35,653 ----------------------------------------------------------------------------------------------------
2023-10-13 03:38:24,462 epoch 7 - iter 99/992 - loss 0.02005486 - time (sec): 48.80 - samples/sec: 320.57 - lr: 0.000065 - momentum: 0.000000
2023-10-13 03:39:13,790 epoch 7 - iter 198/992 - loss 0.02414684 - time (sec): 98.13 - samples/sec: 323.15 - lr: 0.000063 - momentum: 0.000000
2023-10-13 03:40:03,713 epoch 7 - iter 297/992 - loss 0.02430376 - time (sec): 148.05 - samples/sec: 326.28 - lr: 0.000062 - momentum: 0.000000
2023-10-13 03:40:53,130 epoch 7 - iter 396/992 - loss 0.02264990 - time (sec): 197.47 - samples/sec: 325.58 - lr: 0.000060 - momentum: 0.000000
2023-10-13 03:41:44,388 epoch 7 - iter 495/992 - loss 0.02246411 - time (sec): 248.73 - samples/sec: 323.74 - lr: 0.000058 - momentum: 0.000000
2023-10-13 03:42:35,852 epoch 7 - iter 594/992 - loss 0.02267295 - time (sec): 300.19 - samples/sec: 323.54 - lr: 0.000057 - momentum: 0.000000
2023-10-13 03:43:26,313 epoch 7 - iter 693/992 - loss 0.02315883 - time (sec): 350.65 - samples/sec: 325.05 - lr: 0.000055 - momentum: 0.000000
2023-10-13 03:44:16,277 epoch 7 - iter 792/992 - loss 0.02203811 - time (sec): 400.62 - samples/sec: 324.90 - lr: 0.000053 - momentum: 0.000000
2023-10-13 03:45:05,985 epoch 7 - iter 891/992 - loss 0.02206609 - time (sec): 450.33 - samples/sec: 324.35 - lr: 0.000052 - momentum: 0.000000
2023-10-13 03:45:56,424 epoch 7 - iter 990/992 - loss 0.02311740 - time (sec): 500.77 - samples/sec: 326.87 - lr: 0.000050 - momentum: 0.000000
2023-10-13 03:45:57,398 ----------------------------------------------------------------------------------------------------
2023-10-13 03:45:57,398 EPOCH 7 done: loss 0.0231 - lr: 0.000050
2023-10-13 03:46:23,125 DEV : loss 0.16646943986415863 - f1-score (micro avg) 0.7657
2023-10-13 03:46:23,168 ----------------------------------------------------------------------------------------------------
2023-10-13 03:47:12,498 epoch 8 - iter 99/992 - loss 0.01274592 - time (sec): 49.33 - samples/sec: 322.37 - lr: 0.000048 - momentum: 0.000000
2023-10-13 03:48:02,147 epoch 8 - iter 198/992 - loss 0.01456983 - time (sec): 98.98 - samples/sec: 323.59 - lr: 0.000047 - momentum: 0.000000
2023-10-13 03:48:52,185 epoch 8 - iter 297/992 - loss 0.01557130 - time (sec): 149.02 - samples/sec: 318.69 - lr: 0.000045 - momentum: 0.000000
2023-10-13 03:49:42,890 epoch 8 - iter 396/992 - loss 0.01688137 - time (sec): 199.72 - samples/sec: 320.20 - lr: 0.000043 - momentum: 0.000000
2023-10-13 03:50:32,423 epoch 8 - iter 495/992 - loss 0.01694736 - time (sec): 249.25 - samples/sec: 323.61 - lr: 0.000042 - momentum: 0.000000
2023-10-13 03:51:22,427 epoch 8 - iter 594/992 - loss 0.01706011 - time (sec): 299.26 - samples/sec: 325.69 - lr: 0.000040 - momentum: 0.000000
2023-10-13 03:52:11,889 epoch 8 - iter 693/992 - loss 0.01659707 - time (sec): 348.72 - samples/sec: 325.13 - lr: 0.000038 - momentum: 0.000000
2023-10-13 03:53:02,369 epoch 8 - iter 792/992 - loss 0.01649795 - time (sec): 399.20 - samples/sec: 325.98 - lr: 0.000037 - momentum: 0.000000
2023-10-13 03:53:52,979 epoch 8 - iter 891/992 - loss 0.01717681 - time (sec): 449.81 - samples/sec: 328.07 - lr: 0.000035 - momentum: 0.000000
2023-10-13 03:54:44,008 epoch 8 - iter 990/992 - loss 0.01826285 - time (sec): 500.84 - samples/sec: 326.67 - lr: 0.000033 - momentum: 0.000000
2023-10-13 03:54:45,073 ----------------------------------------------------------------------------------------------------
2023-10-13 03:54:45,074 EPOCH 8 done: loss 0.0183 - lr: 0.000033
2023-10-13 03:55:12,865 DEV : loss 0.18594855070114136 - f1-score (micro avg) 0.7538
2023-10-13 03:55:12,912 ----------------------------------------------------------------------------------------------------
2023-10-13 03:56:04,649 epoch 9 - iter 99/992 - loss 0.01296281 - time (sec): 51.73 - samples/sec: 322.21 - lr: 0.000032 - momentum: 0.000000
2023-10-13 03:56:56,423 epoch 9 - iter 198/992 - loss 0.01458632 - time (sec): 103.51 - samples/sec: 328.37 - lr: 0.000030 - momentum: 0.000000
2023-10-13 03:57:46,418 epoch 9 - iter 297/992 - loss 0.01453909 - time (sec): 153.50 - samples/sec: 326.97 - lr: 0.000028 - momentum: 0.000000
2023-10-13 03:58:35,117 epoch 9 - iter 396/992 - loss 0.01454652 - time (sec): 202.20 - samples/sec: 328.27 - lr: 0.000027 - momentum: 0.000000
2023-10-13 03:59:24,055 epoch 9 - iter 495/992 - loss 0.01423676 - time (sec): 251.14 - samples/sec: 329.00 - lr: 0.000025 - momentum: 0.000000
2023-10-13 04:00:12,824 epoch 9 - iter 594/992 - loss 0.01466894 - time (sec): 299.91 - samples/sec: 324.77 - lr: 0.000023 - momentum: 0.000000
2023-10-13 04:01:01,329 epoch 9 - iter 693/992 - loss 0.01502494 - time (sec): 348.41 - samples/sec: 324.83 - lr: 0.000022 - momentum: 0.000000
2023-10-13 04:01:52,120 epoch 9 - iter 792/992 - loss 0.01411559 - time (sec): 399.20 - samples/sec: 324.82 - lr: 0.000020 - momentum: 0.000000
2023-10-13 04:02:42,269 epoch 9 - iter 891/992 - loss 0.01477597 - time (sec): 449.35 - samples/sec: 326.00 - lr: 0.000018 - momentum: 0.000000
2023-10-13 04:03:31,215 epoch 9 - iter 990/992 - loss 0.01445271 - time (sec): 498.30 - samples/sec: 328.51 - lr: 0.000017 - momentum: 0.000000
2023-10-13 04:03:32,142 ----------------------------------------------------------------------------------------------------
2023-10-13 04:03:32,142 EPOCH 9 done: loss 0.0145 - lr: 0.000017
2023-10-13 04:03:58,803 DEV : loss 0.1992286741733551 - f1-score (micro avg) 0.7549
2023-10-13 04:03:58,852 ----------------------------------------------------------------------------------------------------
2023-10-13 04:04:49,810 epoch 10 - iter 99/992 - loss 0.01291517 - time (sec): 50.96 - samples/sec: 323.89 - lr: 0.000015 - momentum: 0.000000
2023-10-13 04:05:42,949 epoch 10 - iter 198/992 - loss 0.01207249 - time (sec): 104.09 - samples/sec: 317.10 - lr: 0.000013 - momentum: 0.000000
2023-10-13 04:06:35,842 epoch 10 - iter 297/992 - loss 0.01487264 - time (sec): 156.99 - samples/sec: 314.24 - lr: 0.000012 - momentum: 0.000000
2023-10-13 04:07:25,908 epoch 10 - iter 396/992 - loss 0.01345843 - time (sec): 207.05 - samples/sec: 320.80 - lr: 0.000010 - momentum: 0.000000
2023-10-13 04:08:15,681 epoch 10 - iter 495/992 - loss 0.01306294 - time (sec): 256.83 - samples/sec: 320.88 - lr: 0.000008 - momentum: 0.000000
2023-10-13 04:09:07,518 epoch 10 - iter 594/992 - loss 0.01311770 - time (sec): 308.66 - samples/sec: 319.00 - lr: 0.000007 - momentum: 0.000000
2023-10-13 04:09:59,491 epoch 10 - iter 693/992 - loss 0.01222441 - time (sec): 360.64 - samples/sec: 320.92 - lr: 0.000005 - momentum: 0.000000
2023-10-13 04:10:50,346 epoch 10 - iter 792/992 - loss 0.01162859 - time (sec): 411.49 - samples/sec: 321.78 - lr: 0.000004 - momentum: 0.000000
2023-10-13 04:11:38,694 epoch 10 - iter 891/992 - loss 0.01195067 - time (sec): 459.84 - samples/sec: 321.40 - lr: 0.000002 - momentum: 0.000000
2023-10-13 04:12:27,983 epoch 10 - iter 990/992 - loss 0.01151476 - time (sec): 509.13 - samples/sec: 321.37 - lr: 0.000000 - momentum: 0.000000
2023-10-13 04:12:29,000 ----------------------------------------------------------------------------------------------------
2023-10-13 04:12:29,000 EPOCH 10 done: loss 0.0116 - lr: 0.000000
2023-10-13 04:12:54,069 DEV : loss 0.20610998570919037 - f1-score (micro avg) 0.7583
2023-10-13 04:12:55,050 ----------------------------------------------------------------------------------------------------
2023-10-13 04:12:55,052 Loading model from best epoch ...
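The reloaded tagger predicts with a 13-tag BIOES dictionary (printed in the next log line: O plus S-/B-/E-/I- variants of PER, LOC, ORG). As a minimal illustration of how such tags map back to entity spans for span-level evaluation — a hypothetical decoder sketch, not Flair's own implementation:

```python
def bioes_spans(tags):
    """Turn a BIOES tag sequence into (label, start, end_exclusive) spans.
    S- marks a single-token span; B-/I-/E- mark begin/inside/end; O is outside."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.append((label, i, i + 1))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((label, start, i + 1))
            start = None
        elif prefix != "I":  # "O" (or a malformed tag) closes any open span
            start = None
    return spans

print(bioes_spans(["B-PER", "E-PER", "O", "S-LOC"]))
# [('PER', 0, 2), ('LOC', 3, 4)]
```

Span-level F1 (as reported below) counts a prediction as correct only when label and both boundaries match a gold span.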
2023-10-13 04:12:59,365 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 04:13:24,616 Results:
- F-score (micro) 0.7663
- F-score (macro) 0.6747
- Accuracy 0.6446

By class:
              precision    recall  f1-score   support

         LOC     0.8142    0.8427    0.8282       655
         PER     0.6932    0.8206    0.7515       223
         ORG     0.4860    0.4094    0.4444       127

   micro avg     0.7502    0.7831    0.7663      1005
   macro avg     0.6644    0.6909    0.6747      1005
weighted avg     0.7458    0.7831    0.7627      1005

2023-10-13 04:13:24,617 ----------------------------------------------------------------------------------------------------
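The averaged rows of the table can be re-derived from the per-class rows, which is a handy sanity check when reading such reports. A sketch in plain Python — the values are transcribed from the table above, and TP/FP/FN are reconstructed from the rounded precision/recall figures, so the results only match to table precision:

```python
# Per-class (precision, recall, support) as reported for the test set.
classes = {
    "LOC": (0.8142, 0.8427, 655),
    "PER": (0.6932, 0.8206, 223),
    "ORG": (0.4860, 0.4094, 127),
}

def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1(p, r) for p, r, _ in classes.values()) / len(classes)

# Micro average: pool TP/FP/FN over classes, then compute P/R/F1 once.
tp = sum(r * s for _, r, s in classes.values())                # TP = recall * support
fp = sum(r * s / p - r * s for p, r, s in classes.values())    # FP = predicted - TP
fn = sum(s - r * s for _, r, s in classes.values())            # FN = support - TP
micro_p, micro_r = tp / (tp + fp), tp / (tp + fn)
micro_f1 = f1(micro_p, micro_r)

print(round(macro_f1, 4))  # ≈ 0.6747, matching the log
print(round(micro_f1, 4))  # ≈ 0.7663, matching the log
```

Micro averaging dominates here because LOC contributes 655 of the 1005 gold spans, which is why micro F1 (0.7663) sits well above macro F1 (0.6747) when the small ORG class performs poorly.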