diff --git "a/wandb/run-20220306_154329-3378nr4g/files/output.log" "b/wandb/run-20220306_154329-3378nr4g/files/output.log" new file mode 100644--- /dev/null +++ "b/wandb/run-20220306_154329-3378nr4g/files/output.log" @@ -0,0 +1,1698 @@ + + + 0%| | 0/5080 [00:00> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:43:37,253 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:43:40,274 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:43:43,409 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:43:46,480 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:43:49,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:43:52,675 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:43:55,746 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 1/5080 [00:25<36:12:38, 25.67s/it] + + 0%| | 1/5080 [00:25<36:12:38, 25.67s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:43:59,072 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:02,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:05,119 >> Could not estimate the number of tokens 
of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:08,171 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:11,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:14,099 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:17,091 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.4912, 'learning_rate': 6.000000000000001e-08, 'epoch': 0.01} +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:20,124 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + + 0%| | 2/5080 [00:49<35:00:36, 24.82s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:44:23,194 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:26,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:29,063 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:31,977 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:34,941 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:37,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 
15:44:40,870 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:44,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 3/5080 [01:13<34:25:47, 24.41s/it] + + 0%| | 3/5080 [01:13<34:25:47, 24.41s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:44:47,151 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:50,017 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:53,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:55,994 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:44:58,969 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:01,911 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:04,866 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:07,846 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 4/5080 [01:37<34:04:46, 24.17s/it] + + 0%| | 4/5080 [01:37<34:04:46, 24.17s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:45:10,851 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:13,730 >> Could not estimate the number of tokens 
of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:16,592 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:19,607 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:22,440 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:25,281 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:28,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.5009, 'learning_rate': 1.5000000000000002e-07, 'epoch': 0.02} +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:30,979 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + + 0%| | 5/5080 [02:00<33:32:42, 23.80s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:45:33,932 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:36,774 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:39,641 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:42,498 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:45,270 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 
15:45:48,146 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:51,010 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.4784, 'learning_rate': 1.8e-07, 'epoch': 0.02} +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:53,820 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + + 0%| | 6/5080 [02:23<33:04:51, 23.47s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:45:56,770 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:45:59,648 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:02,494 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:05,248 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:08,090 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:10,946 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:13,777 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:16,574 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 7/5080 [02:46<32:44:39, 23.24s/it] + + 0%| | 7/5080 [02:46<32:44:39, 23.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:46:19,533 >> Could not estimate the 
number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:22,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:25,191 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:27,943 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:30,749 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:33,597 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:36,340 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:39,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 8/5080 [03:08<32:25:29, 23.01s/it] + + 0%| | 8/5080 [03:08<32:25:29, 23.01s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:46:42,036 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:44,848 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:47,549 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:50,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:53,042 >> 
Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:55,855 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:46:58,630 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:01,383 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 9/5080 [03:31<32:05:28, 22.78s/it] + + 0%| | 9/5080 [03:31<32:05:28, 22.78s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:47:04,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:07,019 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:09,737 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:12,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:15,293 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:17,989 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:20,684 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:23,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%| | 10/5080 [03:53<31:46:47, 
22.57s/it] + + 0%| | 10/5080 [03:53<31:46:47, 22.57s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:47:26,328 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:28,982 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:31,647 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:34,373 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:37,038 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:39,704 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:42,403 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.4186, 'learning_rate': 3.2999999999999996e-07, 'epoch': 0.04} +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:45,149 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + + 0%|▏ | 11/5080 [04:14<31:23:38, 22.30s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:47:47,942 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:50,642 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:53,357 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 
15:47:56,061 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:47:58,698 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:01,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:04,119 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 10.3834, 'learning_rate': 3.6e-07, 'epoch': 0.05} +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:06,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + + 0%|▏ | 12/5080 [04:36<31:07:19, 22.11s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:48:09,583 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:12,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:14,988 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:18,211 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:20,861 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:23,540 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:26,221 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed 
+{'loss': 10.3448, 'learning_rate': 3.8999999999999997e-07, 'epoch': 0.05} +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:28,910 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 13/5080 [04:58<31:06:23, 22.10s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:48:31,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 13/5080 [04:58<31:06:23, 22.10s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:48:31,779 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:37,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:37,198 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:42,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:42,436 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:47,721 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 14/5080 [05:20<30:48:11, 21.89s/it]f the 
input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 14/5080 [05:20<30:48:11, 21.89s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 14/5080 [05:20<30:48:11, 21.89s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:48:53,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 14/5080 [05:20<30:48:11, 21.89s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:48:53,003 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:58,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:48:58,291 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:03,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:03,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:08,817 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number 
of tokens of the input, floating-point operations will not be computed + 0%|▏ | 15/5080 [05:41<30:28:27, 21.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 15/5080 [05:41<30:28:27, 21.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 15/5080 [05:41<30:28:27, 21.66s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:49:14,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 15/5080 [05:41<30:28:27, 21.66s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:49:14,234 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:19,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:19,411 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:24,697 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:24,697 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:29,845 >> Could 
not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:29,845 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 16/5080 [06:02<30:13:25, 21.49s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 16/5080 [06:02<30:13:25, 21.49s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:49:35,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 16/5080 [06:02<30:13:25, 21.49s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:49:35,252 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:40,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:40,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:45,782 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:45,782 >> Could not estimate the number of tokens of the input, floating-point operations will not 
be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:49:51,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 17/5080 [06:23<30:05:33, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 17/5080 [06:23<30:05:33, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 17/5080 [06:23<30:05:33, 21.40s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:49:56,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 17/5080 [06:23<30:05:33, 21.40s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:49:56,406 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:50:01,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:50:01,576 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:50:06,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed 
+[WARNING|modeling_utils.py:388] 2022-03-06 15:50:06,728 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:50:11,965 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 18/5080 [06:44<29:50:50, 21.23s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 18/5080 [06:44<29:50:50, 21.23s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 18/5080 [06:44<29:50:50, 21.23s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:50:17,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 0%|▏ | 18/5080 [06:44<29:50:50, 21.23s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:50:17,223 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:50:22,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:50:22,407 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:50:27,512 >> Could not estimate the number of tokens of the input, floating-point operations 
will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:50:27,512 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:50:32,601 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 0%|▏ | 19/5080 [07:04<29:35:08, 21.05s/it]
+[WARNING|modeling_utils.py:388] 2022-03-06 15:50:37,868 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:50:42,959 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:50:48,060 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:50:53,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 0%|▏ | 20/5080 [07:25<29:23:03, 20.91s/it]
+[WARNING|modeling_utils.py:388] 2022-03-06 15:50:58,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:03,455 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:08,532 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:13,652 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 0%|▏ | 21/5080 [07:45<29:08:30, 20.74s/it]
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:18,769 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:23,788 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:28,878 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:33,902 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 0%|▎ | 22/5080 [08:06<28:57:43, 20.61s/it]
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:39,011 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:44,145 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:49,075 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:54,085 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 0%|▎ | 23/5080 [08:26<28:46:02, 20.48s/it]
+[WARNING|modeling_utils.py:388] 2022-03-06 15:51:59,247 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:52:04,212 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:52:09,215 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:52:14,192 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 0%|▎ | 24/5080 [08:46<28:37:11, 20.38s/it]
+[WARNING|modeling_utils.py:388] 2022-03-06 15:52:19,246 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:52:24,179 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:52:29,116 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 15:52:33,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 0%|▎ | 25/5080 [09:06<28:32:50, 20.33s/it]
+[WARNING|modeling_utils.py:388] 2022-03-06 15:52:39,543 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 1%|▎ | 26/5080 [09:26<28:17:48, 20.16s/it]
+{'loss': 10.0031, 'learning_rate': 7.799999999999999e-07, 'epoch': 0.1}
+ 1%|▎ | 27/5080 [09:45<27:56:44, 19.91s/it]
+{'loss': 9.9708, 'learning_rate': 8.1e-07, 'epoch': 0.11}
+ 1%|▎ | 28/5080 [10:05<27:41:47, 19.74s/it]
+ 1%|▎ | 29/5080 [10:24<27:24:48, 19.54s/it]
+{'loss': 9.9293, 'learning_rate': 8.7e-07, 'epoch': 0.11}
+ 1%|▎ | 30/5080 [10:43<27:06:58, 19.33s/it]
+{'loss': 9.9008, 'learning_rate': 9e-07, 'epoch': 0.12}
+ 1%|▎ | 31/5080 [11:01<26:50:53, 19.14s/it]
+{'loss': 9.8413, 'learning_rate': 9.3e-07, 'epoch': 0.12}
+ 1%|▎ | 32/5080 [11:20<26:35:21, 18.96s/it]
+{'loss': 9.8183, 'learning_rate': 9.600000000000001e-07, 'epoch': 0.13}
+ 1%|▍ | 33/5080 [11:38<26:25:29, 18.85s/it]
+{'loss': 9.8165, 'learning_rate': 9.9e-07, 'epoch': 0.13}
+ 1%|▍ | 34/5080 [11:56<26:04:55, 18.61s/it]
+{'loss': 9.7907, 'learning_rate': 1.0200000000000002e-06, 'epoch': 0.13}
+[WARNING|modeling_utils.py:388] 2022-03-06 15:55:42,697 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 1%|▍ | 35/5080 [12:14<25:41:41, 18.34s/it]
+ 1%|▍ | 36/5080 [12:32<25:19:32, 18.08s/it]
+{'loss': 9.7398, 'learning_rate': 1.08e-06, 'epoch': 0.14}
+ 1%|▍ | 37/5080 [12:49<24:54:40, 17.78s/it]
+{'loss': 9.723, 'learning_rate': 1.11e-06, 'epoch': 0.15}
+ 1%|▍ | 38/5080 [13:06<24:35:24, 17.56s/it]
+{'loss': 9.6753, 'learning_rate': 1.14e-06, 'epoch': 0.15}
+ 1%|▍ | 39/5080 [13:22<23:55:38, 17.09s/it]
+{'loss': 9.6564, 'learning_rate': 1.17e-06, 'epoch': 0.15}
+ 1%|▍ | 40/5080 [13:37<23:15:38, 16.61s/it]
+[WARNING|modeling_utils.py:388] 2022-03-06 15:57:17,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point
operations will not be computed + 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 41/5080 [13:53<22:59:53, 16.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 42/5080 [14:09<22:45:12, 16.26s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:57:41,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 42/5080 [14:09<22:45:12, 16.26s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:57:41,547 >> Could not estimate the 
number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 42/5080 [14:09<22:45:12, 16.26s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:57:41,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 42/5080 [14:09<22:45:12, 16.26s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:57:41,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 42/5080 [14:09<22:45:12, 16.26s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:57:41,547 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:57:51,818 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 43/5080 [14:23<21:36:43, 15.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▍ | 43/5080 [14:23<21:36:43, 15.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.5308, 'learning_rate': 1.29e-06, 'epoch': 0.17} +[WARNING|modeling_utils.py:388] 2022-03-06 15:57:58,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:57:58,147 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:57:58,147 
>> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:03,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:03,828 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 44/5080 [14:35<20:09:52, 14.41s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:08,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:08,466 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:12,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:12,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed 
+[WARNING|modeling_utils.py:388] 2022-03-06 15:58:16,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:16,237 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:18,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:18,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:18,758 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:24,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:26,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:26,359 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not 
estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.4824, 'learning_rate': 1.38e-06, 'epoch': 0.18} +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:29,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:31,765 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:33,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:33,711 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 47/5080 [15:04<15:36:41, 11.17s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:58:35,747 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:37,595 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:39,405 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:41,499 >> Could not estimate 
the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:41,499 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 48/5080 [15:12<14:17:44, 10.23s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:58:43,925 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:45,579 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:48,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:48,657 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 49/5080 [15:19<12:47:56, 9.16s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:58:50,213 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:52,891 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:54,103 >> Could not estimate the number of tokens of 
the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:58:54,103 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 50/5080 [15:25<11:24:42, 8.17s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 50/5080 [15:25<11:24:42, 8.17s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:58:58,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 50/5080 [15:25<11:24:42, 8.17s/it][WARNING|modeling_utils.py:388] 2022-03-06 15:58:58,510 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:59:04,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:59:04,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:59:04,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:59:04,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the 
number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 15:59:04,514 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.5391, 'learning_rate': 1.53e-06, 'epoch': 0.2} + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point 
operations will not be computed + 1%|▌ | 51/5080 [15:49<18:03:02, 12.92s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 52/5080 [16:12<22:21:20, 16.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 52/5080 [16:12<22:21:20, 16.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 52/5080 [16:12<22:21:20, 16.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 52/5080 [16:12<22:21:20, 16.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 52/5080 [16:12<22:21:20, 16.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 52/5080 [16:12<22:21:20, 16.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 52/5080 [16:12<22:21:20, 16.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 52/5080 [16:12<22:21:20, 16.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 53/5080 [16:35<25:13:48, 18.07s/it]f the input, floating-point operations will not be computedCould not estimate the 
number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 53/5080 [16:35<25:13:48, 18.07s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.4611, 'learning_rate': 1.59e-06, 'epoch': 0.21} + 1%|▌ | 53/5080 [16:35<25:13:48, 18.07s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 53/5080 [16:35<25:13:48, 18.07s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 53/5080 [16:35<25:13:48, 18.07s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 53/5080 [16:35<25:13:48, 18.07s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 53/5080 [16:35<25:13:48, 18.07s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 53/5080 [16:35<25:13:48, 18.07s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 54/5080 [16:57<27:08:14, 19.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 54/5080 [16:57<27:08:14, 19.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.4358, 
'learning_rate': 1.62e-06, 'epoch': 0.21} + 1%|▌ | 54/5080 [16:57<27:08:14, 19.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 54/5080 [16:57<27:08:14, 19.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 54/5080 [16:57<27:08:14, 19.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 54/5080 [16:57<27:08:14, 19.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 54/5080 [16:57<27:08:14, 19.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▌ | 54/5080 [16:57<27:08:14, 19.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.3875, 'learning_rate': 1.65e-06, 'epoch': 0.22} + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, 
floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 55/5080 [17:20<28:23:44, 20.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 [17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 [17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 [17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 [17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 
[17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 [17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 [17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 [17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 56/5080 [17:42<29:15:19, 20.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 57/5080 [18:04<29:42:17, 21.29s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 57/5080 [18:04<29:42:17, 21.29s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 57/5080 [18:04<29:42:17, 21.29s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 57/5080 [18:04<29:42:17, 21.29s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 1%|▋ | 57/5080 [18:04<29:42:17, 21.29s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point 
operations will not be computed
+  1%|▋ | 57/5080 [18:04<29:42:17, 21.29s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  1%|▋ | 58/5080 [18:26<30:03:37, 21.55s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.3994, 'learning_rate': 1.74e-06, 'epoch': 0.23}
+  1%|▋ | 59/5080 [18:48<30:13:02, 21.67s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.3871, 'learning_rate': 1.77e-06, 'epoch': 0.23}
+  1%|▋ | 60/5080 [19:10<30:21:20, 21.77s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.3273, 'learning_rate': 1.8e-06, 'epoch': 0.24}
+  1%|▋ | 61/5080 [19:32<30:19:16, 21.75s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  1%|▋ | 62/5080 [19:53<30:10:58, 21.65s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  1%|▋ | 63/5080 [20:16<30:21:29, 21.78s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  1%|▋ | 64/5080 [20:37<30:14:41, 21.71s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.3322, 'learning_rate': 1.9200000000000003e-06, 'epoch': 0.25}
+  1%|▋ | 65/5080 [20:59<30:22:06, 21.80s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.2935, 'learning_rate': 1.95e-06, 'epoch': 0.26}
+  1%|▊ | 66/5080 [21:20<30:05:07, 21.60s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.2578, 'learning_rate': 1.98e-06, 'epoch': 0.26}
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.235, 'learning_rate': 2.0100000000000002e-06, 'epoch': 0.26}
+  1%|▊ | 68/5080 [22:02<29:30:33, 21.20s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  1%|▊ | 69/5080 [22:22<29:15:01, 21.01s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.1993, 'learning_rate': 2.1000000000000002e-06, 'epoch': 0.27}
+  1%|▊ | 71/5080 [23:03<28:40:35, 20.61s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.208, 'learning_rate': 2.13e-06, 'epoch': 0.28}
+  1%|▊ | 72/5080 [23:23<28:28:10, 20.47s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.2079, 'learning_rate': 2.16e-06, 'epoch': 0.28}
+  1%|▊ | 73/5080 [23:43<28:13:55, 20.30s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.191, 'learning_rate': 2.1899999999999998e-06, 'epoch': 0.29}
+  1%|▊ | 74/5080 [24:03<28:02:40, 20.17s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.1381, 'learning_rate': 2.22e-06, 'epoch': 0.29}
+  1%|▊ | 75/5080 [24:23<27:59:16, 20.13s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.1718, 'learning_rate': 2.25e-06, 'epoch': 0.29}
+  1%|▊ | 76/5080 [24:42<27:42:13, 19.93s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  2%|▉ | 77/5080 [25:02<27:26:03, 19.74s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.0874, 'learning_rate': 2.31e-06, 'epoch': 0.3}
+  2%|▉ | 78/5080 [25:21<27:08:36, 19.54s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  2%|▉ | 79/5080 [25:40<26:53:44, 19.36s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  2%|▉ | 80/5080 [25:58<26:35:17, 19.14s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.1158, 'learning_rate': 2.4000000000000003e-06, 'epoch': 0.31}
+  2%|▉ | 81/5080 [26:17<26:20:01, 18.96s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.0468, 'learning_rate': 2.43e-06, 'epoch': 0.32}
+  2%|▉ | 81/5080 [26:17<26:20:01, 18.96s/it] Could not estimate the number of tokens of the input, 
floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 81/5080 [26:17<26:20:01, 18.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 81/5080 [26:17<26:20:01, 18.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 81/5080 [26:17<26:20:01, 18.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 81/5080 [26:17<26:20:01, 18.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 82/5080 [26:35<26:00:33, 18.73s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 82/5080 [26:35<26:00:33, 18.73s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 82/5080 [26:35<26:00:33, 18.73s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 82/5080 [26:35<26:00:33, 18.73s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 82/5080 [26:35<26:00:33, 18.73s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|▉ | 82/5080 
[26:35<26:00:33, 18.73s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|▉ | 83/5080 [26:53<25:45:10, 18.55s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.0746, 'learning_rate': 2.4900000000000003e-06, 'epoch': 0.33}
+ 2%|▉ | 84/5080 [27:11<25:23:27, 18.30s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.052, 'learning_rate': 2.55e-06, 'epoch': 0.33}
+[WARNING|modeling_utils.py:388] 2022-03-06 16:11:03,140 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|▉ | 86/5080 [27:45<24:31:32, 17.68s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 8.9941, 'learning_rate': 2.58e-06, 'epoch': 0.34}
+ 2%|▉ | 87/5080 [28:02<24:01:15, 17.32s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|█ | 88/5080 [28:18<23:45:54, 17.14s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.0679, 'learning_rate': 2.6399999999999997e-06, 'epoch': 0.35}
+ 2%|█ | 89/5080 [28:34<23:03:24,
16.63s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:12:11,905 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:12:17,447 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|█ | 90/5080 [28:49<22:19:14, 16.10s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.0641, 'learning_rate': 2.7e-06, 'epoch': 0.35}
+[WARNING|modeling_utils.py:388] 2022-03-06 16:12:29,813 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|█ | 91/5080 [29:02<21:23:34, 15.44s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.0049, 'learning_rate': 2.73e-06, 'epoch': 0.36}
+[WARNING|modeling_utils.py:388] 2022-03-06 16:12:38,168 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:12:46,093 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.0, 'learning_rate': 2.76e-06, 'epoch': 0.36}
+[WARNING|modeling_utils.py:388] 2022-03-06 16:12:52,306 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06
16:12:58,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 8.9513, 'learning_rate': 2.79e-06, 'epoch': 0.36}
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:02,521 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:06,620 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|█ | 94/5080 [29:39<18:04:40, 13.05s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:13:10,619 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 8.8569, 'learning_rate': 2.82e-06, 'epoch': 0.37}
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:14,388 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:16,804 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|█ | 95/5080 [29:49<16:48:25, 12.14s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:13:20,485 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:23,928 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:26,109 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:28,276 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:30,473 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:32,454 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:34,349 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:36,202 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:38,143 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:39,903 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:41,588 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:44,888 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:46,381 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:47,849 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:50,707 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06
16:13:52,035 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:13:54,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 8.5016, 'learning_rate': 3e-06, 'epoch': 0.39}
+[WARNING|modeling_utils.py:388] 2022-03-06 16:14:01,240 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+[WARNING|modeling_utils.py:388] 2022-03-06 16:14:07,181 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed
+ 2%|█▏ | 101/5080 [30:48<17:09:35, 12.41s/it] Could not estimate the number of tokens of the input, floating-point operations will not be computed
+{'loss': 9.0438, 'learning_rate': 3.0300000000000002e-06, 'epoch': 0.4}
+ 2%|█▏ | 101/5080 [30:48<17:09:35, 12.41s/it]f the input,
floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 101/5080 [30:48<17:09:35, 12.41s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 101/5080 [30:48<17:09:35, 12.41s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be 
computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 102/5080 [31:11<21:28:39, 15.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number 
of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 103/5080 [31:33<24:20:56, 17.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 104/5080 [31:56<26:16:23, 19.01s/it]f the input, floating-point 
operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 105/5080 [32:18<27:35:11, 19.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 105/5080 [32:18<27:35:11, 19.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 105/5080 [32:18<27:35:11, 19.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 105/5080 [32:18<27:35:11, 19.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 105/5080 [32:18<27:35:11, 19.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 105/5080 [32:18<27:35:11, 19.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 105/5080 [32:18<27:35:11, 19.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 105/5080 [32:18<27:35:11, 19.96s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 106/5080 [32:40<28:28:53, 20.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 
106/5080 [32:40<28:28:53, 20.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 9.0024, 'learning_rate': 3.18e-06, 'epoch': 0.42} + 2%|█▏ | 106/5080 [32:40<28:28:53, 20.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 106/5080 [32:40<28:28:53, 20.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 106/5080 [32:40<28:28:53, 20.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 106/5080 [32:40<28:28:53, 20.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 106/5080 [32:40<28:28:53, 20.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 106/5080 [32:40<28:28:53, 20.61s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 107/5080 [33:02<29:01:19, 21.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 107/5080 [33:02<29:01:19, 21.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.9286, 'learning_rate': 3.21e-06, 'epoch': 0.42} + 2%|█▏ | 107/5080 [33:02<29:01:19, 
21.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 107/5080 [33:02<29:01:19, 21.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 107/5080 [33:02<29:01:19, 21.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 107/5080 [33:02<29:01:19, 21.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 107/5080 [33:02<29:01:19, 21.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 107/5080 [33:02<29:01:19, 21.01s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 108/5080 [33:24<29:26:01, 21.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 108/5080 [33:24<29:26:01, 21.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.9755, 'learning_rate': 3.24e-06, 'epoch': 0.42} + 2%|█▏ | 108/5080 [33:24<29:26:01, 21.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 108/5080 [33:24<29:26:01, 21.31s/it]f the input, floating-point operations will not be computedCould not 
estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 108/5080 [33:24<29:26:01, 21.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 108/5080 [33:24<29:26:01, 21.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 108/5080 [33:24<29:26:01, 21.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 108/5080 [33:24<29:26:01, 21.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.9247, 'learning_rate': 3.27e-06, 'epoch': 0.43} + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be 
computed + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 109/5080 [33:46<29:36:04, 21.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number 
of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.8587, 'learning_rate': 3.3300000000000003e-06, 'epoch': 0.44} + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed 
+ 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▏ | 110/5080 [34:07<29:42:54, 21.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 112/5080 [34:50<29:39:02, 21.49s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 112/5080 [34:50<29:39:02, 21.49s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.9002, 'learning_rate': 3.36e-06, 'epoch': 0.44} + 2%|█▎ | 112/5080 [34:50<29:39:02, 21.49s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 112/5080 [34:50<29:39:02, 21.49s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 112/5080 [34:50<29:39:02, 21.49s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 112/5080 [34:50<29:39:02, 21.49s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 112/5080 [34:50<29:39:02, 21.49s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 112/5080 [34:50<29:39:02, 21.49s/it]f the input, floating-point operations 
will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.8349, 'learning_rate': 3.39e-06, 'epoch': 0.44} + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 113/5080 [35:12<29:49:38, 21.62s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.882, 'learning_rate': 3.4200000000000003e-06, 'epoch': 0.45} + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 114/5080 [35:34<29:42:49, 21.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 115/5080 
[35:55<29:30:28, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 115/5080 [35:55<29:30:28, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 115/5080 [35:55<29:30:28, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 115/5080 [35:55<29:30:28, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 115/5080 [35:55<29:30:28, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 115/5080 [35:55<29:30:28, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 115/5080 [35:55<29:30:28, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 115/5080 [35:55<29:30:28, 21.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 116/5080 [36:15<29:14:13, 21.20s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 116/5080 [36:15<29:14:13, 21.20s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed +{'loss': 8.8294, 'learning_rate': 3.48e-06, 'epoch': 0.46} + 2%|█▎ | 116/5080 [36:15<29:14:13, 21.20s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 116/5080 [36:15<29:14:13, 21.20s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 116/5080 [36:15<29:14:13, 21.20s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 116/5080 [36:15<29:14:13, 21.20s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 116/5080 [36:15<29:14:13, 21.20s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 116/5080 [36:15<29:14:13, 21.20s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 117/5080 [36:36<29:01:51, 21.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 117/5080 [36:36<29:01:51, 21.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.8029, 'learning_rate': 3.5100000000000003e-06, 'epoch': 0.46} + 2%|█▎ | 117/5080 [36:36<29:01:51, 21.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed + 2%|█▎ | 117/5080 [36:36<29:01:51, 21.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 117/5080 [36:36<29:01:51, 21.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 117/5080 [36:36<29:01:51, 21.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 117/5080 [36:36<29:01:51, 21.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 117/5080 [36:36<29:01:51, 21.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 118/5080 [36:57<28:48:29, 20.90s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 118/5080 [36:57<28:48:29, 20.90s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.8243, 'learning_rate': 3.54e-06, 'epoch': 0.46} + 2%|█▎ | 118/5080 [36:57<28:48:29, 20.90s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 118/5080 [36:57<28:48:29, 20.90s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 118/5080 [36:57<28:48:29, 
20.90s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 118/5080 [36:57<28:48:29, 20.90s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 118/5080 [36:57<28:48:29, 20.90s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 118/5080 [36:57<28:48:29, 20.90s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 119/5080 [37:17<28:39:11, 20.79s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 119/5080 [37:17<28:39:11, 20.79s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.8371, 'learning_rate': 3.57e-06, 'epoch': 0.47} + 2%|█▎ | 119/5080 [37:17<28:39:11, 20.79s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 119/5080 [37:17<28:39:11, 20.79s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 119/5080 [37:17<28:39:11, 20.79s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 119/5080 [37:17<28:39:11, 20.79s/it]f the input, floating-point operations will not be computedCould not 
estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 119/5080 [37:17<28:39:11, 20.79s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 119/5080 [37:17<28:39:11, 20.79s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 120/5080 [37:37<28:27:48, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 120/5080 [37:37<28:27:48, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.8668, 'learning_rate': 3.6e-06, 'epoch': 0.47} + 2%|█▎ | 120/5080 [37:37<28:27:48, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 120/5080 [37:37<28:27:48, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 120/5080 [37:37<28:27:48, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 120/5080 [37:37<28:27:48, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 120/5080 [37:37<28:27:48, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be 
computed + 2%|█▎ | 120/5080 [37:37<28:27:48, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 121/5080 [37:58<28:16:18, 20.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 121/5080 [37:58<28:16:18, 20.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.8932, 'learning_rate': 3.63e-06, 'epoch': 0.47} + 2%|█▎ | 121/5080 [37:58<28:16:18, 20.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 121/5080 [37:58<28:16:18, 20.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 121/5080 [37:58<28:16:18, 20.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 121/5080 [37:58<28:16:18, 20.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 121/5080 [37:58<28:16:18, 20.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 121/5080 [37:58<28:16:18, 20.52s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point 
operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.8174, 'learning_rate': 3.66e-06, 'epoch': 0.48} + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▎ | 122/5080 [38:18<28:05:29, 20.40s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 123/5080 [38:38<27:54:07, 20.26s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed + 2%|█▍ | 123/5080 [38:38<27:54:07, 20.26s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 123/5080 [38:38<27:54:07, 20.26s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 123/5080 [38:38<27:54:07, 20.26s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 123/5080 [38:38<27:54:07, 20.26s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 123/5080 [38:38<27:54:07, 20.26s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 123/5080 [38:38<27:54:07, 20.26s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 123/5080 [38:38<27:54:07, 20.26s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 124/5080 [38:58<27:42:40, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 124/5080 [38:58<27:42:40, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7989, 'learning_rate': 3.72e-06, 'epoch': 0.49} + 2%|█▍ | 124/5080 [38:58<27:42:40, 
20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 124/5080 [38:58<27:42:40, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 124/5080 [38:58<27:42:40, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 124/5080 [38:58<27:42:40, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 124/5080 [38:58<27:42:40, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 124/5080 [38:58<27:42:40, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.828, 'learning_rate': 3.75e-06, 'epoch': 0.49} + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not 
estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 125/5080 [39:18<27:42:46, 20.13s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 126/5080 [39:37<27:23:56, 19.91s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 126/5080 [39:37<27:23:56, 19.91s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 126/5080 [39:37<27:23:56, 19.91s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 126/5080 [39:37<27:23:56, 19.91s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 126/5080 [39:37<27:23:56, 19.91s/it]f the 
input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 126/5080 [39:37<27:23:56, 19.91s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 126/5080 [39:37<27:23:56, 19.91s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 126/5080 [39:37<27:23:56, 19.91s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 127/5080 [39:56<27:10:09, 19.75s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 127/5080 [39:56<27:10:09, 19.75s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.826, 'learning_rate': 3.81e-06, 'epoch': 0.5} + 2%|█▍ | 127/5080 [39:56<27:10:09, 19.75s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 127/5080 [39:56<27:10:09, 19.75s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 127/5080 [39:56<27:10:09, 19.75s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 127/5080 [39:56<27:10:09, 19.75s/it]f the input, floating-point operations will not be computedCould not estimate the number of 
tokens of the input, floating-point operations will not be computed + 2%|█▍ | 127/5080 [39:56<27:10:09, 19.75s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 2%|█▍ | 127/5080 [39:56<27:10:09, 19.75s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 128/5080 [40:16<26:55:03, 19.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 128/5080 [40:16<26:55:03, 19.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7354, 'learning_rate': 3.8400000000000005e-06, 'epoch': 0.5} + 3%|█▍ | 128/5080 [40:16<26:55:03, 19.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 128/5080 [40:16<26:55:03, 19.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 128/5080 [40:16<26:55:03, 19.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 128/5080 [40:16<26:55:03, 19.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 128/5080 [40:16<26:55:03, 19.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 
3%|█▍ | 128/5080 [40:16<26:55:03, 19.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7595, 'learning_rate': 3.87e-06, 'epoch': 0.51} + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations 
will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7731, 'learning_rate': 3.9e-06, 'epoch': 0.51} + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 129/5080 [40:35<26:40:08, 19.39s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 131/5080 [41:12<26:09:37, 19.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be 
computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.6715, 'learning_rate': 
3.99e-06, 'epoch': 0.52} + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▍ | 132/5080 [41:30<25:54:07, 18.85s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 134/5080 [42:06<25:14:10, 18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 134/5080 [42:06<25:14:10, 18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.6981, 'learning_rate': 4.0200000000000005e-06, 'epoch': 0.53} + 3%|█▌ | 134/5080 [42:06<25:14:10, 
18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 134/5080 [42:06<25:14:10, 18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 134/5080 [42:06<25:14:10, 18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 134/5080 [42:06<25:14:10, 18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 134/5080 [42:06<25:14:10, 18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 134/5080 [42:06<25:14:10, 18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 134/5080 [42:06<25:14:10, 18.37s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 135/5080 [42:24<24:53:42, 18.12s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 135/5080 [42:24<24:53:42, 18.12s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 135/5080 [42:24<24:53:42, 18.12s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computed + 3%|█▌ | 135/5080 [42:24<24:53:42, 18.12s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 135/5080 [42:24<24:53:42, 18.12s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 135/5080 [42:24<24:53:42, 18.12s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 135/5080 [42:24<24:53:42, 18.12s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 135/5080 [42:24<24:53:42, 18.12s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could 
not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 136/5080 [42:41<24:27:23, 17.81s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 137/5080 [42:58<23:56:13, 17.43s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 137/5080 [42:58<23:56:13, 17.43s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 137/5080 [42:58<23:56:13, 17.43s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 137/5080 [42:58<23:56:13, 17.43s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 137/5080 [42:58<23:56:13, 17.43s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 137/5080 [42:58<23:56:13, 
17.43s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 137/5080 [42:58<23:56:13, 17.43s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 137/5080 [42:58<23:56:13, 17.43s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.6494, 'learning_rate': 4.14e-06, 'epoch': 0.54} + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not 
estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 138/5080 [43:14<23:40:12, 17.24s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 139/5080 [43:30<23:03:39, 16.80s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 139/5080 [43:30<23:03:39, 16.80s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 139/5080 [43:30<23:03:39, 16.80s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 139/5080 [43:30<23:03:39, 16.80s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 139/5080 [43:30<23:03:39, 16.80s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 139/5080 [43:30<23:03:39, 16.80s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 139/5080 [43:30<23:03:39, 16.80s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:25:56,854 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 
16:27:15,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:15,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7681, 'learning_rate': 4.2000000000000004e-06, 'epoch': 0.55} +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:15,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:15,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:15,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:15,726 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:28,058 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 141/5080 [43:59<21:21:46, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of 
tokens of the input, floating-point operations will not be computed + 3%|█▌ | 141/5080 [43:59<21:21:46, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7157, 'learning_rate': 4.229999999999999e-06, 'epoch': 0.55} + 3%|█▌ | 141/5080 [43:59<21:21:46, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 141/5080 [43:59<21:21:46, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:38,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:38,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:38,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:38,107 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 142/5080 [44:12<20:20:11, 14.83s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:27:44,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computed + 3%|█▌ | 142/5080 [44:12<20:20:11, 14.83s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:27:44,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 142/5080 [44:12<20:20:11, 14.83s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:27:44,497 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:50,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:50,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:50,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:27:50,614 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 143/5080 [44:24<19:16:46, 14.06s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:27:56,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 143/5080 [44:24<19:16:46, 14.06s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:27:56,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:00,958 >> Could not estimate the number of tokens of the input, 
floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:00,958 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:05,081 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 144/5080 [44:36<18:08:53, 13.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▌ | 144/5080 [44:36<18:08:53, 13.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:09,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:09,082 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:12,879 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:15,321 >> Could not estimate the number of tokens of the input, floating-point 
operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 145/5080 [44:46<16:50:31, 12.29s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 145/5080 [44:46<16:50:31, 12.29s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:18,937 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:21,157 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:23,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:23,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:23,371 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 146/5080 [44:55<15:30:24, 11.31s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:28:26,668 >> Could not estimate the number of tokens of 
the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:28,695 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:30,685 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:32,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:32,617 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 147/5080 [45:03<14:09:03, 10.33s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:28:34,584 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:36,356 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:38,067 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 148/5080 [45:10<12:45:17, 9.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed + 3%|█▋ | 148/5080 [45:10<12:45:17, 9.31s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:42,957 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:44,450 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 149/5080 [45:16<11:25:16, 8.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 149/5080 [45:16<11:25:16, 8.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:48,712 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:28:51,142 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 150/5080 [45:22<10:19:29, 7.54s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 150/5080 [45:22<10:19:29, 7.54s/it]f the input, floating-point operations will 
not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 150/5080 [45:22<10:19:29, 7.54s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:28:55,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 150/5080 [45:22<10:19:29, 7.54s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:28:55,674 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:29:01,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:29:01,528 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:29:07,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:29:07,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:29:07,456 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 151/5080 [45:45<17:01:53, 12.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed + 3%|█▋ | 151/5080 [45:45<17:01:53, 12.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7451, 'learning_rate': 4.53e-06, 'epoch': 0.59} + 3%|█▋ | 151/5080 [45:45<17:01:53, 12.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 151/5080 [45:45<17:01:53, 12.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 151/5080 [45:45<17:01:53, 12.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 151/5080 [45:45<17:01:53, 12.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 151/5080 [45:45<17:01:53, 12.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 151/5080 [45:45<17:01:53, 12.44s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7872, 'learning_rate': 
4.56e-06, 'epoch': 0.6} + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 152/5080 [46:08<21:19:08, 15.57s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 153/5080 [46:31<24:17:01, 17.74s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 153/5080 [46:31<24:17:01, 17.74s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 153/5080 [46:31<24:17:01, 17.74s/it]f the input, floating-point operations will not be computedCould not 
estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 153/5080 [46:31<24:17:01, 17.74s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 153/5080 [46:31<24:17:01, 17.74s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 153/5080 [46:31<24:17:01, 17.74s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 153/5080 [46:31<24:17:01, 17.74s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 153/5080 [46:31<24:17:01, 17.74s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 154/5080 [46:53<26:10:08, 19.12s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 154/5080 [46:53<26:10:08, 19.12s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7505, 'learning_rate': 4.62e-06, 'epoch': 0.6} + 3%|█▋ | 154/5080 [46:53<26:10:08, 19.12s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 154/5080 [46:53<26:10:08, 19.12s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be 
computed + 3%|█▋ | 154/5080 [46:53<26:10:08, 19.12s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 154/5080 [46:53<26:10:08, 19.12s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 154/5080 [46:53<26:10:08, 19.12s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 154/5080 [46:53<26:10:08, 19.12s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 155/5080 [47:16<27:26:30, 20.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 155/5080 [47:16<27:26:30, 20.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.7456, 'learning_rate': 4.65e-06, 'epoch': 0.61} + 3%|█▋ | 155/5080 [47:16<27:26:30, 20.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 155/5080 [47:16<27:26:30, 20.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 155/5080 [47:16<27:26:30, 20.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 155/5080 [47:16<27:26:30, 20.06s/it]f the input, floating-point 
operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 155/5080 [47:16<27:26:30, 20.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▋ | 155/5080 [47:16<27:26:30, 20.06s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.6346, 'learning_rate': 4.68e-06, 'epoch': 0.61} + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 156/5080 [47:38<28:15:12, 20.66s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]f the input, floating-point operations will not be 
computed
+Could not estimate the number of tokens of the input, floating-point operations will not be computed
+  3%|█▊ | 157/5080 [48:00<28:45:39, 21.03s/it]
+  3%|█▊ | 158/5080 [48:22<29:08:11, 21.31s/it]
+  3%|█▊ | 159/5080 [48:43<29:21:56, 21.48s/it]
+  3%|█▊ | 160/5080 [49:05<29:24:26, 21.52s/it]
+{'loss': 8.5441, 'learning_rate': 4.800000000000001e-06, 'epoch': 0.63}
+  3%|█▊ | 161/5080 [49:26<29:18:04, 21.44s/it]
+  3%|█▊ | 162/5080 [49:48<29:11:34, 21.37s/it]
+{'loss': 8.5608, 'learning_rate': 4.86e-06, 'epoch': 0.64}
+  3%|█▊ | 163/5080 [50:09<29:18:17, 21.46s/it]
+{'loss': 8.554, 'learning_rate': 4.890000000000001e-06, 'epoch': 0.64}
+  3%|█▊ | 164/5080 [50:30<29:07:41, 21.33s/it]
+{'loss': 8.5778, 'learning_rate': 4.92e-06, 'epoch': 0.64}
+  3%|█▊ | 165/5080 [50:51<28:55:53, 21.19s/it]
+{'loss': 8.5742, 'learning_rate': 4.95e-06, 'epoch': 0.65}
+  3%|█▊ | 166/5080 [51:12<28:42:43, 21.03s/it]
+{'loss': 8.4956, 'learning_rate': 4.980000000000001e-06, 'epoch': 0.65}
+{'loss': 8.5563, 'learning_rate': 5.01e-06, 'epoch': 0.66}
+  3%|█▉ | 168/5080 [51:53<28:23:02, 20.80s/it]
+{'loss': 8.5128, 'learning_rate': 5.04e-06, 'epoch': 0.66}
+  3%|█▉ | 169/5080 [52:13<28:11:08, 20.66s/it]
+{'loss': 8.4998, 'learning_rate': 5.070000000000001e-06, 'epoch': 0.66}
+  3%|█▉ | 170/5080 [52:33<27:58:37, 20.51s/it]
+{'loss': 8.5013, 'learning_rate': 5.1e-06, 'epoch': 0.67}
+  3%|█▉ | 171/5080 [52:54<27:46:59, 20.37s/it]
+{'loss': 8.5052, 'learning_rate': 5.130000000000001e-06, 'epoch': 0.67}
+  3%|█▉ | 172/5080 [53:14<27:37:13, 20.26s/it]
+{'loss': 8.4767, 'learning_rate': 5.16e-06, 'epoch': 0.67}
+  3%|█▉ | 173/5080 [53:33<27:20:58, 20.06s/it]
+  3%|█▉ | 174/5080 [53:53<27:09:46, 19.93s/it]
+  3%|█▉ | 175/5080 [54:13<27:16:49, 20.02s/it]
+  3%|█▉ | 176/5080 [54:32<26:57:31, 19.79s/it]
+{'loss': 8.4797, 'learning_rate': 5.279999999999999e-06, 'epoch': 0.69}
+  3%|█▉ | 177/5080 [54:51<26:37:46, 19.55s/it]
+{'loss': 8.3766, 'learning_rate': 5.31e-06, 'epoch': 0.69}
+{'loss': 8.4145, 'learning_rate': 5.34e-06, 'epoch': 0.7}
+  4%|██ | 179/5080 [55:29<26:03:24, 19.14s/it]
+  4%|██ | 180/5080 [55:47<25:48:58, 18.97s/it]
+{'loss': 8.455, 'learning_rate': 5.4e-06, 'epoch': 0.71}
+  4%|██ | 181/5080 [56:06<25:35:45, 18.81s/it]
+Could not estimate the number of tokens of the input,
floating-point operations will not be computed + 4%|██ | 181/5080 [56:06<25:35:45, 18.81s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 181/5080 [56:06<25:35:45, 18.81s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 182/5080 [56:24<25:21:04, 18.63s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 182/5080 [56:24<25:21:04, 18.63s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.3762, 'learning_rate': 5.46e-06, 'epoch': 0.71} + 4%|██ | 182/5080 [56:24<25:21:04, 18.63s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 182/5080 [56:24<25:21:04, 18.63s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 182/5080 [56:24<25:21:04, 18.63s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 182/5080 [56:24<25:21:04, 18.63s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 182/5080 [56:24<25:21:04, 18.63s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 182/5080 [56:24<25:21:04, 
18.63s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 183/5080 [56:42<25:05:43, 18.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 183/5080 [56:42<25:05:43, 18.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.3935, 'learning_rate': 5.49e-06, 'epoch': 0.72} + 4%|██ | 183/5080 [56:42<25:05:43, 18.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 183/5080 [56:42<25:05:43, 18.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 183/5080 [56:42<25:05:43, 18.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 183/5080 [56:42<25:05:43, 18.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 183/5080 [56:42<25:05:43, 18.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 183/5080 [56:42<25:05:43, 18.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 184/5080 [57:00<24:48:20, 18.24s/it]f the input, floating-point operations will not be computedCould not 
estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 184/5080 [57:00<24:48:20, 18.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.3808, 'learning_rate': 5.52e-06, 'epoch': 0.72} + 4%|██ | 184/5080 [57:00<24:48:20, 18.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 184/5080 [57:00<24:48:20, 18.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 184/5080 [57:00<24:48:20, 18.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 184/5080 [57:00<24:48:20, 18.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 184/5080 [57:00<24:48:20, 18.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 184/5080 [57:00<24:48:20, 18.24s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be 
computed +{'loss': 8.4104, 'learning_rate': 5.55e-06, 'epoch': 0.73} + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 185/5080 [57:17<24:27:23, 17.99s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 
f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 187/5080 [57:51<23:42:50, 17.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 187/5080 [57:51<23:42:50, 17.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.3122, 'learning_rate': 5.6100000000000005e-06, 'epoch': 0.73} + 4%|██ | 187/5080 [57:51<23:42:50, 17.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 187/5080 [57:51<23:42:50, 17.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 187/5080 [57:51<23:42:50, 17.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 187/5080 [57:51<23:42:50, 17.45s/it]f the input, floating-point operations 
will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 187/5080 [57:51<23:42:50, 17.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 187/5080 [57:51<23:42:50, 17.45s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +{'loss': 8.416, 'learning_rate': 5.64e-06, 'epoch': 0.74} + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 188/5080 [58:08<23:28:44, 17.28s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 189/5080 [58:24<22:55:31, 16.87s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 189/5080 [58:24<22:55:31, 16.87s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 189/5080 [58:24<22:55:31, 16.87s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 189/5080 [58:24<22:55:31, 16.87s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 189/5080 [58:24<22:55:31, 16.87s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 189/5080 [58:24<22:55:31, 16.87s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 189/5080 [58:24<22:55:31, 16.87s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██ | 189/5080 [58:24<22:55:31, 16.87s/it]f the input, floating-point operations will not be 
computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 190/5080 [58:39<22:11:34, 16.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 190/5080 [58:39<22:11:34, 16.34s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:13,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:13,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:13,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:13,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:13,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:13,438 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not 
estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 191/5080 [58:53<21:27:10, 15.80s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 191/5080 [58:53<21:27:10, 15.80s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:27,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:27,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:27,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:27,714 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:35,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:35,991 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number 
of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 192/5080 [59:07<20:28:35, 15.08s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 192/5080 [59:07<20:28:35, 15.08s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 192/5080 [59:07<20:28:35, 15.08s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:43,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:43,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:43,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:43,963 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:49,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed 
+[WARNING|modeling_utils.py:388] 2022-03-06 16:42:49,972 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:54,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:54,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:42:54,433 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:00,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:00,083 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 194/5080 [59:31<18:13:32, 13.43s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:04,241 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, 
floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:06,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:06,838 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:10,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:10,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 195/5080 [59:41<17:00:22, 12.53s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:14,367 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:16,710 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:18,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate 
the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:18,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:18,985 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 196/5080 [59:51<15:42:27, 11.58s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:43:22,435 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:24,591 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:26,673 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:28,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:28,649 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 197/5080 [59:59<14:23:53, 10.62s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:43:30,667 >> Could not estimate the number of 
tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:32,562 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:34,400 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:36,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:36,130 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 198/5080 [1:00:06<13:04:24, 9.64s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:43:37,921 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:41,022 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:42,464 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:42,464 >> Could not estimate the number of tokens of the input, floating-point operations will not be 
computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 199/5080 [1:00:12<11:40:13, 8.61s/it][WARNING|modeling_utils.py:388] 2022-03-06 16:43:43,970 >> Could not estimate the number of tokens of the input, floating-point operations will not be computed +[WARNING|modeling_utils.py:388] 2022-03-06 16:43:46,520 >> Could not estimate the number of tokens of the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 200/5080 [1:00:18<10:27:32, 7.72s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + 4%|██▏ | 200/5080 [1:00:18<10:27:32, 7.72s/it]f the input, floating-point operations will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)raceback (most recent call last):ions will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)raceback (most recent call last):ions will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed + return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)raceback (most recent call last):ions will not be computedCould not estimate the number of tokens of the input, floating-point operations will not be computed \ No newline at end of file