# flan-t5la-large
This model is a fine-tuned version of hrezaei/flan-t5la-large on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set:
- Perplexity: 5.0594
- Loss: 1.6212
- Accuracy: 0.0025
- Lookahead Perplexity: 22.5300
- Lookahead Loss: 3.1148
- Base Perplexity: 1.1402
- Base Loss: 0.1312
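
Each reported perplexity is consistent with exponentiating its paired cross-entropy loss (perplexity = exp(loss)); a quick check against the numbers above:

```python
# Sanity check: each reported perplexity is exp() of its paired cross-entropy loss.
import math

print(math.exp(1.6212))  # ≈ 5.059  -> reported Perplexity 5.0594
print(math.exp(3.1148))  # ≈ 22.53  -> reported Lookahead Perplexity 22.5300
print(math.exp(0.1312))  # ≈ 1.1402 -> reported Base Perplexity 1.1402
```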
## Model description
More information needed
## Intended uses & limitations
More information needed
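
Pending more detail from the author, a minimal loading sketch is shown below. It assumes the checkpoint works with the standard transformers seq2seq auto classes; `trust_remote_code=True` is included only in case the repository ships custom tokenizer or model code, and can be dropped if it is not needed.

```python
# Hedged loading sketch; assumes the standard seq2seq auto classes suffice.
# trust_remote_code=True is only needed if the repo defines custom classes.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "hrezaei/flan-t5la-large"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer(
    "Summarize: The quick brown fox jumps over the lazy dog.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```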
## Training and evaluation data
More information needed
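
The header of this card names HuggingFaceFW/fineweb (sample-350BT) as the training corpus. A hedged sketch of streaming it with the datasets library follows; the config name is taken from this card and should be checked against the dataset repository.

```python
# Stream the fineweb sample named in this card; streaming avoids downloading
# the full ~350B-token sample. Config name copied from the card, not verified here.
from datasets import load_dataset

fineweb = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-350BT",
    split="train",
    streaming=True,
)
for example in fineweb.take(3):
    print(example["text"][:200])
```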
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 524288
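
A minimal sketch of how these settings might map onto transformers `TrainingArguments`. The author's actual training script (and any custom Trainer subclass behind the lookahead metrics) is not published in this card, so treat the names below as illustrative, not the original setup.

```python
# Illustrative mapping of the reported hyperparameters onto TrainingArguments.
# output_dir is hypothetical; multi-GPU use (2 devices) is handled by the
# launcher (e.g. torchrun), giving the total batch size of 32 reported above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="flan-t5la-large-fineweb",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,        # 2 GPUs -> total train batch size 32
    per_device_eval_batch_size=16,         # 2 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=524_288,
)
```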
### Training results
| Training Loss | Epoch | Step | Accuracy | Base Loss | Base Perplexity | Lookahead Loss | Lookahead Perplexity | Validation Loss | Perplexity |
|---|---|---|---|---|---|---|---|---|---|
| 3.6672 | 0.0095 | 5000 | 0.0025 | 0.1307 | 1.1396 | 7.6678 | 2138.3375 | 3.8992 | 49.3650 |
| 3.0591 | 0.0191 | 10000 | 0.0025 | 0.1307 | 1.1396 | 6.0778 | 436.0899 | 3.1043 | 22.2930 |
| 2.8435 | 0.0286 | 15000 | 0.0025 | 0.1307 | 1.1396 | 5.3831 | 217.6865 | 2.7569 | 15.7506 |
| 2.6871 | 0.0381 | 20000 | 0.0025 | 0.1307 | 1.1396 | 4.9640 | 143.1705 | 2.5474 | 12.7734 |
| 2.5599 | 0.0477 | 25000 | 0.0025 | 0.1307 | 1.1396 | 4.6813 | 107.9050 | 2.4060 | 11.0892 |
| 2.5297 | 0.0572 | 30000 | 0.0025 | 0.1307 | 1.1396 | 4.4748 | 87.7748 | 2.3027 | 10.0015 |
| 2.4168 | 0.0668 | 35000 | 0.0025 | 0.1307 | 1.1396 | 4.3162 | 74.9010 | 2.2234 | 9.2390 |
| 2.3901 | 0.0763 | 40000 | 0.0025 | 0.1307 | 1.1396 | 4.1905 | 66.0525 | 2.1606 | 8.6761 |
| 2.3584 | 0.0858 | 45000 | 0.0025 | 0.1307 | 1.1396 | 4.0883 | 59.6392 | 2.1095 | 8.2442 |
| 2.362 | 0.0954 | 50000 | 0.0025 | 0.1307 | 1.1396 | 4.0055 | 54.8976 | 2.0681 | 7.9097 |
| 2.3128 | 0.1049 | 55000 | 0.0025 | 0.1307 | 1.1396 | 3.9344 | 51.1303 | 2.0325 | 7.6334 |
| 2.3148 | 0.1144 | 60000 | 0.0025 | 0.1307 | 1.1396 | 3.8739 | 48.1289 | 2.0023 | 7.4060 |
| 2.2405 | 0.1240 | 65000 | 0.0025 | 0.1307 | 1.1396 | 3.8224 | 45.7117 | 1.9765 | 7.2176 |
| 2.2398 | 0.1335 | 70000 | 0.0025 | 0.1307 | 1.1396 | 3.7754 | 43.6163 | 1.9531 | 7.0503 |
| 2.2248 | 0.1431 | 75000 | 0.0025 | 0.1307 | 1.1396 | 3.7356 | 41.9128 | 1.9331 | 6.9112 |
| 2.234 | 0.1526 | 80000 | 0.0025 | 0.1307 | 1.1396 | 3.6986 | 40.3924 | 1.9147 | 6.7847 |
| 2.2168 | 0.1621 | 85000 | 0.0025 | 0.1307 | 1.1396 | 3.6669 | 39.1303 | 1.8988 | 6.6779 |
| 2.1853 | 0.1717 | 90000 | 0.0025 | 0.1307 | 1.1396 | 3.6360 | 37.9404 | 1.8834 | 6.5755 |
| 2.1745 | 0.1812 | 95000 | 0.0025 | 0.1307 | 1.1396 | 3.6090 | 36.9281 | 1.8698 | 6.4872 |
| 2.1623 | 0.1907 | 100000 | 0.0025 | 0.1307 | 1.1396 | 3.5836 | 36.0047 | 1.8572 | 6.4056 |
| 2.1587 | 0.2003 | 105000 | 0.0025 | 0.1307 | 1.1396 | 3.5609 | 35.1942 | 1.8458 | 6.3331 |
| 2.1783 | 0.2098 | 110000 | 0.0025 | 0.1307 | 1.1396 | 3.5395 | 34.4490 | 1.8351 | 6.2657 |
| 2.1364 | 0.2193 | 115000 | 0.0025 | 0.1307 | 1.1396 | 3.5210 | 33.8171 | 1.8258 | 6.2080 |
| 2.1116 | 0.2289 | 120000 | 0.0025 | 0.1307 | 1.1396 | 3.5038 | 33.2408 | 1.8172 | 6.1548 |
| 2.1257 | 0.2384 | 125000 | 0.0025 | 0.1307 | 1.1396 | 3.4854 | 32.6341 | 1.8080 | 6.0984 |
| 2.0968 | 1.0095 | 130000 | 0.0025 | 0.1307 | 1.1396 | 3.4699 | 32.1323 | 1.8003 | 6.0513 |
| 2.089 | 1.0191 | 135000 | 0.0025 | 0.1307 | 1.1396 | 3.4554 | 31.6718 | 1.7931 | 6.0078 |
| 2.1193 | 1.0286 | 140000 | 0.0025 | 0.1307 | 1.1396 | 3.4411 | 31.2219 | 1.7859 | 5.9650 |
| 2.1099 | 1.0381 | 145000 | 0.0025 | 0.1307 | 1.1396 | 3.4273 | 30.7945 | 1.7790 | 5.9240 |
| 2.0963 | 1.0477 | 150000 | 0.0025 | 0.1307 | 1.1396 | 3.4153 | 30.4261 | 1.7730 | 5.8885 |
| 2.1438 | 1.0572 | 155000 | 0.0025 | 0.1307 | 1.1396 | 3.4038 | 30.0780 | 1.7672 | 5.8547 |
| 2.0751 | 1.0668 | 160000 | 0.0025 | 0.1307 | 1.1396 | 3.3931 | 29.7586 | 1.7619 | 5.8235 |
| 2.0914 | 1.0763 | 165000 | 0.0025 | 0.1307 | 1.1396 | 3.3822 | 29.4352 | 1.7564 | 5.7918 |
| 2.1019 | 1.0858 | 170000 | 0.0025 | 0.1307 | 1.1396 | 3.3719 | 29.1332 | 1.7513 | 5.7620 |
| 2.1248 | 1.0954 | 175000 | 0.0025 | 0.1307 | 1.1396 | 3.3629 | 28.8725 | 1.7468 | 5.7362 |
| 2.0958 | 1.1049 | 180000 | 0.0025 | 0.1307 | 1.1396 | 3.3535 | 28.6030 | 1.7421 | 5.7094 |
| 2.1175 | 1.1144 | 185000 | 0.0025 | 0.1307 | 1.1396 | 3.3450 | 28.3614 | 1.7379 | 5.6852 |
| 2.0583 | 1.1240 | 190000 | 0.0025 | 0.1307 | 1.1396 | 3.3374 | 28.1471 | 1.7341 | 5.6637 |
| 2.074 | 1.1335 | 195000 | 0.0025 | 0.1307 | 1.1396 | 3.3290 | 27.9117 | 1.7299 | 5.6399 |
| 2.0537 | 1.0095 | 200000 | 0.0025 | 0.1312 | 1.1402 | 3.3221 | 27.7193 | 1.7248 | 5.6115 |
| 2.0681 | 1.0191 | 205000 | 0.0025 | 0.1312 | 1.1402 | 3.3116 | 27.4277 | 1.7195 | 5.5820 |
| 2.0942 | 1.0286 | 210000 | 0.0025 | 0.1312 | 1.1402 | 3.3024 | 27.1767 | 1.7150 | 5.5565 |
| 2.051 | 1.0381 | 215000 | 0.0025 | 0.1312 | 1.1402 | 3.2935 | 26.9378 | 1.7105 | 5.5319 |
| 2.0853 | 1.0477 | 220000 | 0.0025 | 0.1312 | 1.1402 | 3.2852 | 26.7130 | 1.7064 | 5.5088 |
| 2.0742 | 1.0572 | 225000 | 0.0025 | 0.1312 | 1.1402 | 3.2769 | 26.4942 | 1.7022 | 5.4862 |
| 2.0527 | 1.0668 | 230000 | 0.0025 | 0.1312 | 1.1402 | 3.2695 | 26.2972 | 1.6985 | 5.4657 |
| 2.0615 | 1.0763 | 235000 | 0.0025 | 0.1312 | 1.1402 | 3.2620 | 26.1027 | 1.6948 | 5.4457 |
| 2.038 | 1.0858 | 240000 | 0.0025 | 0.1312 | 1.1402 | 3.2550 | 25.9205 | 1.6913 | 5.4265 |
| 2.0266 | 1.0954 | 245000 | 0.0025 | 0.1312 | 1.1402 | 3.2480 | 25.7376 | 1.6877 | 5.4072 |
| 2.0502 | 1.1049 | 250000 | 0.0025 | 0.1312 | 1.1402 | 3.2412 | 25.5649 | 1.6844 | 5.3891 |
| 2.014 | 1.1144 | 255000 | 0.0025 | 0.1312 | 1.1402 | 3.2361 | 25.4340 | 1.6818 | 5.3755 |
| 2.0217 | 2.0048 | 260000 | 0.0025 | 0.1312 | 1.1402 | 3.2296 | 25.2699 | 1.6786 | 5.3580 |
| 2.0174 | 2.0143 | 265000 | 0.0025 | 0.1312 | 1.1402 | 3.2240 | 25.1281 | 1.6758 | 5.3430 |
| 2.0246 | 2.0238 | 270000 | 0.0025 | 0.1312 | 1.1402 | 3.2186 | 24.9925 | 1.6731 | 5.3285 |
| 2.0161 | 2.0334 | 275000 | 0.0025 | 0.1312 | 1.1402 | 3.2139 | 24.8749 | 1.6707 | 5.3159 |
| 2.0305 | 2.0429 | 280000 | 0.0025 | 0.1312 | 1.1402 | 3.2085 | 24.7417 | 1.6680 | 5.3017 |
| 2.0266 | 2.0525 | 285000 | 0.0025 | 0.1312 | 1.1402 | 3.2039 | 24.6288 | 1.6658 | 5.2897 |
| 2.0106 | 2.0620 | 290000 | 0.0025 | 0.1312 | 1.1402 | 3.1997 | 24.5256 | 1.6637 | 5.2786 |
| 2.006 | 1.0095 | 295000 | 0.0025 | 0.1312 | 1.1402 | 3.1953 | 24.4167 | 1.6614 | 5.2667 |
| 2.0219 | 1.0191 | 300000 | 0.0025 | 0.1312 | 1.1402 | 3.1907 | 24.3050 | 1.6591 | 5.2547 |
| 2.0522 | 1.0286 | 305000 | 0.0025 | 0.1312 | 1.1402 | 3.1868 | 24.2120 | 1.6572 | 5.2447 |
| 2.0096 | 1.0381 | 310000 | 0.0025 | 0.1312 | 1.1402 | 3.1832 | 24.1245 | 1.6554 | 5.2352 |
| 2.0466 | 1.0477 | 315000 | 0.0025 | 0.1312 | 1.1402 | 3.1796 | 24.0376 | 1.6536 | 5.2258 |
| 2.0341 | 1.0572 | 320000 | 0.0025 | 0.1312 | 1.1402 | 3.1759 | 23.9491 | 1.6518 | 5.2162 |
| 2.017 | 1.0668 | 325000 | 0.0025 | 0.1312 | 1.1402 | 3.1728 | 23.8741 | 1.6502 | 5.2080 |
| 2.0269 | 1.0763 | 330000 | 0.0025 | 0.1312 | 1.1402 | 3.1695 | 23.7946 | 1.6486 | 5.1995 |
| 2.003 | 1.0858 | 335000 | 0.0025 | 0.1312 | 1.1402 | 3.1663 | 23.7206 | 1.6470 | 5.1913 |
| 1.9965 | 1.0954 | 340000 | 0.0025 | 0.1312 | 1.1402 | 3.1631 | 23.6437 | 1.6453 | 5.1827 |
| 2.0197 | 1.1049 | 345000 | 0.0025 | 0.1312 | 1.1402 | 3.1599 | 23.5691 | 1.6438 | 5.1747 |
| 1.9874 | 1.1144 | 350000 | 0.0025 | 0.1312 | 1.1402 | 3.1578 | 23.5193 | 1.6427 | 5.1693 |
| 1.9942 | 2.0048 | 355000 | 0.0025 | 0.1312 | 1.1402 | 3.1547 | 23.4471 | 1.6412 | 5.1612 |
| 1.9911 | 2.0143 | 360000 | 0.0025 | 0.1312 | 1.1402 | 3.1522 | 23.3872 | 1.6399 | 5.1547 |
| 1.9987 | 2.0238 | 365000 | 0.0025 | 0.1312 | 1.1402 | 3.1497 | 23.3286 | 1.6386 | 5.1482 |
| 1.9909 | 2.0334 | 370000 | 0.0025 | 0.1312 | 1.1402 | 3.1476 | 23.2800 | 1.6376 | 5.1428 |
| 2.0061 | 2.0429 | 375000 | 0.0025 | 0.1312 | 1.1402 | 3.1451 | 23.2231 | 1.6364 | 5.1365 |
| 2.0043 | 2.0525 | 380000 | 0.0025 | 0.1312 | 1.1402 | 3.1431 | 23.1763 | 1.6354 | 5.1314 |
| 1.9892 | 2.0620 | 385000 | 0.0025 | 0.1312 | 1.1402 | 3.1412 | 23.1326 | 1.6344 | 5.1266 |
| 1.9978 | 2.0715 | 390000 | 0.0025 | 0.1312 | 1.1402 | 3.1393 | 23.0875 | 1.6335 | 5.1217 |
| 2.018 | 2.0811 | 395000 | 0.0025 | 0.1312 | 1.1402 | 3.1376 | 23.0495 | 1.6326 | 5.1174 |
| 1.9992 | 2.0906 | 400000 | 0.0025 | 0.1312 | 1.1402 | 3.1356 | 23.0027 | 1.6316 | 5.1121 |
| 2.0012 | 2.1001 | 405000 | 0.0025 | 0.1312 | 1.1402 | 3.1337 | 22.9586 | 1.6307 | 5.1073 |
| 2.0024 | 2.1097 | 410000 | 0.0025 | 0.1312 | 1.1402 | 3.1322 | 22.9236 | 1.6299 | 5.1034 |
| 1.9947 | 2.1192 | 415000 | 0.0025 | 0.1312 | 1.1402 | 3.1306 | 22.8870 | 1.6291 | 5.0993 |
| 1.9834 | 3.0095 | 420000 | 0.0025 | 0.1312 | 1.1402 | 3.1291 | 22.8540 | 1.6284 | 5.0956 |
| 2.0011 | 3.0191 | 425000 | 0.0025 | 0.1312 | 1.1402 | 3.1276 | 22.8196 | 1.6276 | 5.0917 |
| 2.0306 | 3.0286 | 430000 | 0.0025 | 0.1312 | 1.1402 | 3.1264 | 22.7918 | 1.6270 | 5.0887 |
| 1.9876 | 3.0381 | 435000 | 0.0025 | 0.1312 | 1.1402 | 3.1253 | 22.7664 | 1.6265 | 5.0858 |
| 2.0281 | 3.0477 | 440000 | 0.0025 | 0.1312 | 1.1402 | 3.1242 | 22.7409 | 1.6259 | 5.0830 |
| 2.0165 | 3.0572 | 445000 | 0.0025 | 0.1312 | 1.1402 | 3.1230 | 22.7149 | 1.6253 | 5.0801 |
| 1.9991 | 3.0668 | 450000 | 0.0025 | 0.1312 | 1.1402 | 3.1222 | 22.6954 | 1.6249 | 5.0779 |
| 2.01 | 3.0763 | 455000 | 0.0025 | 0.1312 | 1.1402 | 3.1212 | 22.6734 | 1.6244 | 5.0755 |
| 1.9873 | 3.0858 | 460000 | 0.0025 | 0.1312 | 1.1402 | 3.1204 | 22.6543 | 1.6240 | 5.0733 |
| 1.9795 | 3.0954 | 465000 | 0.0025 | 0.1312 | 1.1402 | 3.1194 | 22.6336 | 1.6235 | 5.0710 |
| 2.005 | 3.1049 | 470000 | 0.0025 | 0.1312 | 1.1402 | 3.1186 | 22.6137 | 1.6231 | 5.0688 |
| 1.9711 | 3.1144 | 475000 | 0.0025 | 0.1312 | 1.1402 | 3.1181 | 22.6044 | 1.6229 | 5.0678 |
| 1.9799 | 4.0048 | 480000 | 0.0025 | 0.1312 | 1.1402 | 3.1174 | 22.5871 | 1.6225 | 5.0658 |
| 1.9783 | 4.0143 | 485000 | 0.0025 | 0.1312 | 1.1402 | 3.1168 | 22.5742 | 1.6222 | 5.0644 |
| 1.9775 | 1.0095 | 490000 | 0.0025 | 0.1312 | 1.1402 | 3.1164 | 22.5639 | 1.6220 | 5.0632 |
| 1.994 | 1.0191 | 495000 | 0.0025 | 0.1312 | 1.1402 | 3.1158 | 22.5525 | 1.6217 | 5.0619 |
| 2.0268 | 1.0286 | 500000 | 0.0025 | 0.1312 | 1.1402 | 3.1156 | 22.5462 | 1.6216 | 5.0612 |
| 1.9849 | 1.0381 | 505000 | 0.0025 | 0.1312 | 1.1402 | 3.1154 | 22.5414 | 1.6215 | 5.0606 |
| 2.0234 | 1.0477 | 510000 | 0.0025 | 0.1312 | 1.1402 | 3.1151 | 22.5360 | 1.6214 | 5.0600 |
| 2.0123 | 1.0572 | 515000 | 0.0025 | 0.1312 | 1.1402 | 3.1150 | 22.5324 | 1.6213 | 5.0597 |
| 1.9957 | 1.0668 | 520000 | 0.0025 | 0.1312 | 1.1402 | 3.1149 | 22.5301 | 1.6212 | 5.0594 |
### Framework versions
- Transformers 4.57.0.dev0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1
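
A quick runtime check that locally installed library versions match those listed above (the version strings in the comments are copied from this card):

```python
# Print installed versions to compare against the versions listed in this card.
import torch, transformers, datasets, tokenizers

print("Transformers:", transformers.__version__)  # card lists 4.57.0.dev0
print("PyTorch:", torch.__version__)              # card lists 2.8.0+cu128
print("Datasets:", datasets.__version__)          # card lists 4.2.0
print("Tokenizers:", tokenizers.__version__)      # card lists 0.22.1
```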