
flan-t5la-large

This model is a fine-tuned version of hrezaei/flan-t5la-large on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set:

  • Perplexity: 5.0594
  • Loss: 1.6212
  • Accuracy: 0.0025
  • Lookahead Perplexity: 22.5300
  • Lookahead Loss: 3.1148
  • Base Perplexity: 1.1402
  • Base Loss: 0.1312
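Each perplexity above is the exponential of the corresponding cross-entropy loss. A minimal sanity check of the reported pairs (values taken from the list above, agreement up to rounding):

```python
import math

# Perplexity = exp(cross-entropy loss); each reported pair should
# agree with this relationship up to rounding.
pairs = {
    "eval":      (1.6212, 5.0594),
    "lookahead": (3.1148, 22.5300),
    "base":      (0.1312, 1.1402),
}

for name, (loss, reported_ppl) in pairs.items():
    ppl = math.exp(loss)
    print(f"{name}: exp({loss}) = {ppl:.4f} (reported {reported_ppl})")
```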

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 524288
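The training script itself is not provided, but the arithmetic implied by the hyperparameters above can be sketched directly: the total batch size is per-device batch size times device count, and a linear scheduler decays the learning rate to zero over training_steps. The card does not list warmup steps, so the sketch below assumes zero warmup (an assumption, not confirmed by the card):

```python
# Effective batch size implied by the listed hyperparameters.
train_batch_size = 16
num_devices = 2
total_train_batch_size = train_batch_size * num_devices  # 32, matching the card

base_lr = 5e-5
training_steps = 524288  # 2**19

def linear_lr(step: int) -> float:
    """Linear decay from base_lr to 0 over training_steps.

    Assumes zero warmup steps, since none are listed in the card.
    """
    return base_lr * max(0.0, 1.0 - step / training_steps)

print(total_train_batch_size)       # 32
print(linear_lr(0))                 # 5e-05
print(linear_lr(262144))            # halfway: 2.5e-05
```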

Training results

Training Loss Epoch Step Accuracy Base Loss Base Perplexity Lookahead Loss Lookahead Perplexity Validation Loss Perplexity
3.6672 0.0095 5000 0.0025 0.1307 1.1396 7.6678 2138.3375 3.8992 49.3650
3.0591 0.0191 10000 0.0025 0.1307 1.1396 6.0778 436.0899 3.1043 22.2930
2.8435 0.0286 15000 0.0025 0.1307 1.1396 5.3831 217.6865 2.7569 15.7506
2.6871 0.0381 20000 0.0025 0.1307 1.1396 4.9640 143.1705 2.5474 12.7734
2.5599 0.0477 25000 0.0025 0.1307 1.1396 4.6813 107.9050 2.4060 11.0892
2.5297 0.0572 30000 0.0025 0.1307 1.1396 4.4748 87.7748 2.3027 10.0015
2.4168 0.0668 35000 0.0025 0.1307 1.1396 4.3162 74.9010 2.2234 9.2390
2.3901 0.0763 40000 0.0025 0.1307 1.1396 4.1905 66.0525 2.1606 8.6761
2.3584 0.0858 45000 0.0025 0.1307 1.1396 4.0883 59.6392 2.1095 8.2442
2.362 0.0954 50000 0.0025 0.1307 1.1396 4.0055 54.8976 2.0681 7.9097
2.3128 0.1049 55000 0.0025 0.1307 1.1396 3.9344 51.1303 2.0325 7.6334
2.3148 0.1144 60000 0.0025 0.1307 1.1396 3.8739 48.1289 2.0023 7.4060
2.2405 0.1240 65000 0.0025 0.1307 1.1396 3.8224 45.7117 1.9765 7.2176
2.2398 0.1335 70000 0.0025 0.1307 1.1396 3.7754 43.6163 1.9531 7.0503
2.2248 0.1431 75000 0.0025 0.1307 1.1396 3.7356 41.9128 1.9331 6.9112
2.234 0.1526 80000 0.0025 0.1307 1.1396 3.6986 40.3924 1.9147 6.7847
2.2168 0.1621 85000 0.0025 0.1307 1.1396 3.6669 39.1303 1.8988 6.6779
2.1853 0.1717 90000 0.0025 0.1307 1.1396 3.6360 37.9404 1.8834 6.5755
2.1745 0.1812 95000 0.0025 0.1307 1.1396 3.6090 36.9281 1.8698 6.4872
2.1623 0.1907 100000 0.0025 0.1307 1.1396 3.5836 36.0047 1.8572 6.4056
2.1587 0.2003 105000 0.0025 0.1307 1.1396 3.5609 35.1942 1.8458 6.3331
2.1783 0.2098 110000 0.0025 0.1307 1.1396 3.5395 34.4490 1.8351 6.2657
2.1364 0.2193 115000 0.0025 0.1307 1.1396 3.5210 33.8171 1.8258 6.2080
2.1116 0.2289 120000 0.0025 0.1307 1.1396 3.5038 33.2408 1.8172 6.1548
2.1257 0.2384 125000 0.0025 0.1307 1.1396 3.4854 32.6341 1.8080 6.0984
2.0968 1.0095 130000 0.0025 0.1307 1.1396 3.4699 32.1323 1.8003 6.0513
2.089 1.0191 135000 0.0025 0.1307 1.1396 3.4554 31.6718 1.7931 6.0078
2.1193 1.0286 140000 0.0025 0.1307 1.1396 3.4411 31.2219 1.7859 5.9650
2.1099 1.0381 145000 0.0025 0.1307 1.1396 3.4273 30.7945 1.7790 5.9240
2.0963 1.0477 150000 0.0025 0.1307 1.1396 3.4153 30.4261 1.7730 5.8885
2.1438 1.0572 155000 0.0025 0.1307 1.1396 3.4038 30.0780 1.7672 5.8547
2.0751 1.0668 160000 0.0025 0.1307 1.1396 3.3931 29.7586 1.7619 5.8235
2.0914 1.0763 165000 0.0025 0.1307 1.1396 3.3822 29.4352 1.7564 5.7918
2.1019 1.0858 170000 0.0025 0.1307 1.1396 3.3719 29.1332 1.7513 5.7620
2.1248 1.0954 175000 0.0025 0.1307 1.1396 3.3629 28.8725 1.7468 5.7362
2.0958 1.1049 180000 0.0025 0.1307 1.1396 3.3535 28.6030 1.7421 5.7094
2.1175 1.1144 185000 0.0025 0.1307 1.1396 3.3450 28.3614 1.7379 5.6852
2.0583 1.1240 190000 0.0025 0.1307 1.1396 3.3374 28.1471 1.7341 5.6637
2.074 1.1335 195000 0.0025 0.1307 1.1396 3.3290 27.9117 1.7299 5.6399
2.0537 1.0095 200000 0.0025 0.1312 1.1402 3.3221 27.7193 1.7248 5.6115
2.0681 1.0191 205000 0.0025 0.1312 1.1402 3.3116 27.4277 1.7195 5.5820
2.0942 1.0286 210000 0.0025 0.1312 1.1402 3.3024 27.1767 1.7150 5.5565
2.051 1.0381 215000 0.0025 0.1312 1.1402 3.2935 26.9378 1.7105 5.5319
2.0853 1.0477 220000 0.0025 0.1312 1.1402 3.2852 26.7130 1.7064 5.5088
2.0742 1.0572 225000 0.0025 0.1312 1.1402 3.2769 26.4942 1.7022 5.4862
2.0527 1.0668 230000 0.0025 0.1312 1.1402 3.2695 26.2972 1.6985 5.4657
2.0615 1.0763 235000 0.0025 0.1312 1.1402 3.2620 26.1027 1.6948 5.4457
2.038 1.0858 240000 0.0025 0.1312 1.1402 3.2550 25.9205 1.6913 5.4265
2.0266 1.0954 245000 0.0025 0.1312 1.1402 3.2480 25.7376 1.6877 5.4072
2.0502 1.1049 250000 0.0025 0.1312 1.1402 3.2412 25.5649 1.6844 5.3891
2.014 1.1144 255000 0.0025 0.1312 1.1402 3.2361 25.4340 1.6818 5.3755
2.0217 2.0048 260000 0.0025 0.1312 1.1402 3.2296 25.2699 1.6786 5.3580
2.0174 2.0143 265000 0.0025 0.1312 1.1402 3.2240 25.1281 1.6758 5.3430
2.0246 2.0238 270000 0.0025 0.1312 1.1402 3.2186 24.9925 1.6731 5.3285
2.0161 2.0334 275000 0.0025 0.1312 1.1402 3.2139 24.8749 1.6707 5.3159
2.0305 2.0429 280000 0.0025 0.1312 1.1402 3.2085 24.7417 1.6680 5.3017
2.0266 2.0525 285000 0.0025 0.1312 1.1402 3.2039 24.6288 1.6658 5.2897
2.0106 2.0620 290000 0.0025 0.1312 1.1402 3.1997 24.5256 1.6637 5.2786
2.006 1.0095 295000 0.0025 0.1312 1.1402 3.1953 24.4167 1.6614 5.2667
2.0219 1.0191 300000 0.0025 0.1312 1.1402 3.1907 24.3050 1.6591 5.2547
2.0522 1.0286 305000 0.0025 0.1312 1.1402 3.1868 24.2120 1.6572 5.2447
2.0096 1.0381 310000 0.0025 0.1312 1.1402 3.1832 24.1245 1.6554 5.2352
2.0466 1.0477 315000 0.0025 0.1312 1.1402 3.1796 24.0376 1.6536 5.2258
2.0341 1.0572 320000 0.0025 0.1312 1.1402 3.1759 23.9491 1.6518 5.2162
2.017 1.0668 325000 0.0025 0.1312 1.1402 3.1728 23.8741 1.6502 5.2080
2.0269 1.0763 330000 0.0025 0.1312 1.1402 3.1695 23.7946 1.6486 5.1995
2.003 1.0858 335000 0.0025 0.1312 1.1402 3.1663 23.7206 1.6470 5.1913
1.9965 1.0954 340000 0.0025 0.1312 1.1402 3.1631 23.6437 1.6453 5.1827
2.0197 1.1049 345000 0.0025 0.1312 1.1402 3.1599 23.5691 1.6438 5.1747
1.9874 1.1144 350000 0.0025 0.1312 1.1402 3.1578 23.5193 1.6427 5.1693
1.9942 2.0048 355000 0.0025 0.1312 1.1402 3.1547 23.4471 1.6412 5.1612
1.9911 2.0143 360000 0.0025 0.1312 1.1402 3.1522 23.3872 1.6399 5.1547
1.9987 2.0238 365000 0.0025 0.1312 1.1402 3.1497 23.3286 1.6386 5.1482
1.9909 2.0334 370000 0.0025 0.1312 1.1402 3.1476 23.2800 1.6376 5.1428
2.0061 2.0429 375000 0.0025 0.1312 1.1402 3.1451 23.2231 1.6364 5.1365
2.0043 2.0525 380000 0.0025 0.1312 1.1402 3.1431 23.1763 1.6354 5.1314
1.9892 2.0620 385000 0.0025 0.1312 1.1402 3.1412 23.1326 1.6344 5.1266
1.9978 2.0715 390000 0.0025 0.1312 1.1402 3.1393 23.0875 1.6335 5.1217
2.018 2.0811 395000 0.0025 0.1312 1.1402 3.1376 23.0495 1.6326 5.1174
1.9992 2.0906 400000 0.0025 0.1312 1.1402 3.1356 23.0027 1.6316 5.1121
2.0012 2.1001 405000 0.0025 0.1312 1.1402 3.1337 22.9586 1.6307 5.1073
2.0024 2.1097 410000 0.0025 0.1312 1.1402 3.1322 22.9236 1.6299 5.1034
1.9947 2.1192 415000 0.0025 0.1312 1.1402 3.1306 22.8870 1.6291 5.0993
1.9834 3.0095 420000 0.0025 0.1312 1.1402 3.1291 22.8540 1.6284 5.0956
2.0011 3.0191 425000 0.0025 0.1312 1.1402 3.1276 22.8196 1.6276 5.0917
2.0306 3.0286 430000 0.0025 0.1312 1.1402 3.1264 22.7918 1.6270 5.0887
1.9876 3.0381 435000 0.0025 0.1312 1.1402 3.1253 22.7664 1.6265 5.0858
2.0281 3.0477 440000 0.0025 0.1312 1.1402 3.1242 22.7409 1.6259 5.0830
2.0165 3.0572 445000 0.0025 0.1312 1.1402 3.1230 22.7149 1.6253 5.0801
1.9991 3.0668 450000 0.0025 0.1312 1.1402 3.1222 22.6954 1.6249 5.0779
2.01 3.0763 455000 0.0025 0.1312 1.1402 3.1212 22.6734 1.6244 5.0755
1.9873 3.0858 460000 0.0025 0.1312 1.1402 3.1204 22.6543 1.6240 5.0733
1.9795 3.0954 465000 0.0025 0.1312 1.1402 3.1194 22.6336 1.6235 5.0710
2.005 3.1049 470000 0.0025 0.1312 1.1402 3.1186 22.6137 1.6231 5.0688
1.9711 3.1144 475000 0.0025 0.1312 1.1402 3.1181 22.6044 1.6229 5.0678
1.9799 4.0048 480000 0.0025 0.1312 1.1402 3.1174 22.5871 1.6225 5.0658
1.9783 4.0143 485000 0.0025 0.1312 1.1402 3.1168 22.5742 1.6222 5.0644
1.9775 1.0095 490000 0.0025 0.1312 1.1402 3.1164 22.5639 1.6220 5.0632
1.994 1.0191 495000 0.0025 0.1312 1.1402 3.1158 22.5525 1.6217 5.0619
2.0268 1.0286 500000 0.0025 0.1312 1.1402 3.1156 22.5462 1.6216 5.0612
1.9849 1.0381 505000 0.0025 0.1312 1.1402 3.1154 22.5414 1.6215 5.0606
2.0234 1.0477 510000 0.0025 0.1312 1.1402 3.1151 22.5360 1.6214 5.0600
2.0123 1.0572 515000 0.0025 0.1312 1.1402 3.1150 22.5324 1.6213 5.0597
1.9957 1.0668 520000 0.0025 0.1312 1.1402 3.1149 22.5301 1.6212 5.0594

Framework versions

  • Transformers 4.57.0.dev0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model size

  • 0.8B parameters (Safetensors, F32)