T5Laa2-Large-WeightedLoss

This ~0.8B-parameter model (F32 safetensors) is a fine-tuned version of an unspecified base model on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set (each perplexity is the exponential of the corresponding loss; a quick check follows the list):

  • Perplexity: 184.5759
  • Loss: 5.2181
  • Accuracy: 0.0373
  • Lookahead Perplexity: 2089.7438
  • Lookahead Loss: 7.6448
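As a consistency check, the reported perplexities are the exponentials of the corresponding cross-entropy losses. A minimal sketch in plain Python, using the values from the list above:

```python
import math

# Reported evaluation metrics from the list above.
eval_loss = 5.2181
lookahead_loss = 7.6448

# Perplexity = exp(cross-entropy loss); both reproduce the reported
# values up to rounding of the published losses.
print(math.exp(eval_loss))       # ~184.57   (reported: 184.5759)
print(math.exp(lookahead_loss))  # ~2089.76  (reported: 2089.7438)
```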

Model description

More information needed

Intended uses & limitations

More information needed
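In the absence of documented usage, the sketch below shows one plausible way to load the checkpoint with transformers. The Auto classes and the seq2seq head are assumptions based on the T5-style name; the custom T5Laa2 architecture may require trust_remote_code=True, and generation quality is untested.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo = "hrezaei/T5Laa2-Large-WeightedLoss"

# Assumption: the checkpoint registers a T5-style seq2seq architecture
# that the Auto classes can resolve (possibly via remote code).
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSeq2SeqLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```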

Training and evaluation data

More information needed
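The card names only the dataset. A hedged sketch for inspecting it with the standard datasets API (the split and any filtering used for training are undocumented):

```python
from datasets import load_dataset

# sample-350BT is a published subset of HuggingFaceFW/fineweb; stream it
# rather than downloading the full ~350B-token sample.
ds = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-350BT",
    split="train",
    streaming=True,
)

# fineweb rows carry the raw document under a "text" field.
print(next(iter(ds))["text"][:200])
```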

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code reconstruction follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 524288
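For reference, a hypothetical reconstruction of this configuration with the standard transformers Trainer API (argument names are the library's, not necessarily the author's script; unlisted settings are left at their defaults):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="T5Laa2-Large-WeightedLoss",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",  # fused AdamW implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=524288,
)
```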

Training results

| Training Loss | Epoch | Step | Accuracy | Lookahead Loss | Lookahead Perplexity | Validation Loss | Perplexity |
|:-------------:|:-----:|:------:|:--------:|:--------------:|:--------------------:|:---------------:|:----------:|
| 6.7799 | 0.0095 | 5000 | 0.0277 | 48.3492 | 994884838669032751104.0000 | 6.6288 | 756.5394 |
| 6.3701 | 0.0191 | 10000 | 0.0298 | 29.9422 | 10086010098930.062 | 6.2896 | 538.9212 |
| 6.1926 | 0.0286 | 15000 | 0.0310 | 17.4292 | 37103420.1102 | 6.0969 | 444.4629 |
| 6.058 | 0.0381 | 20000 | 0.0312 | 11.2096 | 73837.0007 | 5.9705 | 391.7203 |
| 5.9483 | 0.0477 | 25000 | 0.0318 | 8.7609 | 6379.6106 | 5.8987 | 364.5553 |
| 5.8936 | 0.0572 | 30000 | 0.0317 | 8.6431 | 5671.1388 | 5.9102 | 368.7617 |
| 5.9237 | 0.0668 | 35000 | 0.0342 | 8.5313 | 5070.9207 | 5.8056 | 332.1394 |
| 5.8761 | 0.0763 | 40000 | 0.0346 | 8.3857 | 4383.7940 | 5.7683 | 319.9989 |
| 5.8407 | 0.0858 | 45000 | 0.0353 | 8.6143 | 5509.9856 | 5.7586 | 316.8908 |
| 5.9167 | 0.0954 | 50000 | 0.0354 | 8.6580 | 5755.8949 | 5.7596 | 317.2192 |
| 6.0003 | 0.1049 | 55000 | 0.0359 | 8.7358 | 6221.9875 | 5.7997 | 330.1962 |
| 5.9179 | 0.1144 | 60000 | 0.0379 | 8.6974 | 5987.3140 | 5.7904 | 327.1379 |
| 5.9174 | 0.1240 | 65000 | 0.0384 | 8.5412 | 5121.6778 | 5.8089 | 333.2610 |
| 5.9954 | 0.1335 | 70000 | 0.0393 | 8.5149 | 4988.5368 | 5.8409 | 344.0949 |
| 6.0362 | 0.1431 | 75000 | 0.0400 | 8.3793 | 4356.0051 | 5.8705 | 354.4363 |
| 5.8366 | 0.1526 | 80000 | 0.0393 | 8.2272 | 3741.2679 | 5.8130 | 334.6056 |
| 6.0205 | 0.1621 | 85000 | 0.0392 | 8.5779 | 5313.1989 | 5.7982 | 329.6925 |
| 5.973 | 0.1717 | 90000 | 0.0398 | 8.8638 | 7071.0320 | 5.8446 | 345.3771 |
| 5.9258 | 0.1812 | 95000 | 0.0400 | 8.1622 | 3505.8012 | 5.7733 | 321.5895 |
| 5.8746 | 0.1907 | 100000 | 0.0403 | 8.2517 | 3834.1854 | 5.7572 | 316.4552 |
| 5.9069 | 0.2003 | 105000 | 0.0402 | 8.2418 | 3796.3680 | 5.7586 | 316.9171 |
| 5.9402 | 0.2098 | 110000 | 0.0400 | 8.5282 | 5055.2648 | 5.7703 | 320.6236 |
| 5.8692 | 0.2193 | 115000 | 0.0405 | 8.3863 | 4386.4832 | 5.7466 | 313.1122 |
| 5.9973 | 0.2289 | 120000 | 0.0394 | 8.6531 | 5727.7346 | 5.7815 | 324.2362 |
| 5.8888 | 0.2384 | 125000 | 0.0402 | 8.1073 | 3318.6616 | 5.7300 | 307.9743 |
| 5.9601 | 0.2480 | 130000 | 0.0409 | 8.6942 | 5968.4347 | 5.7525 | 314.9845 |
| 5.8925 | 0.2575 | 135000 | 0.0405 | 8.1664 | 3520.7167 | 5.7142 | 303.1319 |
| 5.8557 | 0.2670 | 140000 | 0.0401 | 7.9957 | 2968.1186 | 5.6992 | 298.6369 |
| 5.8511 | 0.2766 | 145000 | 0.0402 | 7.9752 | 2907.9587 | 5.7048 | 300.3090 |
| 5.8921 | 0.2861 | 150000 | 0.0407 | 7.9627 | 2871.8322 | 5.6648 | 288.5226 |
| 5.8002 | 0.2956 | 155000 | 0.0400 | 7.8912 | 2673.5755 | 5.6494 | 284.1184 |
| 5.8017 | 0.3052 | 160000 | 0.0400 | 8.0297 | 3070.7274 | 5.6654 | 288.7003 |
| 5.8462 | 0.3147 | 165000 | 0.0405 | 7.9691 | 2890.2133 | 5.6639 | 288.2660 |
| 5.8635 | 0.3242 | 170000 | 0.0403 | 8.2405 | 3791.4329 | 5.6655 | 288.7322 |
| 5.7894 | 0.3338 | 175000 | 0.0399 | 8.0391 | 3099.9643 | 5.6634 | 288.1246 |
| 5.9122 | 0.3433 | 180000 | 0.0411 | 8.0436 | 3113.8276 | 5.6549 | 285.6828 |
| 5.8401 | 0.3529 | 185000 | 0.0409 | 8.2639 | 3881.1753 | 5.6554 | 285.8374 |
| 5.8252 | 0.3624 | 190000 | 0.0408 | 7.9751 | 2907.7520 | 5.6592 | 286.9267 |
| 5.8975 | 0.3719 | 195000 | 0.0405 | 7.9789 | 2918.8320 | 5.6414 | 281.8590 |
| 5.8008 | 0.3815 | 200000 | 0.0393 | 7.8772 | 2636.5364 | 5.6323 | 279.2986 |
| 5.776 | 0.3910 | 205000 | 0.0401 | 7.9352 | 2793.9517 | 5.6288 | 278.3158 |
| 5.8825 | 0.4005 | 210000 | 0.0401 | 7.9805 | 2923.3879 | 5.6192 | 275.6601 |
| 5.7651 | 0.4101 | 215000 | 0.0400 | 7.9989 | 2977.7573 | 5.5993 | 270.2366 |
| 5.7721 | 0.4196 | 220000 | 0.0406 | 7.8928 | 2677.9319 | 5.5979 | 269.8660 |
| 5.8312 | 0.4292 | 225000 | 0.0396 | 8.0192 | 3038.8659 | 5.6054 | 271.8775 |
| 5.7752 | 0.4387 | 230000 | 0.0405 | 7.8009 | 2442.8390 | 5.5823 | 265.6886 |
| 5.8101 | 0.4482 | 235000 | 0.0397 | 7.8881 | 2665.3761 | 5.5903 | 267.8042 |
| 5.7115 | 0.4578 | 240000 | 0.0400 | 7.9381 | 2802.0555 | 5.5694 | 262.2645 |
| 5.7196 | 0.4673 | 245000 | 0.0394 | 7.8143 | 2475.7568 | 5.5596 | 259.7104 |
| 5.6944 | 0.4768 | 250000 | 0.0409 | 7.8772 | 2636.5595 | 5.5478 | 256.6677 |
| 5.6823 | 0.4864 | 255000 | 0.0395 | 7.7952 | 2428.9769 | 5.5298 | 252.0854 |
| 5.674 | 0.4959 | 260000 | 0.0399 | 7.8926 | 2677.4051 | 5.5318 | 252.5954 |
| 5.6606 | 0.5054 | 265000 | 0.0400 | 7.8189 | 2487.2421 | 5.5178 | 249.0964 |
| 5.7097 | 0.5150 | 270000 | 0.0395 | 7.8465 | 2556.6945 | 5.5101 | 247.1704 |
| 5.7047 | 0.5245 | 275000 | 0.0402 | 7.7667 | 2360.7532 | 5.5015 | 245.0604 |
| 5.6797 | 0.5341 | 280000 | 0.0397 | 7.8969 | 2688.8862 | 5.4982 | 244.2633 |
| 5.6739 | 0.5436 | 285000 | 0.0398 | 8.0241 | 3053.8151 | 5.4930 | 242.9751 |
| 5.6826 | 0.5531 | 290000 | 0.0397 | 7.9106 | 2726.0737 | 5.4990 | 244.4371 |
| 5.7864 | 0.5627 | 295000 | 0.0397 | 7.8498 | 2565.2361 | 5.4800 | 239.8371 |
| 5.6506 | 0.5722 | 300000 | 0.0401 | 7.9694 | 2891.0478 | 5.4805 | 239.9755 |
| 5.6403 | 0.5817 | 305000 | 0.0390 | 7.8301 | 2515.0960 | 5.4738 | 238.3728 |
| 5.6538 | 0.5913 | 310000 | 0.0398 | 7.8934 | 2679.4140 | 5.4811 | 240.0990 |
| 5.6665 | 0.6008 | 315000 | 0.0399 | 7.8407 | 2541.9513 | 5.4566 | 234.3107 |
| 5.5755 | 0.6104 | 320000 | 0.0395 | 7.8608 | 2593.5901 | 5.4477 | 232.2292 |
| 5.641 | 0.6199 | 325000 | 0.0394 | 7.9812 | 2925.4813 | 5.4433 | 231.2019 |
| 5.6113 | 0.6294 | 330000 | 0.0391 | 7.8127 | 2471.8038 | 5.4369 | 229.7290 |
| 5.6697 | 0.6390 | 335000 | 0.0394 | 7.9044 | 2709.0417 | 5.4350 | 229.2882 |
| 5.6425 | 0.6485 | 340000 | 0.0397 | 7.8442 | 2550.9241 | 5.4301 | 228.1699 |
| 5.626 | 0.6580 | 345000 | 0.0391 | 7.8639 | 2601.7519 | 5.4225 | 226.4364 |
| 5.5888 | 0.6676 | 350000 | 0.0394 | 7.9824 | 2929.0955 | 5.4191 | 225.6680 |
| 5.5793 | 0.6771 | 355000 | 0.0389 | 8.0429 | 3111.5742 | 5.4146 | 224.6552 |
| 5.5751 | 0.6866 | 360000 | 0.0385 | 7.8269 | 2507.0953 | 5.4039 | 222.2638 |
| 5.5659 | 0.6962 | 365000 | 0.0388 | 7.8007 | 2442.3736 | 5.3930 | 219.8554 |
| 5.6128 | 0.7057 | 370000 | 0.0385 | 7.7689 | 2365.9076 | 5.3840 | 217.8869 |
| 5.5471 | 0.7153 | 375000 | 0.0380 | 7.7346 | 2286.1222 | 5.3762 | 216.1903 |
| 5.5468 | 0.7248 | 380000 | 0.0387 | 7.7374 | 2292.4794 | 5.3672 | 214.2540 |
| 5.5354 | 0.7343 | 385000 | 0.0383 | 7.7167 | 2245.4357 | 5.3563 | 211.9470 |
| 5.5659 | 0.7439 | 390000 | 0.0384 | 7.7421 | 2303.3757 | 5.3496 | 210.5267 |
| 5.5114 | 0.7534 | 395000 | 0.0382 | 7.7600 | 2344.8778 | 5.3431 | 209.1623 |
| 5.5024 | 0.7629 | 400000 | 0.0383 | 7.7779 | 2387.2802 | 5.3370 | 207.8823 |
| 5.5723 | 0.7725 | 405000 | 0.0381 | 7.7460 | 2312.2991 | 5.3307 | 206.5772 |
| 5.4679 | 0.7820 | 410000 | 0.0384 | 7.7078 | 2225.5773 | 5.3204 | 204.4713 |
| 5.5022 | 0.7915 | 415000 | 0.0379 | 7.7039 | 2216.8935 | 5.3115 | 202.6541 |
| 5.4582 | 0.8011 | 420000 | 0.0382 | 7.7201 | 2253.1685 | 5.3098 | 202.3064 |
| 5.4716 | 0.8106 | 425000 | 0.0379 | 7.6967 | 2201.0493 | 5.2979 | 199.9121 |
| 5.4742 | 0.8202 | 430000 | 0.0379 | 7.7058 | 2221.2136 | 5.2944 | 199.2166 |
| 5.456 | 0.8297 | 435000 | 0.0378 | 7.7298 | 2275.2305 | 5.2859 | 197.5286 |
| 5.4751 | 0.8392 | 440000 | 0.0380 | 7.7102 | 2230.8815 | 5.2794 | 196.2503 |
| 5.4628 | 0.8488 | 445000 | 0.0379 | 7.7326 | 2281.6319 | 5.2746 | 195.3121 |
| 5.3535 | 0.8583 | 450000 | 0.0377 | 7.6863 | 2178.3606 | 5.2732 | 195.0385 |
| 5.5193 | 0.8678 | 455000 | 0.0380 | 7.7062 | 2221.9979 | 5.2646 | 193.3610 |
| 5.4747 | 0.8774 | 460000 | 0.0374 | 7.6888 | 2183.6563 | 5.2601 | 192.5007 |
| 5.4077 | 0.8869 | 465000 | 0.0375 | 7.6672 | 2137.0722 | 5.2530 | 191.1322 |
| 5.4288 | 0.8965 | 470000 | 0.0377 | 7.6538 | 2108.7450 | 5.2485 | 190.2730 |
| 5.4653 | 0.9060 | 475000 | 0.0377 | 7.6650 | 2132.3223 | 5.2456 | 189.7327 |
| 5.3929 | 0.9155 | 480000 | 0.0376 | 7.6603 | 2122.3287 | 5.2409 | 188.8477 |
| 5.405 | 0.9251 | 485000 | 0.0374 | 7.6530 | 2107.0023 | 5.2353 | 187.7804 |
| 5.4504 | 0.9346 | 490000 | 0.0374 | 7.6552 | 2111.6347 | 5.2315 | 187.0694 |
| 5.4217 | 0.9441 | 495000 | 0.0374 | 7.6548 | 2110.7966 | 5.2306 | 186.9062 |
| 5.4109 | 0.9537 | 500000 | 0.0372 | 7.6497 | 2099.9909 | 5.2254 | 185.9346 |
| 5.3892 | 1.0095 | 505000 | 0.0374 | 7.6487 | 2097.9349 | 5.2234 | 185.5628 |
| 5.3806 | 1.0191 | 510000 | 0.0374 | 7.6463 | 2092.8939 | 5.2203 | 184.9853 |
| 5.4174 | 1.0286 | 515000 | 0.0375 | 7.6450 | 2090.1261 | 5.2194 | 184.8205 |
| 5.4017 | 1.0381 | 520000 | 0.0374 | 7.6449 | 2090.0473 | 5.2185 | 184.6505 |

Framework versions

  • Transformers 4.57.0.dev0
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.1