# T5Laa2-Large-WeightedLoss
This model is a fine-tuned version of an unspecified base checkpoint on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set (sanity-checked below):
- Perplexity: 184.5759
- Loss: 5.2181
- Accuracy: 0.0373
- Lookahead Perplexity: 2089.7438
- Lookahead Loss: 7.6448
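As a quick consistency check, each reported perplexity is the exponential of its corresponding cross-entropy loss:

```python
import math

# Perplexity = exp(cross-entropy loss); both reported pairs agree:
print(math.exp(5.2181))  # ~184.58  -> Perplexity 184.5759
print(math.exp(7.6448))  # ~2089.7  -> Lookahead Perplexity 2089.7438
```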
## Model description
More information needed
## Intended uses & limitations
More information needed
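Until usage is documented, here is a minimal loading sketch. It assumes the checkpoint works with the standard transformers auto classes; T5Laa2 looks like a custom T5 variant, so `trust_remote_code=True` may be needed (an assumption, not confirmed by this card):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo = "hrezaei/T5Laa2-Large-WeightedLoss"

tokenizer = AutoTokenizer.from_pretrained(repo)
# trust_remote_code is a guess: the T5Laa2 architecture may not ship with transformers.
model = AutoModelForSeq2SeqLM.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```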
## Training and evaluation data
More information needed
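The card names the corpus but not the preprocessing. A minimal sketch of streaming the named dataset, assuming the stock sample-350BT config and train split (the only details given above):

```python
from datasets import load_dataset

# Stream the corpus rather than downloading ~350B tokens locally.
ds = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-350BT",
    split="train",
    streaming=True,
)
print(next(iter(ds))["text"][:200])  # fineweb rows carry the raw page text in a "text" field
```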
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- training_steps: 524288
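As a reproduction aid, a minimal sketch of how these values could map onto transformers `TrainingArguments`; `output_dir` and the 5000-step evaluation cadence are assumptions inferred from this card, not quoted from the original training config:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="T5Laa2-Large-WeightedLoss",  # assumed name, not from the original config
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",  # betas=(0.9, 0.999) and eps=1e-8 are the AdamW defaults
    lr_scheduler_type="linear",
    max_steps=524288,
    eval_strategy="steps",
    eval_steps=5000,  # the results table logs an evaluation every 5000 steps
)
```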
### Training results
Training Loss | Epoch | Step | Accuracy | Lookahead Loss | Lookahead Perplexity | Validation Loss | Perplexity |
---|---|---|---|---|---|---|---|
6.7799 | 0.0095 | 5000 | 0.0277 | 48.3492 | 994884838669032751104.0000 | 6.6288 | 756.5394 |
6.3701 | 0.0191 | 10000 | 0.0298 | 29.9422 | 10086010098930.062 | 6.2896 | 538.9212 |
6.1926 | 0.0286 | 15000 | 0.0310 | 17.4292 | 37103420.1102 | 6.0969 | 444.4629 |
6.058 | 0.0381 | 20000 | 0.0312 | 11.2096 | 73837.0007 | 5.9705 | 391.7203 |
5.9483 | 0.0477 | 25000 | 0.0318 | 8.7609 | 6379.6106 | 5.8987 | 364.5553 |
5.8936 | 0.0572 | 30000 | 0.0317 | 8.6431 | 5671.1388 | 5.9102 | 368.7617 |
5.9237 | 0.0668 | 35000 | 0.0342 | 8.5313 | 5070.9207 | 5.8056 | 332.1394 |
5.8761 | 0.0763 | 40000 | 0.0346 | 8.3857 | 4383.7940 | 5.7683 | 319.9989 |
5.8407 | 0.0858 | 45000 | 0.0353 | 8.6143 | 5509.9856 | 5.7586 | 316.8908 |
5.9167 | 0.0954 | 50000 | 0.0354 | 8.6580 | 5755.8949 | 5.7596 | 317.2192 |
6.0003 | 0.1049 | 55000 | 0.0359 | 8.7358 | 6221.9875 | 5.7997 | 330.1962 |
5.9179 | 0.1144 | 60000 | 0.0379 | 8.6974 | 5987.3140 | 5.7904 | 327.1379 |
5.9174 | 0.1240 | 65000 | 0.0384 | 8.5412 | 5121.6778 | 5.8089 | 333.2610 |
5.9954 | 0.1335 | 70000 | 0.0393 | 8.5149 | 4988.5368 | 5.8409 | 344.0949 |
6.0362 | 0.1431 | 75000 | 0.0400 | 8.3793 | 4356.0051 | 5.8705 | 354.4363 |
5.8366 | 0.1526 | 80000 | 0.0393 | 8.2272 | 3741.2679 | 5.8130 | 334.6056 |
6.0205 | 0.1621 | 85000 | 0.0392 | 8.5779 | 5313.1989 | 5.7982 | 329.6925 |
5.973 | 0.1717 | 90000 | 0.0398 | 8.8638 | 7071.0320 | 5.8446 | 345.3771 |
5.9258 | 0.1812 | 95000 | 0.0400 | 8.1622 | 3505.8012 | 5.7733 | 321.5895 |
5.8746 | 0.1907 | 100000 | 0.0403 | 8.2517 | 3834.1854 | 5.7572 | 316.4552 |
5.9069 | 0.2003 | 105000 | 0.0402 | 8.2418 | 3796.3680 | 5.7586 | 316.9171 |
5.9402 | 0.2098 | 110000 | 0.0400 | 8.5282 | 5055.2648 | 5.7703 | 320.6236 |
5.8692 | 0.2193 | 115000 | 0.0405 | 8.3863 | 4386.4832 | 5.7466 | 313.1122 |
5.9973 | 0.2289 | 120000 | 0.0394 | 8.6531 | 5727.7346 | 5.7815 | 324.2362 |
5.8888 | 0.2384 | 125000 | 0.0402 | 8.1073 | 3318.6616 | 5.7300 | 307.9743 |
5.9601 | 0.2480 | 130000 | 0.0409 | 8.6942 | 5968.4347 | 5.7525 | 314.9845 |
5.8925 | 0.2575 | 135000 | 0.0405 | 8.1664 | 3520.7167 | 5.7142 | 303.1319 |
5.8557 | 0.2670 | 140000 | 0.0401 | 7.9957 | 2968.1186 | 5.6992 | 298.6369 |
5.8511 | 0.2766 | 145000 | 0.0402 | 7.9752 | 2907.9587 | 5.7048 | 300.3090 |
5.8921 | 0.2861 | 150000 | 0.0407 | 7.9627 | 2871.8322 | 5.6648 | 288.5226 |
5.8002 | 0.2956 | 155000 | 0.0400 | 7.8912 | 2673.5755 | 5.6494 | 284.1184 |
5.8017 | 0.3052 | 160000 | 0.0400 | 8.0297 | 3070.7274 | 5.6654 | 288.7003 |
5.8462 | 0.3147 | 165000 | 0.0405 | 7.9691 | 2890.2133 | 5.6639 | 288.2660 |
5.8635 | 0.3242 | 170000 | 0.0403 | 8.2405 | 3791.4329 | 5.6655 | 288.7322 |
5.7894 | 0.3338 | 175000 | 0.0399 | 8.0391 | 3099.9643 | 5.6634 | 288.1246 |
5.9122 | 0.3433 | 180000 | 0.0411 | 8.0436 | 3113.8276 | 5.6549 | 285.6828 |
5.8401 | 0.3529 | 185000 | 0.0409 | 8.2639 | 3881.1753 | 5.6554 | 285.8374 |
5.8252 | 0.3624 | 190000 | 0.0408 | 7.9751 | 2907.7520 | 5.6592 | 286.9267 |
5.8975 | 0.3719 | 195000 | 0.0405 | 7.9789 | 2918.8320 | 5.6414 | 281.8590 |
5.8008 | 0.3815 | 200000 | 0.0393 | 7.8772 | 2636.5364 | 5.6323 | 279.2986 |
5.776 | 0.3910 | 205000 | 0.0401 | 7.9352 | 2793.9517 | 5.6288 | 278.3158 |
5.8825 | 0.4005 | 210000 | 0.0401 | 7.9805 | 2923.3879 | 5.6192 | 275.6601 |
5.7651 | 0.4101 | 215000 | 0.0400 | 7.9989 | 2977.7573 | 5.5993 | 270.2366 |
5.7721 | 0.4196 | 220000 | 0.0406 | 7.8928 | 2677.9319 | 5.5979 | 269.8660 |
5.8312 | 0.4292 | 225000 | 0.0396 | 8.0192 | 3038.8659 | 5.6054 | 271.8775 |
5.7752 | 0.4387 | 230000 | 0.0405 | 7.8009 | 2442.8390 | 5.5823 | 265.6886 |
5.8101 | 0.4482 | 235000 | 0.0397 | 7.8881 | 2665.3761 | 5.5903 | 267.8042 |
5.7115 | 0.4578 | 240000 | 0.0400 | 7.9381 | 2802.0555 | 5.5694 | 262.2645 |
5.7196 | 0.4673 | 245000 | 0.0394 | 7.8143 | 2475.7568 | 5.5596 | 259.7104 |
5.6944 | 0.4768 | 250000 | 0.0409 | 7.8772 | 2636.5595 | 5.5478 | 256.6677 |
5.6823 | 0.4864 | 255000 | 0.0395 | 7.7952 | 2428.9769 | 5.5298 | 252.0854 |
5.674 | 0.4959 | 260000 | 0.0399 | 7.8926 | 2677.4051 | 5.5318 | 252.5954 |
5.6606 | 0.5054 | 265000 | 0.0400 | 7.8189 | 2487.2421 | 5.5178 | 249.0964 |
5.7097 | 0.5150 | 270000 | 0.0395 | 7.8465 | 2556.6945 | 5.5101 | 247.1704 |
5.7047 | 0.5245 | 275000 | 0.0402 | 7.7667 | 2360.7532 | 5.5015 | 245.0604 |
5.6797 | 0.5341 | 280000 | 0.0397 | 7.8969 | 2688.8862 | 5.4982 | 244.2633 |
5.6739 | 0.5436 | 285000 | 0.0398 | 8.0241 | 3053.8151 | 5.4930 | 242.9751 |
5.6826 | 0.5531 | 290000 | 0.0397 | 7.9106 | 2726.0737 | 5.4990 | 244.4371 |
5.7864 | 0.5627 | 295000 | 0.0397 | 7.8498 | 2565.2361 | 5.4800 | 239.8371 |
5.6506 | 0.5722 | 300000 | 0.0401 | 7.9694 | 2891.0478 | 5.4805 | 239.9755 |
5.6403 | 0.5817 | 305000 | 0.0390 | 7.8301 | 2515.0960 | 5.4738 | 238.3728 |
5.6538 | 0.5913 | 310000 | 0.0398 | 7.8934 | 2679.4140 | 5.4811 | 240.0990 |
5.6665 | 0.6008 | 315000 | 0.0399 | 7.8407 | 2541.9513 | 5.4566 | 234.3107 |
5.5755 | 0.6104 | 320000 | 0.0395 | 7.8608 | 2593.5901 | 5.4477 | 232.2292 |
5.641 | 0.6199 | 325000 | 0.0394 | 7.9812 | 2925.4813 | 5.4433 | 231.2019 |
5.6113 | 0.6294 | 330000 | 0.0391 | 7.8127 | 2471.8038 | 5.4369 | 229.7290 |
5.6697 | 0.6390 | 335000 | 0.0394 | 7.9044 | 2709.0417 | 5.4350 | 229.2882 |
5.6425 | 0.6485 | 340000 | 0.0397 | 7.8442 | 2550.9241 | 5.4301 | 228.1699 |
5.626 | 0.6580 | 345000 | 0.0391 | 7.8639 | 2601.7519 | 5.4225 | 226.4364 |
5.5888 | 0.6676 | 350000 | 0.0394 | 7.9824 | 2929.0955 | 5.4191 | 225.6680 |
5.5793 | 0.6771 | 355000 | 0.0389 | 8.0429 | 3111.5742 | 5.4146 | 224.6552 |
5.5751 | 0.6866 | 360000 | 0.0385 | 7.8269 | 2507.0953 | 5.4039 | 222.2638 |
5.5659 | 0.6962 | 365000 | 0.0388 | 7.8007 | 2442.3736 | 5.3930 | 219.8554 |
5.6128 | 0.7057 | 370000 | 0.0385 | 7.7689 | 2365.9076 | 5.3840 | 217.8869 |
5.5471 | 0.7153 | 375000 | 0.0380 | 7.7346 | 2286.1222 | 5.3762 | 216.1903 |
5.5468 | 0.7248 | 380000 | 0.0387 | 7.7374 | 2292.4794 | 5.3672 | 214.2540 |
5.5354 | 0.7343 | 385000 | 0.0383 | 7.7167 | 2245.4357 | 5.3563 | 211.9470 |
5.5659 | 0.7439 | 390000 | 0.0384 | 7.7421 | 2303.3757 | 5.3496 | 210.5267 |
5.5114 | 0.7534 | 395000 | 0.0382 | 7.7600 | 2344.8778 | 5.3431 | 209.1623 |
5.5024 | 0.7629 | 400000 | 0.0383 | 7.7779 | 2387.2802 | 5.3370 | 207.8823 |
5.5723 | 0.7725 | 405000 | 0.0381 | 7.7460 | 2312.2991 | 5.3307 | 206.5772 |
5.4679 | 0.7820 | 410000 | 0.0384 | 7.7078 | 2225.5773 | 5.3204 | 204.4713 |
5.5022 | 0.7915 | 415000 | 0.0379 | 7.7039 | 2216.8935 | 5.3115 | 202.6541 |
5.4582 | 0.8011 | 420000 | 0.0382 | 7.7201 | 2253.1685 | 5.3098 | 202.3064 |
5.4716 | 0.8106 | 425000 | 0.0379 | 7.6967 | 2201.0493 | 5.2979 | 199.9121 |
5.4742 | 0.8202 | 430000 | 0.0379 | 7.7058 | 2221.2136 | 5.2944 | 199.2166 |
5.456 | 0.8297 | 435000 | 0.0378 | 7.7298 | 2275.2305 | 5.2859 | 197.5286 |
5.4751 | 0.8392 | 440000 | 0.0380 | 7.7102 | 2230.8815 | 5.2794 | 196.2503 |
5.4628 | 0.8488 | 445000 | 0.0379 | 7.7326 | 2281.6319 | 5.2746 | 195.3121 |
5.3535 | 0.8583 | 450000 | 0.0377 | 7.6863 | 2178.3606 | 5.2732 | 195.0385 |
5.5193 | 0.8678 | 455000 | 0.0380 | 7.7062 | 2221.9979 | 5.2646 | 193.3610 |
5.4747 | 0.8774 | 460000 | 0.0374 | 7.6888 | 2183.6563 | 5.2601 | 192.5007 |
5.4077 | 0.8869 | 465000 | 0.0375 | 7.6672 | 2137.0722 | 5.2530 | 191.1322 |
5.4288 | 0.8965 | 470000 | 0.0377 | 7.6538 | 2108.7450 | 5.2485 | 190.2730 |
5.4653 | 0.9060 | 475000 | 0.0377 | 7.6650 | 2132.3223 | 5.2456 | 189.7327 |
5.3929 | 0.9155 | 480000 | 0.0376 | 7.6603 | 2122.3287 | 5.2409 | 188.8477 |
5.405 | 0.9251 | 485000 | 0.0374 | 7.6530 | 2107.0023 | 5.2353 | 187.7804 |
5.4504 | 0.9346 | 490000 | 0.0374 | 7.6552 | 2111.6347 | 5.2315 | 187.0694 |
5.4217 | 0.9441 | 495000 | 0.0374 | 7.6548 | 2110.7966 | 5.2306 | 186.9062 |
5.4109 | 0.9537 | 500000 | 0.0372 | 7.6497 | 2099.9909 | 5.2254 | 185.9346 |
5.3892 | 1.0095 | 505000 | 0.0374 | 7.6487 | 2097.9349 | 5.2234 | 185.5628 |
5.3806 | 1.0191 | 510000 | 0.0374 | 7.6463 | 2092.8939 | 5.2203 | 184.9853 |
5.4174 | 1.0286 | 515000 | 0.0375 | 7.6450 | 2090.1261 | 5.2194 | 184.8205 |
5.4017 | 1.0381 | 520000 | 0.0374 | 7.6449 | 2090.0473 | 5.2185 | 184.6505 |
### Framework versions
- Transformers 4.57.0.dev0
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1