T5Lae-Large-newloss

This model was fine-tuned on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set (a minimal loading-and-evaluation sketch follows the list):

  • Loss: 6.8585
  • Accuracy: 0.0314
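
For reference, here is a minimal sketch of loading the checkpoint and computing a loss with Transformers. The repository id is taken from this card; whether the custom T5Lae architecture loads through the standard Auto* classes, and whether it requires `trust_remote_code=True`, are assumptions the card does not confirm.

```python
# Minimal loading/evaluation sketch. Assumes the checkpoint works with the
# standard Auto* classes; trust_remote_code=True is a guess for the custom
# T5Lae architecture, not something this card confirms.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo_id = "hrezaei/T5Lae-Large-newloss"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

text = "The tower is 324 metres tall, about the same height as an 81-storey building."
batch = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Feeding the input ids back in as labels only demonstrates the API; the
    # objective behind the loss numbers above may differ.
    out = model(**batch, labels=batch["input_ids"])
print(f"cross-entropy loss: {out.loss.item():.4f}")
```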

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a matching TrainingArguments sketch follows the list:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • training_steps: 524288
  • mixed_precision_training: Native AMP
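
The following sketch shows how these settings map onto `transformers.TrainingArguments`. The `output_dir`, the 5,000-step evaluation cadence (read off the results table below), and the `fp16` flag standing in for "Native AMP" are assumptions; model and dataset setup, plus the custom loss implied by the model name, are omitted.

```python
from transformers import TrainingArguments

# Hyperparameters copied from the list above; everything else (output_dir,
# eval/logging cadence, fp16 vs. bf16) is inferred, not stated on the card.
training_args = TrainingArguments(
    output_dir="T5Lae-Large-newloss",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=524_288,
    fp16=True,                         # "Native AMP" mixed precision
    eval_strategy="steps",
    eval_steps=5_000,                  # matches the table's 5k-step cadence
    logging_steps=5_000,
)
```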

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:------:|:------:|:---------------:|:--------:|
| 7.1412 | 0.0095 | 5000 | 7.1369 | 0.0289 |
| 6.9628 | 0.0191 | 10000 | 6.9957 | 0.0287 |
| 6.9811 | 0.0286 | 15000 | 6.9385 | 0.0283 |
| 6.9319 | 0.0381 | 20000 | 6.9098 | 0.0289 |
| 6.8933 | 0.0477 | 25000 | 6.8881 | 0.0279 |
| 6.8351 | 0.0572 | 30000 | 6.8689 | 0.0281 |
| 6.9137 | 0.0668 | 35000 | 6.8726 | 0.0267 |
| 6.8591 | 0.0763 | 40000 | 6.8488 | 0.0294 |
| 6.87 | 0.0858 | 45000 | 6.8456 | 0.0285 |
| 6.923 | 0.0954 | 50000 | 6.8366 | 0.0300 |
| 6.8938 | 0.1049 | 55000 | 6.8524 | 0.0279 |
| 6.8661 | 0.1144 | 60000 | 6.8477 | 0.0291 |
| 6.818 | 0.1240 | 65000 | 6.8556 | 0.0295 |
| 6.8447 | 0.1335 | 70000 | 6.8708 | 0.0295 |
| 6.9407 | 0.1431 | 75000 | 6.9431 | 0.0285 |
| 6.8112 | 0.1526 | 80000 | 6.8968 | 0.0289 |
| 6.9326 | 0.1621 | 85000 | 6.8867 | 0.0303 |
| 6.93 | 0.1717 | 90000 | 6.9071 | 0.0289 |
| 6.8775 | 0.1812 | 95000 | 6.9182 | 0.0282 |
| 6.9303 | 0.1907 | 100000 | 6.9365 | 0.0298 |
| 6.9712 | 0.2003 | 105000 | 6.9177 | 0.0304 |
| 6.9708 | 0.2098 | 110000 | 6.9189 | 0.0295 |
| 6.9146 | 0.2193 | 115000 | 6.9231 | 0.0321 |
| 6.9805 | 0.2289 | 120000 | 6.9242 | 0.0300 |
| 6.9544 | 0.2384 | 125000 | 6.9080 | 0.0310 |
| 6.9911 | 0.2480 | 130000 | 6.9370 | 0.0300 |
| 6.9553 | 0.2575 | 135000 | 6.9171 | 0.0321 |
| 6.9499 | 0.2670 | 140000 | 6.9315 | 0.0313 |
| 6.9597 | 0.2766 | 145000 | 6.9353 | 0.0280 |
| 6.9236 | 0.2861 | 150000 | 6.9240 | 0.0313 |
| 6.9353 | 0.2956 | 155000 | 6.9222 | 0.0288 |
| 6.929 | 0.3052 | 160000 | 6.9323 | 0.0288 |
| 6.9524 | 0.3147 | 165000 | 6.9184 | 0.0301 |
| 7.0115 | 0.3242 | 170000 | 6.9191 | 0.0306 |
| 6.8984 | 0.3338 | 175000 | 6.9079 | 0.0307 |
| 7.0273 | 0.3433 | 180000 | 6.9077 | 0.0298 |
| 6.9773 | 0.3529 | 185000 | 6.9058 | 0.0309 |
| 6.9534 | 0.3624 | 190000 | 6.8999 | 0.0296 |
| 6.9413 | 0.3719 | 195000 | 6.9024 | 0.0297 |
| 6.9484 | 0.3815 | 200000 | 6.9112 | 0.0302 |
| 6.8938 | 0.3910 | 205000 | 6.9132 | 0.0311 |
| 6.9719 | 0.4005 | 210000 | 6.9025 | 0.0311 |
| 6.9633 | 0.4101 | 215000 | 6.9029 | 0.0307 |
| 6.9052 | 0.4196 | 220000 | 6.9112 | 0.0305 |
| 7.0124 | 0.4292 | 225000 | 6.9072 | 0.0312 |
| 6.9954 | 0.4387 | 230000 | 6.9048 | 0.0304 |
| 7.0005 | 0.4482 | 235000 | 6.9017 | 0.0289 |
| 6.9355 | 0.4578 | 240000 | 6.9006 | 0.0316 |
| 6.9119 | 0.4673 | 245000 | 6.8986 | 0.0309 |
| 6.9151 | 0.4768 | 250000 | 6.9139 | 0.0312 |
| 6.9032 | 0.4864 | 255000 | 6.9015 | 0.0310 |
| 6.9393 | 0.4959 | 260000 | 6.9001 | 0.0301 |
| 6.8839 | 0.5054 | 265000 | 6.9016 | 0.0301 |
| 6.927 | 0.5150 | 270000 | 6.9122 | 0.0302 |
| 6.979 | 0.5245 | 275000 | 6.9016 | 0.0299 |
| 6.9083 | 0.5341 | 280000 | 6.8971 | 0.0301 |
| 6.883 | 0.5436 | 285000 | 6.9037 | 0.0297 |
| 6.9126 | 0.5531 | 290000 | 6.8944 | 0.0309 |
| 6.9554 | 0.5627 | 295000 | 6.9077 | 0.0305 |
| 6.9157 | 0.5722 | 300000 | 6.8818 | 0.0315 |
| 6.9177 | 0.5817 | 305000 | 6.8835 | 0.0311 |
| 6.9511 | 0.5913 | 310000 | 6.8923 | 0.0318 |
| 6.9543 | 0.6008 | 315000 | 6.8898 | 0.0311 |
| 6.8546 | 0.6104 | 320000 | 6.8879 | 0.0302 |
| 6.8927 | 0.6199 | 325000 | 6.8771 | 0.0314 |
| 6.8991 | 0.6294 | 330000 | 6.8830 | 0.0303 |
| 6.9353 | 0.6390 | 335000 | 6.8958 | 0.0311 |
| 6.9027 | 0.6485 | 340000 | 6.8875 | 0.0309 |
| 6.9281 | 0.6580 | 345000 | 6.8875 | 0.0308 |
| 6.8576 | 0.6676 | 350000 | 6.9075 | 0.0304 |
| 6.8658 | 0.6771 | 355000 | 6.8905 | 0.0311 |
| 6.8994 | 0.6866 | 360000 | 6.8820 | 0.0305 |
| 6.8742 | 0.6962 | 365000 | 6.8769 | 0.0310 |
| 6.9569 | 0.7057 | 370000 | 6.8904 | 0.0304 |
| 6.8804 | 0.7153 | 375000 | 6.8841 | 0.0306 |
| 6.8935 | 0.7248 | 380000 | 6.8868 | 0.0301 |
| 6.878 | 0.7343 | 385000 | 6.8768 | 0.0307 |
| 6.9091 | 0.7439 | 390000 | 6.8738 | 0.0316 |
| 6.8698 | 0.7534 | 395000 | 6.8725 | 0.0307 |
| 6.8922 | 0.7629 | 400000 | 6.8776 | 0.0309 |
| 6.942 | 0.7725 | 405000 | 6.8744 | 0.0302 |
| 6.8491 | 0.7820 | 410000 | 6.8605 | 0.0312 |
| 6.9081 | 0.7915 | 415000 | 6.8673 | 0.0307 |
| 6.8476 | 0.8011 | 420000 | 6.8780 | 0.0312 |
| 6.881 | 0.8106 | 425000 | 6.8715 | 0.0313 |
| 6.8464 | 0.8202 | 430000 | 6.8736 | 0.0315 |
| 6.8448 | 0.8297 | 435000 | 6.8722 | 0.0309 |
| 6.8912 | 0.8392 | 440000 | 6.8771 | 0.0310 |
| 6.8407 | 0.8488 | 445000 | 6.8742 | 0.0304 |
| 6.7812 | 0.8583 | 450000 | 6.8768 | 0.0310 |
| 6.8851 | 0.8678 | 455000 | 6.8685 | 0.0311 |
| 6.8657 | 0.8774 | 460000 | 6.8623 | 0.0311 |
| 6.8474 | 0.8869 | 465000 | 6.8626 | 0.0309 |
| 6.8822 | 0.8965 | 470000 | 6.8656 | 0.0312 |
| 6.9379 | 0.9060 | 475000 | 6.8671 | 0.0312 |
| 6.8214 | 0.9155 | 480000 | 6.8672 | 0.0306 |
| 6.8276 | 0.9251 | 485000 | 6.8676 | 0.0310 |
| 6.8784 | 0.9346 | 490000 | 6.8655 | 0.0314 |
| 6.8923 | 0.9441 | 495000 | 6.8613 | 0.0313 |
| 6.8859 | 0.9537 | 500000 | 6.8621 | 0.0306 |
| 6.8558 | 1.0095 | 505000 | 6.8629 | 0.0308 |
| 6.7993 | 1.0191 | 510000 | 6.8625 | 0.0310 |
| 6.8821 | 1.0286 | 515000 | 6.8592 | 0.0313 |
| 6.8774 | 1.0381 | 520000 | 6.8593 | 0.0314 |
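
If the logged validation loss is a standard token-level cross-entropy in nats, it converts directly to perplexity. The "newloss" in the model name suggests a custom objective, so this conversion may not apply; the figure below is derived here, not reported on the card.

```python
import math

# Final validation loss from the table above; exp(loss) gives perplexity
# only under the standard cross-entropy assumption noted in the text.
final_eval_loss = 6.8585
print(f"perplexity: {math.exp(final_eval_loss):.0f}")  # ~952
```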

Framework versions

  • Transformers 4.56.0.dev0
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model size

0.8B params (Safetensors, F32)
