train_mnli_1753094135

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mnli (Multi-Genre Natural Language Inference) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0828
  • Num Input Tokens Seen: 347859920
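
The checkpoint is a PEFT adapter rather than full model weights (see the framework versions below), so inference requires attaching it to the base model. A minimal loading sketch, assuming the adapter repo id rbelanec/train_mnli_1753094135 and access to the gated base model; the prompt template used during fine-tuning is not documented in this card, so the premise/hypothesis format below is only an illustrative guess:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mnli_1753094135"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative prompt only; the actual fine-tuning template is not documented.
prompt = (
    "Premise: A soccer game with multiple males playing.\n"
    "Hypothesis: Some men are playing a sport.\n"
    "Label (entailment, neutral, or contradiction):"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```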

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
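
The original training script is not included in this card; the following sketch only maps the listed hyperparameters onto transformers' standard TrainingArguments fields. Dataset preprocessing, the PEFT/LoRA config, and any other settings are assumptions.

```python
from transformers import TrainingArguments

# Hedged sketch: reproduces the hyperparameters listed above, nothing more.
training_args = TrainingArguments(
    output_dir="train_mnli_1753094135",  # assumed output directory name
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```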

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1413        | 0.5   | 44179  | 0.1377          | 17403808          |
| 0.2235        | 1.0   | 88358  | 0.1067          | 34786008          |
| 0.0359        | 1.5   | 132537 | 0.0968          | 52165240          |
| 0.0608        | 2.0   | 176716 | 0.0922          | 69564424          |
| 0.023         | 2.5   | 220895 | 0.0894          | 86951080          |
| 0.0966        | 3.0   | 265074 | 0.0874          | 104352808         |
| 0.1134        | 3.5   | 309253 | 0.0866          | 121746504         |
| 0.1315        | 4.0   | 353432 | 0.0853          | 139123792         |
| 0.148         | 4.5   | 397611 | 0.0848          | 156526672         |
| 0.0366        | 5.0   | 441790 | 0.0841          | 173916408         |
| 0.0499        | 5.5   | 485969 | 0.0839          | 191309592         |
| 0.0782        | 6.0   | 530148 | 0.0836          | 208701328         |
| 0.0465        | 6.5   | 574327 | 0.0839          | 226098768         |
| 0.033         | 7.0   | 618506 | 0.0831          | 243493272         |
| 0.054         | 7.5   | 662685 | 0.0831          | 260881240         |
| 0.063         | 8.0   | 706864 | 0.0829          | 278276232         |
| 0.0511        | 8.5   | 751043 | 0.0829          | 295687496         |
| 0.0792        | 9.0   | 795222 | 0.0829          | 313062872         |
| 0.0915        | 9.5   | 839401 | 0.0829          | 330444056         |
| 0.0121        | 10.0  | 883580 | 0.0828          | 347859920         |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1