train_sst2_1753094144

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the sst2 (Stanford Sentiment Treebank v2) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0639
  • Num Input Tokens Seen: 33869824
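The card does not include a usage example, so the following is a minimal loading sketch only: it assumes the adapter applies to the base model via PEFT's standard interface, and the prompt template is an illustrative guess, since the format actually used during fine-tuning is not documented here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_sst2_1753094144"

# Load the base model, then attach the fine-tuned adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Hypothetical prompt; the template used in training is not published.
prompt = (
    "Classify the sentiment of the following sentence as positive or negative.\n"
    "Sentence: a gorgeous, witty, seductive movie.\n"
    "Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```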

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
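
The exact training script is not published; the sketch below shows one plausible way these values map onto transformers' `TrainingArguments`. The `output_dir` is a placeholder, and the batch sizes are per device, which matches the listed values only under single-GPU training, another assumption here.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_sst2_1753094144",  # hypothetical output path
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```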

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.0693        | 0.5   | 7577   | 0.1049          | 1694048           |
| 0.0521        | 1.0   | 15154  | 0.0835          | 3385616           |
| 0.1755        | 1.5   | 22731  | 0.0754          | 5082864           |
| 0.1278        | 2.0   | 30308  | 0.0707          | 6774096           |
| 0.015         | 2.5   | 37885  | 0.0695          | 8467152           |
| 0.0088        | 3.0   | 45462  | 0.0670          | 10161824          |
| 0.0201        | 3.5   | 53039  | 0.0681          | 11856000          |
| 0.1397        | 4.0   | 60616  | 0.0659          | 13549104          |
| 0.044         | 4.5   | 68193  | 0.0653          | 15241168          |
| 0.0067        | 5.0   | 75770  | 0.0651          | 16935568          |
| 0.1097        | 5.5   | 83347  | 0.0642          | 18626160          |
| 0.053         | 6.0   | 90924  | 0.0644          | 20320896          |
| 0.0618        | 6.5   | 98501  | 0.0647          | 22013696          |
| 0.0859        | 7.0   | 106078 | 0.0644          | 23709008          |
| 0.1184        | 7.5   | 113655 | 0.0653          | 25400400          |
| 0.0073        | 8.0   | 121232 | 0.0646          | 27099520          |
| 0.0052        | 8.5   | 128809 | 0.0639          | 28792480          |
| 0.0593        | 9.0   | 136386 | 0.0643          | 30484864          |
| 0.0249        | 9.5   | 143963 | 0.0641          | 32173664          |
| 0.023         | 10.0  | 151540 | 0.0642          | 33869824          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1