train_sst2_1753094145

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0650
  • Num Input Tokens Seen: 33869824
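Since this is a PEFT adapter rather than a full checkpoint, it has to be loaded on top of the base model. A minimal sketch, assuming the adapter repo id rbelanec/train_sst2_1753094145, a causal-LM head, and bfloat16 inference; the prompt wording is an illustrative assumption, since the card does not document the template used during training:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_sst2_1753094145"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# The exact SST-2 prompt template is not documented in this card;
# the wording below is an assumption for illustration only.
prompt = (
    "Classify the sentiment of this sentence as positive or negative: "
    "a gorgeous, witty, seductive movie.\nSentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```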

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
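
As a rough sketch, these settings map onto a Transformers TrainingArguments configuration along the following lines; the output directory is an assumption, and the card does not include the actual training script, so treat this as a reconstruction rather than the original configuration:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_sst2_1753094145",  # assumed; not stated in the card
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```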

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|---------------|-------|--------|-----------------|-------------------|
| 0.0513        | 0.5   | 7577   | 0.0914          | 1694048           |
| 0.0216        | 1.0   | 15154  | 0.0729          | 3385616           |
| 0.21          | 1.5   | 22731  | 0.0728          | 5082864           |
| 0.1815        | 2.0   | 30308  | 0.0650          | 6774096           |
| 0.0059        | 2.5   | 37885  | 0.0753          | 8467152           |
| 0.0007        | 3.0   | 45462  | 0.0928          | 10161824          |
| 0.1285        | 3.5   | 53039  | 0.1153          | 11856000          |
| 0.0566        | 4.0   | 60616  | 0.1039          | 13549104          |
| 0.0001        | 4.5   | 68193  | 0.1309          | 15241168          |
| 0.0058        | 5.0   | 75770  | 0.1168          | 16935568          |
| 0.0002        | 5.5   | 83347  | 0.1569          | 18626160          |
| 0.0           | 6.0   | 90924  | 0.2074          | 20320896          |
| 0.0           | 6.5   | 98501  | 0.1737          | 22013696          |
| 0.0376        | 7.0   | 106078 | 0.2236          | 23709008          |
| 0.0674        | 7.5   | 113655 | 0.2430          | 25400400          |
| 0.0           | 8.0   | 121232 | 0.2517          | 27099520          |
| 0.0           | 8.5   | 128809 | 0.3193          | 28792480          |
| 0.0           | 9.0   | 136386 | 0.3085          | 30484864          |
| 0.0           | 9.5   | 143963 | 0.3159          | 32173664          |
| 0.0           | 10.0  | 151540 | 0.3201          | 33869824          |
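
Validation loss bottoms out at 0.0650 at epoch 2.0, the value reported above, and rises steadily afterwards while training loss collapses toward zero, the usual overfitting signature. If retraining, one way to keep that best checkpoint automatically is the standard Trainer checkpoint-selection flags below; this is a sketch, and whether the original run used these flags is not stated in the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_sst2_1753094145",  # assumed
    eval_strategy="steps",               # evaluate on a fixed step cadence
    eval_steps=7577,                     # matches the half-epoch cadence above
    save_strategy="steps",
    save_steps=7577,
    load_best_model_at_end=True,         # restore the lowest-eval-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```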

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
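
To reproduce this environment, the versions above can be pinned, e.g. in a requirements.txt; note the PyTorch build carries the +cu126 local tag from the PyTorch CUDA 12.6 wheel index, so a plain torch==2.7.1 pin may resolve to a different build on PyPI:

```
peft==0.15.2
transformers==4.51.3
torch==2.7.1
datasets==3.6.0
tokenizers==0.21.1
```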