train_wsc_42_1760608556

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3530
  • Num Input Tokens Seen: 1308280

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
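
A minimal sketch of how the hyperparameters above might map onto a Hugging Face TrainingArguments configuration. The training script for this run is not included in the card, so the exact setup may differ; this is an illustration, not the author's code.

```python
# Hypothetical sketch: maps the hyperparameters listed above onto
# transformers.TrainingArguments. Details of the actual run may differ.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_42_1760608556",  # run name from this card
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```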

Training results

| Training Loss | Epoch   | Step | Validation Loss | Input Tokens Seen |
|---------------|---------|------|-----------------|-------------------|
| 0.4248        | 1.5045  | 167  | 0.7132          | 65984             |
| 0.3972        | 3.0090  | 334  | 0.3704          | 131096            |
| 0.3307        | 4.5135  | 501  | 0.3653          | 196400            |
| 0.3431        | 6.0180  | 668  | 0.3536          | 261392            |
| 0.9528        | 7.5225  | 835  | 0.3538          | 326864            |
| 0.3597        | 9.0270  | 1002 | 0.3559          | 391800            |
| 0.3717        | 10.5315 | 1169 | 0.3559          | 458568            |
| 0.337         | 12.0360 | 1336 | 0.3868          | 523312            |
| 0.3823        | 13.5405 | 1503 | 0.3527          | 589824            |
| 0.3542        | 15.0450 | 1670 | 0.3570          | 655200            |
| 0.3646        | 16.5495 | 1837 | 0.3523          | 721016            |
| 0.347         | 18.0541 | 2004 | 0.3497          | 787016            |
| 0.3508        | 19.5586 | 2171 | 0.3492          | 853744            |
| 0.3377        | 21.0631 | 2338 | 0.3493          | 918752            |
| 0.3542        | 22.5676 | 2505 | 0.3511          | 984472            |
| 0.3427        | 24.0721 | 2672 | 0.3546          | 1050088           |
| 0.3727        | 25.5766 | 2839 | 0.3556          | 1115632           |
| 0.3329        | 27.0811 | 3006 | 0.3515          | 1181344           |
| 0.3513        | 28.5856 | 3173 | 0.3524          | 1246872           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
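
This repository contains a PEFT adapter rather than full model weights. A minimal usage sketch, assuming the adapter repo id rbelanec/train_wsc_42_1760608556 from this card and a standard PeftModel workflow; the prompt and generation settings are illustrative only.

```python
# Hypothetical loading sketch: applies this PEFT adapter on top of the
# base model named in this card. Paths and settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760608556"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Example Winograd-style prompt (illustrative, not from the wsc split used here)
prompt = "The trophy doesn't fit in the suitcase because it is too big. What is too big?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```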