train_wsc_123_1760359717

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3485
  • Num Input Tokens Seen: 1465808

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
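
The sketch below maps the listed values onto 🤗 TrainingArguments. It is illustrative only: the output directory and any settings not listed above (logging, evaluation cadence, PEFT configuration, etc.) are assumptions, not taken from the original run.

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
# output_dir is illustrative; all other values come from the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_123_1760359717",  # illustrative, not from the original run
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```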

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.4306        | 1.504  | 188  | 0.5445          | 73760             |
| 0.4609        | 3.008  | 376  | 0.3706          | 148032            |
| 0.3559        | 4.512  | 564  | 0.4820          | 222944            |
| 0.3778        | 6.016  | 752  | 0.3521          | 294320            |
| 0.3463        | 7.52   | 940  | 0.3597          | 369248            |
| 0.3261        | 9.024  | 1128 | 0.3840          | 442000            |
| 0.3388        | 10.528 | 1316 | 0.3828          | 516624            |
| 0.3685        | 12.032 | 1504 | 0.3522          | 589072            |
| 0.3655        | 13.536 | 1692 | 0.3485          | 662256            |
| 0.3283        | 15.04  | 1880 | 0.3584          | 736272            |
| 0.3504        | 16.544 | 2068 | 0.3530          | 809824            |
| 0.3531        | 18.048 | 2256 | 0.3558          | 882480            |
| 0.345         | 19.552 | 2444 | 0.3545          | 956000            |
| 0.3476        | 21.056 | 2632 | 0.3646          | 1028736           |
| 0.3309        | 22.56  | 2820 | 0.3579          | 1102672           |
| 0.3495        | 24.064 | 3008 | 0.3490          | 1176448           |
| 0.3661        | 25.568 | 3196 | 0.3524          | 1249968           |
| 0.3308        | 27.072 | 3384 | 0.3528          | 1322608           |
| 0.3481        | 28.576 | 3572 | 0.3514          | 1396032           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
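
Since this repository contains a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, a minimal, non-authoritative loading sketch with the libraries listed above might look like the following. The adapter repository id is taken from this page; the prompt and generation settings are purely illustrative, and the prompt template actually used during fine-tuning is not documented in this card.

```python
# Minimal sketch: load the base model and attach this PEFT adapter.
# Assumes access to the gated meta-llama base weights and a CUDA-capable machine.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_123_1760359717"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative WSC-style prompt; the real training prompt format may differ.
prompt = "The trophy doesn't fit into the brown suitcase because it is too large. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```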