train_wic_1752870509

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the results):

  • Loss: 0.3412 (the best validation loss, reached at epoch 7.0; see Training results)
  • Num input tokens seen: 4,213,808
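No usage example is provided, so here is a minimal inference sketch. It assumes the adapter is published as rbelanec/train_wic_1752870509 and that a generic WiC-style yes/no prompt works; the actual prompt template used during training is not documented in this card.

```python
# Minimal inference sketch (assumptions: repository id and prompt format).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model with the adapter applied on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_wic_1752870509",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical WiC-style prompt: does the target word carry the same
# sense in both sentences?
messages = [{
    "role": "user",
    "content": (
        "Sentence 1: He sat on the bank of the river.\n"
        "Sentence 2: She deposited the check at the bank.\n"
        'Is the word "bank" used with the same meaning in both sentences? '
        "Answer yes or no."
    ),
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=5)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```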

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
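Since the training script is not included, the following is a sketch of a transformers.TrainingArguments configuration mirroring the hyperparameters above. It assumes the Hugging Face Trainer was used, which the card does not confirm; output_dir is a placeholder.

```python
# Sketch of TrainingArguments matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wic_1752870509",  # placeholder output directory
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```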

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 3.1888        | 0.5   | 611   | 3.0427          | 210240            |
| 0.4678        | 1.0   | 1222  | 0.4648          | 421528            |
| 0.3560        | 1.5   | 1833  | 0.3837          | 632632            |
| 0.3658        | 2.0   | 2444  | 0.3662          | 843368            |
| 0.3545        | 2.5   | 3055  | 0.3566          | 1054024           |
| 0.3267        | 3.0   | 3666  | 0.3541          | 1264408           |
| 0.3546        | 3.5   | 4277  | 0.3530          | 1475000           |
| 0.3166        | 4.0   | 4888  | 0.3487          | 1685768           |
| 0.3747        | 4.5   | 5499  | 0.3450          | 1895752           |
| 0.3684        | 5.0   | 6110  | 0.3437          | 2106968           |
| 0.3188        | 5.5   | 6721  | 0.3423          | 2318136           |
| 0.3403        | 6.0   | 7332  | 0.3435          | 2528648           |
| 0.3211        | 6.5   | 7943  | 0.3416          | 2739720           |
| 0.3119        | 7.0   | 8554  | 0.3412          | 2949592           |
| 0.3343        | 7.5   | 9165  | 0.3419          | 3160056           |
| 0.2871        | 8.0   | 9776  | 0.3424          | 3371056           |
| 0.3477        | 8.5   | 10387 | 0.3440          | 3581616           |
| 0.3369        | 9.0   | 10998 | 0.3420          | 3792672           |
| 0.3370        | 9.5   | 11609 | 0.3424          | 4003136           |
| 0.3049        | 10.0  | 12220 | 0.3426          | 4213808           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
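A quick sanity check (a sketch, not from the card) that a local environment matches the pinned versions listed above:

```python
# Compare installed package versions against the ones pinned in this card.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": (peft, "0.15.2"),
    "transformers": (transformers, "4.51.3"),
    "torch": (torch, "2.7.1+cu126"),
    "datasets": (datasets, "3.6.0"),
    "tokenizers": (tokenizers, "0.21.1"),
}
for name, (module, pinned) in expected.items():
    status = "OK" if module.__version__ == pinned else f"mismatch (card lists {pinned})"
    print(f"{name} {module.__version__}: {status}")
```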