train_wic_1753094170

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2390
  • Num Input Tokens Seen: 4213808
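
The snippet below is a minimal, illustrative sketch of how a PEFT adapter like this one is typically loaded on top of its base model for inference; it assumes the adapter is hosted at rbelanec/train_wic_1753094170 and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct weights. The exact prompt format used during fine-tuning on wic is not documented here, so the example prompt is only a placeholder.

```python
# Illustrative sketch: load the base model and attach this PEFT adapter.
# Assumptions: adapter repo id "rbelanec/train_wic_1753094170", bfloat16 weights,
# and a WiC-style yes/no prompt whose exact training format is not documented here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_1753094170"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Placeholder WiC-style query (word-in-context: same sense or not?).
prompt = (
    "Sentence 1: He sat on the bank of the river.\n"
    "Sentence 2: She deposited money at the bank.\n"
    'Does the word "bank" have the same meaning in both sentences? Answer yes or no.'
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```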

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
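
The hyperparameter names above match Hugging Face `TrainingArguments`, so, assuming the run was driven by the `transformers` Trainer, an approximately equivalent configuration would look like the sketch below (model, dataset, and PEFT wiring omitted; the output directory name is only a guess).

```python
# Rough TrainingArguments equivalent of the listed hyperparameters.
# This is an illustrative reconstruction, not the exact script used for this run.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wic_1753094170",  # assumed name, not confirmed by the card
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```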

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.3829        | 0.5   | 611   | 0.3071          | 210240            |
| 0.1971        | 1.0   | 1222  | 0.2813          | 421528            |
| 0.2298        | 1.5   | 1833  | 0.2601          | 632632            |
| 0.1462        | 2.0   | 2444  | 0.2633          | 843368            |
| 0.4035        | 2.5   | 3055  | 0.2592          | 1054024           |
| 0.1821        | 3.0   | 3666  | 0.2520          | 1264408           |
| 0.1771        | 3.5   | 4277  | 0.2593          | 1475000           |
| 0.1640        | 4.0   | 4888  | 0.2590          | 1685768           |
| 0.1471        | 4.5   | 5499  | 0.2440          | 1895752           |
| 0.4103        | 5.0   | 6110  | 0.2552          | 2106968           |
| 0.4478        | 5.5   | 6721  | 0.2485          | 2318136           |
| 0.1788        | 6.0   | 7332  | 0.2390          | 2528648           |
| 0.2565        | 6.5   | 7943  | 0.2556          | 2739720           |
| 0.2295        | 7.0   | 8554  | 0.2675          | 2949592           |
| 0.2099        | 7.5   | 9165  | 0.2640          | 3160056           |
| 0.2858        | 8.0   | 9776  | 0.2716          | 3371056           |
| 0.2834        | 8.5   | 10387 | 0.2651          | 3581616           |
| 0.1544        | 9.0   | 10998 | 0.2651          | 3792672           |
| 0.1857        | 9.5   | 11609 | 0.2673          | 4003136           |
| 0.0746        | 10.0  | 12220 | 0.2687          | 4213808           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1