Llama-3.1-8B-Instruct-dpo-llama-1000

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the answer_llama dataset. It achieves the following results on the evaluation set (a note on how these DPO reward metrics are conventionally defined follows the list):

  • Loss: 0.3077
  • Rewards/chosen: 1.4814
  • Rewards/rejected: -0.7600
  • Rewards/accuracies: 0.8500
  • Rewards/margins: 2.2414
  • Logps/chosen: -7.6796
  • Logps/rejected: -31.9936
  • Logits/chosen: -0.2154
  • Logits/rejected: -0.3106
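
For context, these reward metrics follow the standard DPO convention (the card itself does not define them): the reward for a response is the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the gap between the chosen and rejected rewards:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}}).
$$

The reported values are consistent with this: 1.4814 - (-0.7600) = 2.2414.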

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent trainer configuration follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
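
The card does not state which training framework produced these settings; given the PEFT and Transformers versions listed below, a LoRA adapter trained with a DPO trainer is implied. The sketch below is only an illustration of how the listed hyperparameters would map onto TRL's DPOConfig; the actual trainer and the DPO β value are not reported.

```python
# Hedged sketch: maps the reported hyperparameters onto TRL's DPOConfig.
# The actual training framework and the DPO beta are not stated in the card.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="Llama-3.1-8B-Instruct-dpo-llama-1000",
    learning_rate=5e-06,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 per device x 8 steps = total train batch size 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # betas=(0.9, 0.999) and epsilon=1e-08 are the AdamW defaults
    seed=42,
)
```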

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|--------------|----------------|---------------|-----------------|
| 0.6815        | 0.8889 | 50   | 0.6707          | 0.0833         | 0.0353           | 0.6900             | 0.0480          | -21.6601     | -24.0398       | -0.4114       | -0.4792         |
| 0.5082        | 1.7778 | 100  | 0.4428          | 1.0308         | 0.1943           | 0.7900             | 0.8366          | -12.1855     | -22.4506       | -0.3559       | -0.4377         |
| 0.2979        | 2.6667 | 150  | 0.3215          | 1.3481         | -0.4170          | 0.8600             | 1.7651          | -9.0131      | -28.5637       | -0.2695       | -0.3655         |
| 0.2862        | 3.5556 | 200  | 0.3077          | 1.4814         | -0.7600          | 0.8500             | 2.2414          | -7.6796      | -31.9936       | -0.2154       | -0.3106         |
| 0.2747        | 4.4444 | 250  | 0.3184          | 1.4147         | -1.2445          | 0.8600             | 2.6592          | -8.3466      | -36.8385       | -0.1872       | -0.2879         |
| 0.2688        | 5.3333 | 300  | 0.3195          | 1.4469         | -1.2794          | 0.8500             | 2.7263          | -8.0242      | -37.1874       | -0.1714       | -0.2705         |
| 0.2047        | 6.2222 | 350  | 0.3630          | 1.3019         | -1.5956          | 0.8400             | 2.8975          | -9.4749      | -40.3495       | -0.1553       | -0.2578         |
| 0.2268        | 7.1111 | 400  | 0.3526          | 1.3609         | -1.6635          | 0.8500             | 3.0245          | -8.8842      | -41.0287       | -0.1452       | -0.2479         |
| 0.1440        | 8.0000 | 450  | 0.3662          | 1.3488         | -1.7032          | 0.8400             | 3.0520          | -9.0059      | -41.4255       | -0.1421       | -0.2448         |
| 0.1710        | 8.8889 | 500  | 0.3635          | 1.3313         | -1.7326          | 0.8400             | 3.0640          | -9.1805      | -41.7197       | -0.1399       | -0.2430         |
| 0.2313        | 9.7778 | 550  | 0.3613          | 1.3392         | -1.7432          | 0.8400             | 3.0824          | -9.1017      | -41.8256       | -0.1378       | -0.2410         |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3
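
This repository is a PEFT (LoRA) adapter rather than a full model, so it has to be loaded on top of the base checkpoint. A minimal usage sketch (not taken from the card), using the adapter id from the model tree:

```python
# Minimal inference sketch: attach the LoRA adapter to the base Llama 3.1 model.
# Assumes access to the gated meta-llama base checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-dpo-llama-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```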