Llama-3.1-8B-Instruct-dpo-mistral-1000

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the answer_mistral dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4675
  • Rewards/chosen: 0.9903
  • Rewards/rejected: -0.3997
  • Rewards/accuracies: 0.7900
  • Rewards/margins: 1.3900
  • Logps/chosen: -13.2488
  • Logps/rejected: -29.2269
  • Logits/chosen: -0.1396
  • Logits/rejected: -0.2080
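The reward columns follow the usual DPO bookkeeping: each reward is β times the policy-minus-reference log-probability gap, and the margin is simply chosen minus rejected. A minimal sketch in plain Python (β is an assumption here; the training config does not report it, and 0.1 is TRL's default):

```python
import math

def dpo_stats(logp_chosen, logp_rejected,
              ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Illustrative DPO bookkeeping; beta=0.1 is an assumed default."""
    # Rewards are beta-scaled log-prob gaps between policy and reference.
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss is -log sigmoid(margin).
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss

# The margins column above is always chosen minus rejected,
# e.g. for the final evaluation: 0.9903 - (-0.3997) = 1.3900.
margin = 0.9903 - (-0.3997)
```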

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
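To make the schedule concrete: with lr_scheduler_warmup_ratio 0.1, the learning rate ramps linearly over the first 10% of steps and then follows a cosine decay to zero. A minimal sketch of that shape (it mirrors the behavior of `get_cosine_schedule_with_warmup` in Transformers; the total step count below is illustrative, not from the run):

```python
import math

def lr_at(step, total_steps, base_lr=5e-6, warmup_ratio=0.1):
    # Linear warmup for the first warmup_ratio of training,
    # then cosine decay from base_lr down to 0.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: train_batch_size (2) x gradient_accumulation_steps (8) = 16.
```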

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6891 | 0.8909 | 50 | 0.6833 | 0.0487 | 0.0276 | 0.6200 | 0.0211 | -22.6647 | -24.9535 | -0.3207 | -0.3690 |
| 0.5716 | 1.7817 | 100 | 0.5618 | 0.6081 | 0.1913 | 0.7000 | 0.4168 | -17.0706 | -23.3165 | -0.2934 | -0.3456 |
| 0.4581 | 2.6726 | 150 | 0.4761 | 0.9362 | -0.0437 | 0.7600 | 0.9799 | -13.7892 | -25.6666 | -0.2093 | -0.2739 |
| 0.4032 | 3.5635 | 200 | 0.4709 | 0.9603 | -0.2844 | 0.8100 | 1.2447 | -13.5486 | -28.0732 | -0.1631 | -0.2306 |
| 0.3836 | 4.4543 | 250 | 0.4675 | 0.9903 | -0.3997 | 0.7900 | 1.3900 | -13.2488 | -29.2269 | -0.1396 | -0.2080 |
| 0.3588 | 5.3452 | 300 | 0.4752 | 0.9745 | -0.4525 | 0.7700 | 1.4270 | -13.4066 | -29.7545 | -0.1255 | -0.1931 |
| 0.2861 | 6.2361 | 350 | 0.4812 | 0.9392 | -0.5503 | 0.7700 | 1.4895 | -13.7591 | -30.7320 | -0.1102 | -0.1785 |
| 0.3662 | 7.1269 | 400 | 0.4868 | 0.9165 | -0.6356 | 0.7700 | 1.5522 | -13.9862 | -31.5858 | -0.0990 | -0.1679 |
| 0.2822 | 8.0178 | 450 | 0.4927 | 0.9099 | -0.6512 | 0.7600 | 1.5612 | -14.0519 | -31.7416 | -0.0936 | -0.1622 |
| 0.2416 | 8.9087 | 500 | 0.4979 | 0.8912 | -0.6958 | 0.7600 | 1.5870 | -14.2398 | -32.1878 | -0.0898 | -0.1585 |
| 0.3096 | 9.7996 | 550 | 0.4934 | 0.8943 | -0.7017 | 0.7500 | 1.5960 | -14.2081 | -32.2463 | -0.0873 | -0.1548 |

The evaluation metrics reported at the top of this card correspond to the step-250 checkpoint, which achieved the best validation loss (0.4675).

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3
