mistral-dpo

This model is a fine-tuned version of TheBloke/Mistral-7B-v0.1-GPTQ. The training dataset was not recorded. At the final training step (step 250), it achieves the following results on the evaluation set:

  • Loss: 0.0000
  • Rewards/chosen: -2.0502
  • Rewards/rejected: -28.3632
  • Rewards/accuracies: 1.0
  • Rewards/margins: 26.3129
  • Logps/rejected: -399.8283
  • Logps/chosen: -35.7179
  • Logits/rejected: -2.1171
  • Logits/chosen: -1.8480
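
The reward figures above can be cross-checked by hand. The sketch below assumes the standard sigmoid DPO objective, where the margin is the chosen reward minus the rejected reward and the per-pair loss is -log(sigmoid(margin)). The card does not state which loss variant was used, so that is an assumption, and the function name is illustrative only.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -2.0502
rewards_rejected = -28.3632
reported_margin = 26.3129


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-pair loss -log(sigmoid(margin)), in a numerically stable form.

    Assumes the standard sigmoid DPO loss; the card does not confirm this.
    """
    # -log(sigmoid(m)) == log(1 + exp(-m)), rearranged to avoid overflow.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


margin = rewards_chosen - rewards_rejected
loss_at_margin = dpo_sigmoid_loss(margin)

print(f"margin = {margin:.4f} (reported: {reported_margin})")
print(f"loss at that margin = {loss_at_margin:.4f}")
```

The recomputed margin agrees with the reported one up to rounding of the inputs. At a margin this large the loss falls far below the four decimal places shown, which is consistent with the reported 0.0000. The reported loss is a mean over evaluation pairs, so it need not equal the loss at the mean margin exactly.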

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 250
  • mixed_precision_training: Native AMP
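
To make the schedule concrete, here is a minimal sketch of the learning rate implied by these settings. It assumes the usual linear-warmup, linear-decay shape used by the Transformers linear scheduler; the constant and function names are illustrative and not taken from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
WARMUP_STEPS = 2
TRAINING_STEPS = 250


def linear_schedule_lr(step: int) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        factor = step / max(1, WARMUP_STEPS)
    else:
        factor = max(0.0, (TRAINING_STEPS - step) / max(1, TRAINING_STEPS - WARMUP_STEPS))
    return LEARNING_RATE * factor


for step in (0, 1, 2, 126, 250):
    print(step, linear_schedule_lr(step))
```

With only 2 warmup steps, the peak rate of 0.0002 is reached almost immediately and then decays linearly over the remaining 248 steps.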

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6453 | 0.2 | 10 | 0.4086 | 0.1393 | -0.7001 | 1.0 | 0.8394 | -123.1976 | -13.8225 | -2.5461 | -2.5162 |
| 0.1759 | 0.4 | 20 | 0.0051 | 0.3963 | -6.4413 | 1.0 | 6.8376 | -180.6101 | -11.2527 | -2.5253 | -2.4045 |
| 0.0015 | 0.6 | 30 | 0.0000 | 0.2885 | -20.7441 | 1.0 | 21.0326 | -323.6376 | -12.3309 | -2.2440 | -1.8851 |
| 0.0 | 0.8 | 40 | 0.0000 | -0.6913 | -26.5964 | 1.0 | 25.9051 | -382.1607 | -22.1282 | -1.9054 | -1.5507 |
| 0.0 | 1.0 | 50 | 0.0000 | -1.6661 | -28.8376 | 1.0 | 27.1715 | -404.5731 | -31.8766 | -1.7581 | -1.4145 |
| 0.0 | 1.2 | 60 | 0.0000 | -2.1659 | -29.6823 | 1.0 | 27.5164 | -413.0200 | -36.8745 | -1.7071 | -1.3649 |
| 0.0 | 1.4 | 70 | 0.0000 | -2.0973 | -30.0476 | 1.0 | 27.9503 | -416.6729 | -36.1886 | -1.6955 | -1.3541 |
| 0.0 | 1.6 | 80 | 0.0000 | -2.0065 | -30.1726 | 1.0 | 28.1661 | -417.9230 | -35.2805 | -1.6941 | -1.3519 |
| 0.0 | 1.8 | 90 | 0.0000 | -1.9541 | -30.2266 | 1.0 | 28.2724 | -418.4622 | -34.7568 | -1.6935 | -1.3518 |
| 0.0023 | 2.0 | 100 | 0.0000 | -0.7061 | -30.2814 | 1.0 | 29.5753 | -419.0107 | -22.2763 | -1.7664 | -1.4215 |
| 0.0 | 2.2 | 110 | 0.0000 | -1.6234 | -29.4682 | 1.0 | 27.8448 | -410.8783 | -31.4494 | -2.0371 | -1.7164 |
| 0.0 | 2.4 | 120 | 0.0000 | -1.9528 | -28.6154 | 1.0 | 26.6626 | -402.3507 | -34.7431 | -2.0991 | -1.8126 |
| 0.0 | 2.6 | 130 | 0.0000 | -2.0210 | -28.3739 | 1.0 | 26.3529 | -399.9358 | -35.4253 | -2.1141 | -1.8394 |
| 0.0 | 2.8 | 140 | 0.0000 | -2.0443 | -28.2878 | 1.0 | 26.2435 | -399.0752 | -35.6588 | -2.1185 | -1.8487 |
| 0.0 | 3.0 | 150 | 0.0000 | -2.0504 | -28.2651 | 1.0 | 26.2147 | -398.8474 | -35.7192 | -2.1201 | -1.8510 |
| 0.0 | 3.2 | 160 | 0.0000 | -2.0500 | -28.2657 | 1.0 | 26.2157 | -398.8541 | -35.7157 | -2.1202 | -1.8519 |
| 0.0 | 3.4 | 170 | 0.0000 | -2.0530 | -28.2687 | 1.0 | 26.2157 | -398.8837 | -35.7460 | -2.1205 | -1.8521 |
| 0.0 | 3.6 | 180 | 0.0000 | -2.0529 | -28.2660 | 1.0 | 26.2131 | -398.8570 | -35.7444 | -2.1202 | -1.8515 |
| 0.0 | 3.8 | 190 | 0.0000 | -2.0531 | -28.2649 | 1.0 | 26.2119 | -398.8461 | -35.7464 | -2.1202 | -1.8519 |
| 0.0 | 4.0 | 200 | 0.0000 | -2.0579 | -28.3150 | 1.0 | 26.2571 | -399.3466 | -35.7943 | -2.1191 | -1.8507 |
| 0.0 | 4.2 | 210 | 0.0000 | -2.0509 | -28.3341 | 1.0 | 26.2832 | -399.5381 | -35.7246 | -2.1178 | -1.8487 |
| 0.0 | 4.4 | 220 | 0.0000 | -2.0516 | -28.3405 | 1.0 | 26.2889 | -399.6018 | -35.7316 | -2.1178 | -1.8490 |
| 0.0 | 4.6 | 230 | 0.0000 | -2.0516 | -28.3495 | 1.0 | 26.2979 | -399.6917 | -35.7317 | -2.1176 | -1.8489 |
| 0.0 | 4.8 | 240 | 0.0000 | -2.0508 | -28.3684 | 1.0 | 26.3176 | -399.8806 | -35.7236 | -2.1173 | -1.8488 |
| 0.0 | 5.0 | 250 | 0.0000 | -2.0502 | -28.3632 | 1.0 | 26.3129 | -399.8283 | -35.7179 | -2.1171 | -1.8480 |
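
The Epoch and Step columns together hint at how small the training set was. The short calculation below is a sketch under stated assumptions: the card lists no gradient accumulation or device count, so both are assumed to be 1 here, and the variable names are illustrative.

```python
# Values taken from the first table row and the hyperparameter list.
first_logged_step = 10
first_logged_epoch = 0.2
training_steps = 250
train_batch_size = 1

# Assumptions: neither value is stated in the card.
gradient_accumulation_steps = 1
num_devices = 1

steps_per_epoch = round(first_logged_step / first_logged_epoch)
total_epochs = training_steps / steps_per_epoch
examples_per_epoch = (
    steps_per_epoch * train_batch_size * gradient_accumulation_steps * num_devices
)

print(steps_per_epoch, total_epochs, examples_per_epoch)
```

If those assumptions hold, each epoch covers about 50 preference pairs. A set that small could be fit almost perfectly within a few dozen steps, which would match the validation loss reaching 0.0000 by step 30, though the card does not confirm the dataset size.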

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.0.1+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0
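
Since PEFT is listed among the frameworks, the weights are presumably a PEFT adapter meant to be attached to the base model. The following is a minimal loading sketch under that assumption, not a tested recipe from the authors. The adapter repository id is assumed from the model page, and loading a GPTQ checkpoint is assumed to require a GPTQ backend and a GPU, neither of which appears in the version list. The helper name `load_adapter` is hypothetical.

```python
BASE_MODEL_ID = "TheBloke/Mistral-7B-v0.1-GPTQ"  # base model named in this card
ADAPTER_ID = "AlbelTec/mistral-dpo-old"  # assumed repository id for the adapter


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the quantized base model and attach the PEFT adapter to it."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Assumption: a GPTQ backend (for example optimum with auto-gptq) is installed.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


def generate_text(prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a short continuation of `prompt` with the adapted model."""
    tokenizer, model = load_adapter()
    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Hello")` downloads both repositories on first use, so it needs network access and enough GPU memory for the 7B quantized model.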

Model repository: AlbelTec/mistral-dpo-old (adapter)