zephyr-7b-dpo-qlora

This model is a fine-tuned version of TII-Frontier-Team/falcon3-3b-instruct on the TII-Frontier-Team/Reasoning_DPO dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0299
  • Rewards/chosen: -4.6362
  • Rewards/rejected: -10.4479
  • Rewards/accuracies: 0.9306
  • Rewards/margins: 5.8117
  • Logps/rejected: -1080.7013
  • Logps/chosen: -496.4129
  • Logits/rejected: 2.0470
  • Logits/chosen: 2.2558
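
The reward margin reported above is the chosen reward minus the rejected reward, which the short check below reproduces. It also sketches the per-pair loss under the assumption that training used the standard sigmoid DPO objective, which this card does not state; `dpo_sigmoid_loss` is an illustrative helper, not part of any library.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -4.6362
rewards_rejected = -10.4479

# Margin = chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")


def dpo_sigmoid_loss(pair_margin: float) -> float:
    """Per-pair loss -log(sigmoid(margin)), assuming the sigmoid DPO objective.

    Written as log(1 + exp(-margin)) for numerical convenience.
    """
    return math.log1p(math.exp(-pair_margin))


# A margin of zero (policy identical to the reference) gives ln 2.
print(f"loss at zero margin = {dpo_sigmoid_loss(0.0):.4f}")
```

Under that assumption, a margin of zero gives a loss of ln 2 ≈ 0.693, which is consistent with the loss near 0.69 at the first evaluation step. The reported loss is an average over pairs, so it need not equal the loss evaluated at the average margin.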

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
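
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a cosine schedule with linear warmup. The function `lr_at_step` and its `total_steps` argument are hypothetical names introduced here; the exact warmup rounding and step count used in the actual run are not stated in this card.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4              # per device
eval_batch_size = 8               # per device
num_devices = 8
gradient_accumulation_steps = 4
learning_rate = 5e-6
warmup_ratio = 0.1

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 128 64


def lr_at_step(step: int, total_steps: int,
               base_lr: float = learning_rate,
               ratio: float = warmup_ratio) -> float:
    """Illustrative cosine schedule: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)  # rounding is an assumption
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a warmup ratio of 0.1, the learning rate in this sketch climbs linearly to 5e-06 over the first tenth of training and then decays along a half cosine for the remainder.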

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6913 | 0.0315 | 100 | 0.6911 | 0.0007 | -0.0036 | 0.6220 | 0.0042 | -36.2718 | -32.7285 | -1.6824 | -1.6348 |
| 0.6742 | 0.0629 | 200 | 0.6751 | 0.0003 | -0.0454 | 0.6276 | 0.0458 | -40.4596 | -32.7631 | -1.5097 | -1.4586 |
| 0.6081 | 0.0944 | 300 | 0.5872 | -0.5193 | -0.8644 | 0.6619 | 0.3451 | -122.3552 | -84.7303 | -0.4701 | -0.3830 |
| 0.4463 | 0.1258 | 400 | 0.3978 | -2.0312 | -3.2212 | 0.7190 | 1.1900 | -358.0407 | -235.9217 | -0.3673 | -0.2101 |
| 0.3548 | 0.1573 | 500 | 0.3048 | -2.5142 | -4.1605 | 0.7698 | 1.6464 | -451.9689 | -284.2137 | 0.4417 | 0.6033 |
| 0.3014 | 0.1887 | 600 | 0.2395 | -2.7662 | -4.8033 | 0.7963 | 2.0371 | -516.2451 | -309.4138 | 1.0026 | 1.1670 |
| 0.25 | 0.2202 | 700 | 0.1989 | -3.1039 | -5.4194 | 0.8235 | 2.3155 | -577.8538 | -343.1828 | 1.3421 | 1.5051 |
| 0.2163 | 0.2517 | 800 | 0.1564 | -3.4535 | -6.3881 | 0.8369 | 2.9346 | -674.7255 | -378.1511 | 1.8084 | 1.9697 |
| 0.178 | 0.2831 | 900 | 0.1349 | -3.4355 | -6.5411 | 0.8586 | 3.1056 | -690.0276 | -376.3503 | 1.7688 | 1.9492 |
| 0.1736 | 0.3146 | 1000 | 0.1127 | -3.5471 | -6.9599 | 0.8668 | 3.4128 | -731.9055 | -387.5069 | 2.0848 | 2.2440 |
| 0.1474 | 0.3460 | 1100 | 0.0982 | -3.6177 | -7.2322 | 0.8799 | 3.6145 | -759.1403 | -394.5700 | 1.8280 | 2.0076 |
| 0.1382 | 0.3775 | 1200 | 0.0819 | -4.3123 | -8.3603 | 0.8862 | 4.0480 | -871.9455 | -464.0287 | 2.0966 | 2.2833 |
| 0.1133 | 0.4089 | 1300 | 0.0714 | -4.0671 | -8.3309 | 0.8955 | 4.2638 | -869.0029 | -439.5055 | 1.9082 | 2.1044 |
| 0.1209 | 0.4404 | 1400 | 0.0634 | -4.8366 | -9.4739 | 0.8933 | 4.6374 | -983.3081 | -516.4533 | 2.0574 | 2.2678 |
| 0.1057 | 0.4718 | 1500 | 0.0575 | -4.1835 | -8.8581 | 0.9019 | 4.6746 | -921.7241 | -451.1488 | 2.0907 | 2.2780 |
| 0.1057 | 0.5033 | 1600 | 0.0536 | -4.2093 | -8.9250 | 0.9131 | 4.7157 | -928.4156 | -453.7231 | 2.0198 | 2.2136 |
| 0.0881 | 0.5348 | 1700 | 0.0490 | -4.4577 | -9.3694 | 0.9101 | 4.9118 | -972.8605 | -478.5644 | 1.8760 | 2.0804 |
| 0.0847 | 0.5662 | 1800 | 0.0441 | -4.2531 | -9.4108 | 0.9131 | 5.1578 | -977.0005 | -458.1054 | 2.0999 | 2.2904 |
| 0.0713 | 0.5977 | 1900 | 0.0411 | -4.4101 | -9.6543 | 0.9168 | 5.2442 | -1001.3448 | -473.8065 | 2.0887 | 2.2861 |
| 0.0553 | 0.6291 | 2000 | 0.0378 | -4.9687 | -10.5782 | 0.9123 | 5.6095 | -1093.7402 | -529.6686 | 2.0469 | 2.2608 |
| 0.0668 | 0.6606 | 2100 | 0.0362 | -4.7485 | -10.3227 | 0.9190 | 5.5741 | -1068.1823 | -507.6488 | 2.1354 | 2.3368 |
| 0.0528 | 0.6920 | 2200 | 0.0356 | -4.6766 | -10.2170 | 0.9175 | 5.5404 | -1057.6173 | -500.4605 | 1.9572 | 2.1594 |
| 0.0596 | 0.7235 | 2300 | 0.0340 | -4.6180 | -10.2121 | 0.9235 | 5.5942 | -1057.1299 | -494.5929 | 2.0041 | 2.2117 |
| 0.063 | 0.7550 | 2400 | 0.0328 | -4.5357 | -10.1876 | 0.9257 | 5.6519 | -1054.6713 | -486.3653 | 2.1493 | 2.3488 |
| 0.0558 | 0.7864 | 2500 | 0.0311 | -4.7155 | -10.5680 | 0.9261 | 5.8526 | -1092.7185 | -504.3435 | 2.1208 | 2.3275 |
| 0.0552 | 0.8179 | 2600 | 0.0312 | -4.6574 | -10.3658 | 0.9254 | 5.7084 | -1072.4943 | -498.5399 | 2.0544 | 2.2592 |
| 0.066 | 0.8493 | 2700 | 0.0305 | -4.6506 | -10.4766 | 0.9287 | 5.8259 | -1083.5740 | -497.8611 | 2.0914 | 2.2968 |
| 0.0568 | 0.8808 | 2800 | 0.0302 | -4.6423 | -10.4629 | 0.9302 | 5.8206 | -1082.2051 | -497.0266 | 2.0957 | 2.3026 |
| 0.0602 | 0.9122 | 2900 | 0.0299 | -4.6260 | -10.4608 | 0.9299 | 5.8348 | -1081.9958 | -495.3989 | 2.0861 | 2.2911 |
| 0.0634 | 0.9437 | 3000 | 0.0298 | -4.6454 | -10.4843 | 0.9313 | 5.8389 | -1084.3455 | -497.3409 | 2.0655 | 2.2739 |
| 0.0602 | 0.9751 | 3100 | 0.0299 | -4.6289 | -10.4404 | 0.9302 | 5.8116 | -1079.9603 | -495.6860 | 2.0537 | 2.2623 |
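
As a quick sanity check on the results above, the sketch below takes the last four evaluation rows, confirms that each margin equals the chosen reward minus the rejected reward up to rounding, picks the step with the lowest validation loss, and estimates the number of optimizer steps per epoch from the logged epoch fraction. The tuple layout and variable names are illustrative choices, and the step estimate is approximate because the epoch values are rounded to four decimals.

```python
# (step, epoch, validation_loss, rewards_chosen, rewards_rejected, rewards_margin)
# Last four evaluation rows copied from the training results.
rows = [
    (2800, 0.8808, 0.0302, -4.6423, -10.4629, 5.8206),
    (2900, 0.9122, 0.0299, -4.6260, -10.4608, 5.8348),
    (3000, 0.9437, 0.0298, -4.6454, -10.4843, 5.8389),
    (3100, 0.9751, 0.0299, -4.6289, -10.4404, 5.8116),
]

# Each logged margin should match chosen - rejected within rounding error.
margins_consistent = all(
    abs((chosen - rejected) - margin) < 2e-4
    for _, _, _, chosen, rejected, margin in rows
)

# Evaluation step with the lowest validation loss among these rows.
best_row = min(rows, key=lambda r: r[2])
best_step = best_row[0]

# Rough steps-per-epoch estimate from the final logged row (step / epoch).
last_step, last_epoch = rows[-1][0], rows[-1][1]
steps_per_epoch = last_step / last_epoch

print(margins_consistent, best_step, round(steps_per_epoch))
```

Validation loss is essentially flat from step 2900 onward, so the differences between these late evaluations are small.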

Framework versions

  • PEFT 0.13.0
  • Transformers 4.45.1
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.0
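
Because the framework list includes PEFT, the published weights are presumably a LoRA-style adapter to be applied on top of the base model. The following is a minimal loading sketch under that assumption; the adapter repository id, the `load_model_with_adapter` helper, and the bfloat16 dtype are assumptions rather than details confirmed by this card.

```python
BASE_MODEL_ID = "TII-Frontier-Team/falcon3-3b-instruct"  # base model named in this card
ADAPTER_ID = "RedaAlami/zephyr-7b-dpo-qlora"             # assumed adapter repository id


def load_model_with_adapter(base_model_id: str = BASE_MODEL_ID,
                            adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Heavy dependencies are imported lazily, only when the function is called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 is supported on the target hardware
    )
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_model_with_adapter()` downloads both repositories, so it needs network access and enough memory for the base model.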
