Ministral-8B-Instruct-2410-dpo-llama-1000

This model is a PEFT adapter fine-tuned with DPO from mistralai/Ministral-8B-Instruct-2410 on the answer_llama dataset; a loading sketch follows the metrics below. It achieves the following results on the evaluation set:

  • Loss: 0.2740
  • Rewards/chosen: 0.9492
  • Rewards/rejected: -1.3563
  • Rewards/accuracies: 0.8900
  • Rewards/margins: 2.3055
  • Logps/chosen: -24.6582
  • Logps/rejected: -48.3533
  • Logits/chosen: -1.2736
  • Logits/rejected: -1.4719
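
This repository provides a PEFT adapter rather than full model weights (see the framework versions below), so it has to be loaded on top of the base model. Below is a minimal, illustrative loading sketch, not an official example: the adapter id is taken from this page, and the chat-template call assumes the base model's instruct formatting.

```python
# Illustrative sketch: attach the DPO-trained PEFT adapter to the base
# Ministral-8B-Instruct-2410 model and run a short generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Ministral-8B-Instruct-2410"
adapter_id = "chchen/Ministral-8B-Instruct-2410-dpo-llama-1000"  # this adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # load the LoRA/PEFT weights
model.eval()

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning changes about a model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For deployment, the adapter can also be merged into the base weights with `model.merge_and_unload()` before saving.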

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; an illustrative configuration sketch follows the list:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
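
The card does not state which framework produced this run. As a hedged illustration only, the sketch below maps the listed hyperparameters onto TRL's DPOConfig/DPOTrainer with a PEFT (LoRA) adapter; the dataset placeholder and the LoRA settings are assumptions, not values documented here.

```python
# Hypothetical reproduction sketch: the hyperparameters listed above expressed as
# a trl.DPOConfig. The answer_llama preference data and the adapter settings are
# not documented in this card, so placeholders stand in for them.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

base_id = "mistralai/Ministral-8B-Instruct-2410"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Tiny stand-in for the (undocumented) answer_llama preference pairs.
train_dataset = Dataset.from_dict({
    "prompt":   ["What does DPO optimize?"],
    "chosen":   ["It raises the likelihood of preferred responses relative to a reference model."],
    "rejected": ["It changes the tokenizer vocabulary."],
})

args = DPOConfig(
    output_dir="Ministral-8B-Instruct-2410-dpo-llama-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 per device x 8 steps = total batch size 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999) and eps=1e-08 are the defaults
)

peft_config = LoraConfig(task_type="CAUSAL_LM")  # assumed; adapter hyperparameters are not listed

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # `tokenizer=` on older TRL releases
    peft_config=peft_config,
)
trainer.train()
```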

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.6556 | 0.8889 | 50 | 0.6256 | 0.1404 | -0.0054 | 0.8100 | 0.1459 | -32.7462 | -34.8450 | -1.7657 | -1.8272 |
| 0.4187 | 1.7778 | 100 | 0.3967 | 0.6694 | -0.2649 | 0.8400 | 0.9344 | -27.4562 | -37.4399 | -1.5767 | -1.6947 |
| 0.2758 | 2.6667 | 150 | 0.3213 | 0.7620 | -0.7877 | 0.8600 | 1.5497 | -26.5309 | -42.6681 | -1.4150 | -1.5803 |
| 0.2583 | 3.5556 | 200 | 0.2799 | 0.8856 | -1.1319 | 0.8900 | 2.0176 | -25.2941 | -46.1100 | -1.3446 | -1.5362 |
| 0.2338 | 4.4444 | 250 | 0.2740 | 0.9492 | -1.3563 | 0.8900 | 2.3055 | -24.6582 | -48.3533 | -1.2736 | -1.4719 |
| 0.2264 | 5.3333 | 300 | 0.2748 | 0.9422 | -1.6000 | 0.8800 | 2.5422 | -24.7285 | -50.7910 | -1.2476 | -1.4447 |
| 0.1735 | 6.2222 | 350 | 0.2817 | 0.8792 | -1.9250 | 0.8700 | 2.8042 | -25.3584 | -54.0402 | -1.2022 | -1.4030 |
| 0.1834 | 7.1111 | 400 | 0.2900 | 0.8156 | -2.1377 | 0.8800 | 2.9533 | -25.9941 | -56.1677 | -1.1777 | -1.3806 |
| 0.1661 | 8.0 | 450 | 0.2968 | 0.7723 | -2.2626 | 0.8900 | 3.0349 | -26.4276 | -57.4162 | -1.1686 | -1.3688 |
| 0.1377 | 8.8889 | 500 | 0.2971 | 0.7689 | -2.2991 | 0.8900 | 3.0680 | -26.4618 | -57.7814 | -1.1676 | -1.3687 |
| 0.1939 | 9.7778 | 550 | 0.2977 | 0.7798 | -2.2870 | 0.8900 | 3.0668 | -26.3530 | -57.6608 | -1.1677 | -1.3691 |

The evaluation results reported at the top of this card match the step-250 checkpoint (epoch 4.44), which attains the lowest validation loss (0.2740).
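
For context on the Rewards/* columns: DPO scores each response with an implicit reward derived from the log-probability ratio between the trained policy and the frozen reference model, and Rewards/margins is the gap between the chosen and rejected rewards. A brief summary of the standard formulation follows; the β used for this run is not reported in the card.

```latex
% Standard DPO quantities (for reference; the beta for this run is not stated).
% Implicit reward of response y for prompt x:
\[
r_\theta(x, y) \;=\; \beta \,\log\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\]
% Rewards/margins is the mean chosen-minus-rejected reward gap:
\[
\text{margin} \;=\; r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
\]
% Training minimizes the DPO loss over preference pairs:
\[
\mathcal{L}_{\mathrm{DPO}} \;=\; -\,\mathbb{E}\big[\log \sigma\big(r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})\big)\big]
\]
```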

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3