# Ministral-8B-Instruct-2410-dpo-llama-1000
This model is a fine-tuned version of mistralai/Ministral-8B-Instruct-2410 on the answer_llama dataset. It achieves the following results on the evaluation set:
- Loss: 0.2740
- Rewards/chosen: 0.9492
- Rewards/rejected: -1.3563
- Rewards/accuracies: 0.8900
- Rewards/margins: 2.3055
- Logps/chosen: -24.6582
- Logps/rejected: -48.3533
- Logits/chosen: -1.2736
- Logits/rejected: -1.4719
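Since the framework versions below include PEFT, this checkpoint is presumably a PEFT (LoRA-style) adapter on top of the base model rather than merged full weights. The following is only a minimal inference sketch under that assumption, using the adapter repo id `chchen/Ministral-8B-Instruct-2410-dpo-llama-1000`:

```python
# Sketch only: assumes this repo is a PEFT (LoRA) adapter for the base model,
# as suggested by the PEFT entry under "Framework versions" below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Ministral-8B-Instruct-2410"
adapter_id = "chchen/Ministral-8B-Instruct-2410-dpo-llama-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # load the DPO-tuned adapter
model.eval()

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning changes about a model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If the repository instead contains merged full weights, loading it directly with `AutoModelForCausalLM.from_pretrained` would replace the `PeftModel` step.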
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
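The card does not state which training framework produced these numbers; the sketch below only illustrates how the hyperparameters above would map onto TRL's `DPOConfig`/`DPOTrainer`, assuming a preference dataset with `prompt`/`chosen`/`rejected` columns. The dataset path, LoRA settings, DPO beta, and precision flag are assumptions, not values taken from this card.

```python
# Rough sketch, not the actual training script: maps the listed hyperparameters
# onto TRL's DPOConfig/DPOTrainer. Dataset path, LoRA settings, and precision
# are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "mistralai/Ministral-8B-Instruct-2410"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder: the card's "answer_llama" dataset is not documented here.
dataset = load_dataset("json", data_files="answer_llama.json", split="train")

# Assumed LoRA settings; the card only confirms that PEFT was used.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.0)

args = DPOConfig(
    output_dir="ministral-8b-dpo-llama-1000",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = total train batch size 16 on one device
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",
    bf16=True,                       # assumption; precision is not listed above
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,      # named `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```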
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6556 | 0.8889 | 50 | 0.6256 | 0.1404 | -0.0054 | 0.8100 | 0.1459 | -32.7462 | -34.8450 | -1.7657 | -1.8272 |
| 0.4187 | 1.7778 | 100 | 0.3967 | 0.6694 | -0.2649 | 0.8400 | 0.9344 | -27.4562 | -37.4399 | -1.5767 | -1.6947 |
| 0.2758 | 2.6667 | 150 | 0.3213 | 0.7620 | -0.7877 | 0.8600 | 1.5497 | -26.5309 | -42.6681 | -1.4150 | -1.5803 |
| 0.2583 | 3.5556 | 200 | 0.2799 | 0.8856 | -1.1319 | 0.8900 | 2.0176 | -25.2941 | -46.1100 | -1.3446 | -1.5362 |
| 0.2338 | 4.4444 | 250 | 0.2740 | 0.9492 | -1.3563 | 0.8900 | 2.3055 | -24.6582 | -48.3533 | -1.2736 | -1.4719 |
| 0.2264 | 5.3333 | 300 | 0.2748 | 0.9422 | -1.6000 | 0.8800 | 2.5422 | -24.7285 | -50.7910 | -1.2476 | -1.4447 |
| 0.1735 | 6.2222 | 350 | 0.2817 | 0.8792 | -1.9250 | 0.8700 | 2.8042 | -25.3584 | -54.0402 | -1.2022 | -1.4030 |
| 0.1834 | 7.1111 | 400 | 0.2900 | 0.8156 | -2.1377 | 0.8800 | 2.9533 | -25.9941 | -56.1677 | -1.1777 | -1.3806 |
| 0.1661 | 8.0 | 450 | 0.2968 | 0.7723 | -2.2626 | 0.8900 | 3.0349 | -26.4276 | -57.4162 | -1.1686 | -1.3688 |
| 0.1377 | 8.8889 | 500 | 0.2971 | 0.7689 | -2.2991 | 0.8900 | 3.0680 | -26.4618 | -57.7814 | -1.1676 | -1.3687 |
| 0.1939 | 9.7778 | 550 | 0.2977 | 0.7798 | -2.2870 | 0.8900 | 3.0668 | -26.3530 | -57.6608 | -1.1677 | -1.3691 |
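For reference, the reward columns above are the implicit DPO rewards logged by common DPO trainers rather than outputs of a separate reward model. Assuming the standard DPO formulation, with scaling factor β and a frozen reference policy π_ref:

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \bigr)
$$

Under that convention, Rewards/margins is the mean difference between the chosen and rejected rewards over the evaluation pairs, and Rewards/accuracies is the fraction of pairs where that difference is positive.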
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3