---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e7_rate_03_beta_DPO
  results: []
---

# mistralit2_1000_STEPS_1e7_rate_03_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), trained with Direct Preference Optimization (DPO) via TRL on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6191
- Rewards/chosen: -1.8431
- Rewards/rejected: -2.7054
- Rewards/accuracies: 0.6505
- Rewards/margins: 0.8623
- Logps/rejected: -37.5904
- Logps/chosen: -29.5295
- Logits/rejected: -2.8238
- Logits/chosen: -2.8242
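
For context, the reward metrics above are the implicit rewards of the DPO objective as TRL reports them: each completion y for a prompt x is scored as

$$ r_\theta(x, y) = \beta \,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x)\bigr), $$

where the reference policy is the frozen base model. `Rewards/margins` is the mean of chosen minus rejected rewards, and `Rewards/accuracies` is the fraction of evaluation pairs where the chosen completion receives the higher reward. Going by the `03_beta` suffix in the model name, β here is presumably 0.3.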

## Model description

This is [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) after 1,000 steps of DPO training. The architecture, tokenizer, and chat format are unchanged from the base model, so it loads and prompts like any Mistral-7B-Instruct checkpoint; a minimal loading sketch follows.
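
A minimal usage sketch with `transformers`, assuming the checkpoint is hosted on the Hub (the `your-namespace` repository id below is a placeholder, not the actual path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the namespace where this checkpoint lives.
model_id = "your-namespace/mistralit2_1000_STEPS_1e7_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Mistral-Instruct expects the [INST] chat format; the tokenizer's chat
# template applies it.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```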

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
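
A minimal sketch of how these hyperparameters might map onto a TRL `DPOTrainer` run of this era (TRL ~0.7.x alongside Transformers 4.38.2, when `beta` was still a direct trainer argument). The dataset, output directory, and `beta=0.3` (inferred from the model name) are assumptions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference dataset with "prompt", "chosen", "rejected" columns (placeholder).
dataset = load_dataset("your-namespace/your-preference-dataset")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e7_rate_03_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,  # matches the 50-step eval cadence in the results table
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # TRL clones the policy as the frozen reference model
    args=args,
    beta=0.3,         # assumed from the "03_beta" suffix in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```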

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6777        | 0.1   | 50   | 0.6740          | -0.1496        | -0.1942          | 0.5824             | 0.0446          | -29.2197       | -23.8845     | -2.8632         | -2.8635       |
| 0.6077        | 0.2   | 100  | 0.6364          | -1.2703        | -1.6253          | 0.5846             | 0.3550          | -33.9902       | -27.6202     | -2.8384         | -2.8387       |
| 0.4959        | 0.29  | 150  | 0.6488          | -2.0038        | -2.5512          | 0.5934             | 0.5473          | -37.0763       | -30.0653     | -2.8343         | -2.8347       |
| 0.553         | 0.39  | 200  | 0.5977          | -0.9571        | -1.3986          | 0.6374             | 0.4415          | -33.2344       | -26.5762     | -2.8518         | -2.8521       |
| 0.6334        | 0.49  | 250  | 0.5740          | -0.6757        | -1.1710          | 0.6440             | 0.4953          | -32.4758       | -25.6382     | -2.8479         | -2.8482       |
| 0.5613        | 0.59  | 300  | 0.5961          | -1.4901        | -2.1568          | 0.6374             | 0.6666          | -35.7616       | -28.3529     | -2.8436         | -2.8439       |
| 0.5182        | 0.68  | 350  | 0.6175          | -1.8099        | -2.5639          | 0.6418             | 0.7541          | -37.1189       | -29.4187     | -2.8403         | -2.8407       |
| 0.6292        | 0.78  | 400  | 0.6197          | -1.8949        | -2.6751          | 0.6418             | 0.7802          | -37.4896       | -29.7022     | -2.8352         | -2.8356       |
| 0.6529        | 0.88  | 450  | 0.5986          | -1.3908        | -2.0689          | 0.6527             | 0.6781          | -35.4687       | -28.0218     | -2.8394         | -2.8398       |
| 0.5042        | 0.98  | 500  | 0.5930          | -1.2223        | -1.8903          | 0.6637             | 0.6680          | -34.8735       | -27.4602     | -2.8391         | -2.8395       |
| 0.364         | 1.07  | 550  | 0.5917          | -1.3579        | -2.0905          | 0.6659             | 0.7327          | -35.5409       | -27.9120     | -2.8340         | -2.8344       |
| 0.346         | 1.17  | 600  | 0.6084          | -1.6411        | -2.4313          | 0.6527             | 0.7903          | -36.6769       | -28.8561     | -2.8286         | -2.8291       |
| 0.4524        | 1.27  | 650  | 0.6120          | -1.7303        | -2.5496          | 0.6484             | 0.8192          | -37.0710       | -29.1536     | -2.8265         | -2.8269       |
| 0.3422        | 1.37  | 700  | 0.6172          | -1.7895        | -2.6271          | 0.6505             | 0.8376          | -37.3293       | -29.3507     | -2.8252         | -2.8257       |
| 0.2776        | 1.46  | 750  | 0.6164          | -1.8100        | -2.6641          | 0.6462             | 0.8541          | -37.4528       | -29.4193     | -2.8245         | -2.8249       |
| 0.3599        | 1.56  | 800  | 0.6201          | -1.8360        | -2.6887          | 0.6484             | 0.8527          | -37.5348       | -29.5057     | -2.8241         | -2.8246       |
| 0.4059        | 1.66  | 850  | 0.6205          | -1.8421        | -2.6971          | 0.6440             | 0.8550          | -37.5629       | -29.5263     | -2.8241         | -2.8246       |
| 0.3417        | 1.76  | 900  | 0.6190          | -1.8389        | -2.6983          | 0.6505             | 0.8594          | -37.5666       | -29.5155     | -2.8239         | -2.8243       |
| 0.3409        | 1.86  | 950  | 0.6195          | -1.8423        | -2.7030          | 0.6484             | 0.8606          | -37.5823       | -29.5270     | -2.8237         | -2.8242       |
| 0.2802        | 1.95  | 1000 | 0.6191          | -1.8431        | -2.7054          | 0.6505             | 0.8623          | -37.5904       | -29.5295     | -2.8238         | -2.8242       |


### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2