Hyrros committed (verified)
Commit 6a42e9f · Parent: d1f2207

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,202 @@
---
library_name: peft
base_model: EleutherAI/gpt-neo-1.3B
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

This repository contains a LoRA adapter (trained with PEFT) for EleutherAI/gpt-neo-1.3B.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Causal language model with a LoRA adapter (PEFT)
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** EleutherAI/gpt-neo-1.3B

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
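Pending the official snippet above, here is a minimal loading sketch. The adapter repo id (`Hyrros/ADAPTER-REPO`) is a hypothetical placeholder for this repository's actual Hub id; the base model id comes from `adapter_config.json` in this commit.

```python
# Minimal sketch for loading this LoRA adapter on top of its base model.
ADAPTER_ID = "Hyrros/ADAPTER-REPO"      # hypothetical placeholder -- substitute the real repo id
BASE_MODEL = "EleutherAI/gpt-neo-1.3B"  # from adapter_config.json


def load_adapter(adapter_id: str = ADAPTER_ID, base_model: str = BASE_MODEL):
    """Return (model, tokenizer) with the PEFT adapter applied to the base model."""
    # Imports are deferred so the sketch can be inspected even where
    # transformers/peft are not installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(base_model)
    model = PeftModel.from_pretrained(base, adapter_id)  # attaches the LoRA weights
    model.eval()
    return model, tokenizer
```

Note that downloading ~1.3B parameters of base weights is required; only the ~6 MB adapter lives in this repository.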

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- **LoRA config (from `adapter_config.json`):** r=8, lora_alpha=16, lora_dropout=0.05, target_modules=`q_proj`, `v_proj`
- **Peak learning rate (from `trainer_state.json`):** 5e-4
- **Per-device train batch size (from `trainer_state.json`):** 2
- **Epochs (from `trainer_state.json`):** 3 (max_steps 1902; logging every 100 steps, eval/save every 200)

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

LoRA adapter (r=8, alpha=16) on the `q_proj` and `v_proj` attention projections of EleutherAI/gpt-neo-1.3B, with a causal language modeling task type (`task_type: CAUSAL_LM` in `adapter_config.json`).

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

- PEFT 0.11.1 (see Framework versions below)

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

### Framework versions

- PEFT 0.11.1
adapter_config.json ADDED
@@ -0,0 +1,29 @@
{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "EleutherAI/gpt-neo-1.3B",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "use_dora": false,
  "use_rslora": false
}
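As a sanity check on the config above: with gpt-neo-1.3B's published dimensions (24 transformer layers, hidden size 2048 — assumed from the base model's config, not from this commit), LoRA with `r=8` on `q_proj` and `v_proj` gives a trainable-parameter count whose fp32 footprint matches the `adapter_model.safetensors` size below almost exactly:

```python
# LoRA adds two low-rank matrices per targeted projection:
# A (r x in_features) and B (out_features x r).
HIDDEN, LAYERS, R = 2048, 24, 8   # gpt-neo-1.3B dims (assumed) and r from adapter_config.json
TARGETS_PER_LAYER = 2             # q_proj and v_proj, per adapter_config.json

params_per_matrix = R * (HIDDEN + HIDDEN)                 # 8 * 4096 = 32768
total_params = LAYERS * TARGETS_PER_LAYER * params_per_matrix
print(total_params)       # 1572864 trainable LoRA parameters
print(total_params * 4)   # 6291456 bytes in fp32, vs. a 6,304,672-byte checkpoint
```

The ~13 KB difference is safetensors header/metadata overhead, so the checkpoint is consistent with fp32 LoRA weights of exactly this shape.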
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:faf7dc0d6803e54ac695452a94205faa12c59a6d39e451e860a7fe7bd4c3f376
size 6304672
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
optimizer.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c6f20e2c741f6e1bad07240d6f427dff6953798f55a5f28bbd9e82b366e5051
size 12639866
rng_state.pth ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:255469ff8e28d3da85ac42b5533987c2c22bb90cd9a40ecf72b4c43f0cee3b86
size 14180
scheduler.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5bed59e7621074ed10861e846fee0b9c04766c38171474c32d31bb7f8dd365a
size 1064
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<|endoftext|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
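One detail worth noting: `bos`, `eos`, `unk`, and `pad` all resolve to the same `<|endoftext|>` token (id 50256 per `tokenizer_config.json` below). A quick stdlib check of a condensed copy of the map above makes the point; the practical consequence is that batched generation must rely on attention masks, since padding is indistinguishable from end-of-text by token id alone.

```python
import json

# Condensed copy of special_tokens_map.json from this commit
# (the lstrip/normalized/rstrip/single_word flags are elided here).
raw = json.loads("""
{
  "bos_token": {"content": "<|endoftext|>"},
  "eos_token": {"content": "<|endoftext|>"},
  "pad_token": "<|endoftext|>",
  "unk_token": {"content": "<|endoftext|>"}
}
""")


def token_text(entry):
    # Entries are either a bare string or an AddedToken-style dict.
    return entry["content"] if isinstance(entry, dict) else entry


texts = {name: token_text(value) for name, value in raw.items()}
# All four roles collapse onto GPT-2's single special token.
assert len(set(texts.values())) == 1
print(texts)
```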
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,22 @@
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "50256": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "model_max_length": 2048,
  "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
trainer_state.json ADDED
@@ -0,0 +1,435 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 2.8357621110673494,
  "eval_steps": 200,
  "global_step": 1800,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.15754233950374164,
      "grad_norm": 1.364139199256897,
      "learning_rate": 0.00025,
      "logits/chosen": -19.665550231933594,
      "logits/rejected": -18.70989227294922,
      "logps/chosen": -357.2802734375,
      "logps/rejected": -267.8329162597656,
      "loss": 0.498,
      "rewards/accuracies": 0.7437499761581421,
      "rewards/chosen": 0.69869065284729,
      "rewards/margins": 0.8159563541412354,
      "rewards/rejected": -0.1172657385468483,
      "step": 100
    },
    {
      "epoch": 0.31508467900748327,
      "grad_norm": 0.2697848677635193,
      "learning_rate": 0.0005,
      "logits/chosen": -19.965831756591797,
      "logits/rejected": -18.864395141601562,
      "logps/chosen": -340.5800476074219,
      "logps/rejected": -273.5838623046875,
      "loss": 0.242,
      "rewards/accuracies": 0.8662499785423279,
      "rewards/chosen": 1.5408194065093994,
      "rewards/margins": 3.1778359413146973,
      "rewards/rejected": -1.63701593875885,
      "step": 200
    },
    {
      "epoch": 0.31508467900748327,
      "eval_logits/chosen": -16.780941009521484,
      "eval_logits/rejected": -16.480772018432617,
      "eval_logps/chosen": -389.2388000488281,
      "eval_logps/rejected": -293.8738708496094,
      "eval_loss": 0.23645737767219543,
      "eval_rewards/accuracies": 0.9711538553237915,
      "eval_rewards/chosen": 0.6156416535377502,
      "eval_rewards/margins": 1.6693669557571411,
      "eval_rewards/rejected": -1.0537253618240356,
      "eval_runtime": 99.0189,
      "eval_samples_per_second": 1.05,
      "eval_steps_per_second": 0.525,
      "step": 200
    },
    {
      "epoch": 0.4726270185112249,
      "grad_norm": 0.3729426860809326,
      "learning_rate": 0.0004957532446941012,
      "logits/chosen": -19.795560836791992,
      "logits/rejected": -18.869535446166992,
      "logps/chosen": -349.1313781738281,
      "logps/rejected": -293.3199157714844,
      "loss": 0.2098,
      "rewards/accuracies": 0.8824999928474426,
      "rewards/chosen": 1.0055174827575684,
      "rewards/margins": 4.656251907348633,
      "rewards/rejected": -3.6507339477539062,
      "step": 300
    },
    {
      "epoch": 0.6301693580149665,
      "grad_norm": 0.24799980223178864,
      "learning_rate": 0.00048315725822143025,
      "logits/chosen": -19.78338050842285,
      "logits/rejected": -18.67037582397461,
      "logps/chosen": -345.9194641113281,
      "logps/rejected": -296.3516845703125,
      "loss": 0.1816,
      "rewards/accuracies": 0.8987500071525574,
      "rewards/chosen": 1.1964439153671265,
      "rewards/margins": 5.297502517700195,
      "rewards/rejected": -4.101058483123779,
      "step": 400
    },
    {
      "epoch": 0.6301693580149665,
      "eval_logits/chosen": -16.3292293548584,
      "eval_logits/rejected": -16.04510498046875,
      "eval_logps/chosen": -398.3055114746094,
      "eval_logps/rejected": -319.1045227050781,
      "eval_loss": 0.08102616667747498,
      "eval_rewards/accuracies": 0.9807692170143127,
      "eval_rewards/chosen": -0.291032999753952,
      "eval_rewards/margins": 3.2857601642608643,
      "eval_rewards/rejected": -3.576793670654297,
      "eval_runtime": 99.1533,
      "eval_samples_per_second": 1.049,
      "eval_steps_per_second": 0.524,
      "step": 400
    },
    {
      "epoch": 0.7877116975187082,
      "grad_norm": 1.1599677801132202,
      "learning_rate": 0.0004626399771610739,
      "logits/chosen": -19.652454376220703,
      "logits/rejected": -18.740825653076172,
      "logps/chosen": -342.26641845703125,
      "logps/rejected": -312.0155334472656,
      "loss": 0.2049,
      "rewards/accuracies": 0.8799999952316284,
      "rewards/chosen": 0.3767644166946411,
      "rewards/margins": 5.342309951782227,
      "rewards/rejected": -4.965545654296875,
      "step": 500
    },
    {
      "epoch": 0.9452540370224498,
      "grad_norm": 1.1663849353790283,
      "learning_rate": 0.00043489845649067753,
      "logits/chosen": -20.011140823364258,
      "logits/rejected": -19.01030731201172,
      "logps/chosen": -350.619873046875,
      "logps/rejected": -307.1583251953125,
      "loss": 0.1641,
      "rewards/accuracies": 0.8962500095367432,
      "rewards/chosen": 0.45414870977401733,
      "rewards/margins": 5.876157283782959,
      "rewards/rejected": -5.4220075607299805,
      "step": 600
    },
    {
      "epoch": 0.9452540370224498,
      "eval_logits/chosen": -16.64271354675293,
      "eval_logits/rejected": -16.393213272094727,
      "eval_logps/chosen": -397.2878723144531,
      "eval_logps/rejected": -322.0921630859375,
      "eval_loss": 0.04721178486943245,
      "eval_rewards/accuracies": 1.0,
      "eval_rewards/chosen": -0.18926361203193665,
      "eval_rewards/margins": 3.686293125152588,
      "eval_rewards/rejected": -3.8755569458007812,
      "eval_runtime": 99.0866,
      "eval_samples_per_second": 1.05,
      "eval_steps_per_second": 0.525,
      "step": 600
    },
    {
      "epoch": 1.1027963765261914,
      "grad_norm": 0.13236922025680542,
      "learning_rate": 0.000400875187811047,
      "logits/chosen": -19.676679611206055,
      "logits/rejected": -18.6084041595459,
      "logps/chosen": -354.6936340332031,
      "logps/rejected": -311.38226318359375,
      "loss": 0.1031,
      "rewards/accuracies": 0.949999988079071,
      "rewards/chosen": 0.7313293218612671,
      "rewards/margins": 6.36898946762085,
      "rewards/rejected": -5.637660026550293,
      "step": 700
    },
    {
      "epoch": 1.260338716029933,
      "grad_norm": 0.15625274181365967,
      "learning_rate": 0.00036172607909649605,
      "logits/chosen": -19.61475944519043,
      "logits/rejected": -18.85614776611328,
      "logps/chosen": -354.2134704589844,
      "logps/rejected": -341.8444519042969,
      "loss": 0.0554,
      "rewards/accuracies": 0.9825000166893005,
      "rewards/chosen": -0.4523475766181946,
      "rewards/margins": 7.603002548217773,
      "rewards/rejected": -8.055350303649902,
      "step": 800
    },
    {
      "epoch": 1.260338716029933,
      "eval_logits/chosen": -16.010990142822266,
      "eval_logits/rejected": -15.788914680480957,
      "eval_logps/chosen": -408.9882507324219,
      "eval_logps/rejected": -353.1039733886719,
      "eval_loss": 0.01217043399810791,
      "eval_rewards/accuracies": 1.0,
      "eval_rewards/chosen": -1.3593038320541382,
      "eval_rewards/margins": 5.617432594299316,
      "eval_rewards/rejected": -6.976736068725586,
      "eval_runtime": 99.0253,
      "eval_samples_per_second": 1.05,
      "eval_steps_per_second": 0.525,
      "step": 800
    },
    {
      "epoch": 1.4178810555336747,
      "grad_norm": 0.22375567257404327,
      "learning_rate": 0.00031878118382826264,
      "logits/chosen": -19.352890014648438,
      "logits/rejected": -18.583425521850586,
      "logps/chosen": -359.9408264160156,
      "logps/rejected": -340.4473571777344,
      "loss": 0.0582,
      "rewards/accuracies": 0.9762499928474426,
      "rewards/chosen": -0.5718154907226562,
      "rewards/margins": 7.750644683837891,
      "rewards/rejected": -8.322461128234863,
      "step": 900
    },
    {
      "epoch": 1.5754233950374164,
      "grad_norm": 1.0517126321792603,
      "learning_rate": 0.00027349951370107985,
      "logits/chosen": -19.330745697021484,
      "logits/rejected": -18.550424575805664,
      "logps/chosen": -373.06610107421875,
      "logps/rejected": -360.41961669921875,
      "loss": 0.0741,
      "rewards/accuracies": 0.9637500047683716,
      "rewards/chosen": -1.6150453090667725,
      "rewards/margins": 7.987229824066162,
      "rewards/rejected": -9.602275848388672,
      "step": 1000
    },
    {
      "epoch": 1.5754233950374164,
      "eval_logits/chosen": -15.608610153198242,
      "eval_logits/rejected": -15.47758674621582,
      "eval_logps/chosen": -430.04034423828125,
      "eval_logps/rejected": -373.8860778808594,
      "eval_loss": 0.022561371326446533,
      "eval_rewards/accuracies": 1.0,
      "eval_rewards/chosen": -3.464510440826416,
      "eval_rewards/margins": 5.590437889099121,
      "eval_rewards/rejected": -9.054947853088379,
      "eval_runtime": 99.1814,
      "eval_samples_per_second": 1.049,
      "eval_steps_per_second": 0.524,
      "step": 1000
    },
    {
      "epoch": 1.732965734541158,
      "grad_norm": 0.2072581648826599,
      "learning_rate": 0.00022741947009792817,
      "logits/chosen": -19.062545776367188,
      "logits/rejected": -18.211511611938477,
      "logps/chosen": -378.5089111328125,
      "logps/rejected": -351.533203125,
      "loss": 0.0623,
      "rewards/accuracies": 0.9775000214576721,
      "rewards/chosen": -1.9905846118927002,
      "rewards/margins": 7.95173454284668,
      "rewards/rejected": -9.9423189163208,
      "step": 1100
    },
    {
      "epoch": 1.8905080740448996,
      "grad_norm": 0.19075140357017517,
      "learning_rate": 0.00018210657837614962,
      "logits/chosen": -19.65765953063965,
      "logits/rejected": -18.76839256286621,
      "logps/chosen": -361.4976501464844,
      "logps/rejected": -350.03338623046875,
      "loss": 0.052,
      "rewards/accuracies": 0.9800000190734863,
      "rewards/chosen": -0.5509209632873535,
      "rewards/margins": 8.345035552978516,
      "rewards/rejected": -8.895956039428711,
      "step": 1200
    },
    {
      "epoch": 1.8905080740448996,
      "eval_logits/chosen": -16.13081169128418,
      "eval_logits/rejected": -15.851374626159668,
      "eval_logps/chosen": -403.6054992675781,
      "eval_logps/rejected": -352.6169738769531,
      "eval_loss": 0.015062261372804642,
      "eval_rewards/accuracies": 1.0,
      "eval_rewards/chosen": -0.8210276961326599,
      "eval_rewards/margins": 6.107009410858154,
      "eval_rewards/rejected": -6.928036689758301,
      "eval_runtime": 99.1951,
      "eval_samples_per_second": 1.048,
      "eval_steps_per_second": 0.524,
      "step": 1200
    },
    {
      "epoch": 2.048050413548641,
      "grad_norm": 0.04302853345870972,
      "learning_rate": 0.00013910030064250462,
      "logits/chosen": -19.43883514404297,
      "logits/rejected": -18.620824813842773,
      "logps/chosen": -365.1725769042969,
      "logps/rejected": -346.1496887207031,
      "loss": 0.0357,
      "rewards/accuracies": 0.9887499809265137,
      "rewards/chosen": -0.8485901355743408,
      "rewards/margins": 8.334707260131836,
      "rewards/rejected": -9.183298110961914,
      "step": 1300
    },
    {
      "epoch": 2.205592753052383,
      "grad_norm": 0.2691422700881958,
      "learning_rate": 9.986173400221197e-05,
      "logits/chosen": -19.500974655151367,
      "logits/rejected": -18.651748657226562,
      "logps/chosen": -360.99383544921875,
      "logps/rejected": -359.66326904296875,
      "loss": 0.0097,
      "rewards/accuracies": 0.9950000047683716,
      "rewards/chosen": -1.2903647422790527,
      "rewards/margins": 9.181097984313965,
      "rewards/rejected": -10.471461296081543,
      "step": 1400
    },
    {
      "epoch": 2.205592753052383,
      "eval_logits/chosen": -15.852449417114258,
      "eval_logits/rejected": -15.591791152954102,
      "eval_logps/chosen": -420.1238708496094,
      "eval_logps/rejected": -374.8275146484375,
      "eval_loss": 0.011651669628918171,
      "eval_rewards/accuracies": 1.0,
      "eval_rewards/chosen": -2.4728617668151855,
      "eval_rewards/margins": 6.676229000091553,
      "eval_rewards/rejected": -9.149091720581055,
      "eval_runtime": 99.2676,
      "eval_samples_per_second": 1.048,
      "eval_steps_per_second": 0.524,
      "step": 1400
    },
    {
      "epoch": 2.3631350925561243,
      "grad_norm": 0.03317335993051529,
      "learning_rate": 6.572397118387572e-05,
      "logits/chosen": -19.316547393798828,
      "logits/rejected": -18.4414119720459,
      "logps/chosen": -368.3520202636719,
      "logps/rejected": -356.59405517578125,
      "loss": 0.0094,
      "rewards/accuracies": 0.9962499737739563,
      "rewards/chosen": -1.4736888408660889,
      "rewards/margins": 8.902289390563965,
      "rewards/rejected": -10.375977516174316,
      "step": 1500
    },
    {
      "epoch": 2.520677432059866,
      "grad_norm": 0.012674962170422077,
      "learning_rate": 3.784680999053808e-05,
      "logits/chosen": -19.28080177307129,
      "logits/rejected": -18.418855667114258,
      "logps/chosen": -384.20587158203125,
      "logps/rejected": -371.5294494628906,
      "loss": 0.0096,
      "rewards/accuracies": 0.9950000047683716,
      "rewards/chosen": -1.5619627237319946,
      "rewards/margins": 9.07076644897461,
      "rewards/rejected": -10.632728576660156,
      "step": 1600
    },
    {
      "epoch": 2.520677432059866,
      "eval_logits/chosen": -15.806844711303711,
      "eval_logits/rejected": -15.545206069946289,
      "eval_logps/chosen": -422.5581359863281,
      "eval_logps/rejected": -379.27069091796875,
      "eval_loss": 0.010242895223200321,
      "eval_rewards/accuracies": 1.0,
      "eval_rewards/chosen": -2.716292142868042,
      "eval_rewards/margins": 6.877115249633789,
      "eval_rewards/rejected": -9.59340763092041,
      "eval_runtime": 99.0694,
      "eval_samples_per_second": 1.05,
      "eval_steps_per_second": 0.525,
      "step": 1600
    },
    {
      "epoch": 2.6782197715636076,
      "grad_norm": 0.02596099302172661,
      "learning_rate": 1.7177350279888816e-05,
      "logits/chosen": -19.332918167114258,
      "logits/rejected": -18.499181747436523,
      "logps/chosen": -373.5088195800781,
      "logps/rejected": -373.32061767578125,
      "loss": 0.0073,
      "rewards/accuracies": 0.9975000023841858,
      "rewards/chosen": -1.5841505527496338,
      "rewards/margins": 9.309870719909668,
      "rewards/rejected": -10.894021034240723,
      "step": 1700
    },
    {
      "epoch": 2.8357621110673494,
      "grad_norm": 0.40319836139678955,
      "learning_rate": 4.417817153497928e-06,
      "logits/chosen": -19.361268997192383,
      "logits/rejected": -18.549575805664062,
      "logps/chosen": -375.323486328125,
      "logps/rejected": -373.2767333984375,
      "loss": 0.0085,
      "rewards/accuracies": 0.9962499737739563,
      "rewards/chosen": -1.6672673225402832,
      "rewards/margins": 9.543354988098145,
      "rewards/rejected": -11.210620880126953,
      "step": 1800
    },
    {
      "epoch": 2.8357621110673494,
      "eval_logits/chosen": -15.784783363342285,
      "eval_logits/rejected": -15.527274131774902,
      "eval_logps/chosen": -423.8032531738281,
      "eval_logps/rejected": -381.0099182128906,
      "eval_loss": 0.009673921391367912,
      "eval_rewards/accuracies": 1.0,
      "eval_rewards/chosen": -2.8408048152923584,
      "eval_rewards/margins": 6.926526069641113,
      "eval_rewards/rejected": -9.767330169677734,
      "eval_runtime": 99.5666,
      "eval_samples_per_second": 1.045,
      "eval_steps_per_second": 0.522,
      "step": 1800
    }
  ],
  "logging_steps": 100,
  "max_steps": 1902,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 200,
  "total_flos": 0.0,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
}
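The metric names in `log_history` (`logps/chosen`, `rewards/margins`, `rewards/accuracies`) follow the scheme logged by TRL's `DPOTrainer`, which suggests a DPO-style preference-tuning run, though the trainer itself is not recorded in this commit. A small sketch pulling the eval checkpoints out of the log above (values copied verbatim) to locate the best evaluation loss:

```python
# (step, eval_loss, eval_rewards/margins) triples copied from log_history above.
EVALS = [
    (200, 0.23645737767219543, 1.6693669557571411),
    (400, 0.08102616667747498, 3.2857601642608643),
    (600, 0.04721178486943245, 3.686293125152588),
    (800, 0.01217043399810791, 5.617432594299316),
    (1000, 0.022561371326446533, 5.590437889099121),
    (1200, 0.015062261372804642, 6.107009410858154),
    (1400, 0.011651669628918171, 6.676229000091553),
    (1600, 0.010242895223200321, 6.877115249633789),
    (1800, 0.009673921391367912, 6.926526069641113),
]

best_step, best_loss, best_margin = min(EVALS, key=lambda e: e[1])
print(best_step, best_loss)  # the final checkpoint (step 1800) is also the eval-loss minimum

# Eval loss is not monotone (note the uptick at step 1000), so saving the last
# checkpoint only happens to coincide with the minimum in this run.
```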
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d2f791d62054ae3534ed7387cc41e6d31213a8067429b858cabcc5deb098ed1a
size 5560
vocab.json ADDED
The diff for this file is too large to render. See raw diff