alperenenes committed
Commit a7c1d7f · verified · 1 Parent(s): 943e03a

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,202 @@
- ---
- license: unknown
- ---
+ ---
+ base_model: Qwen/Qwen2.5-VL-3B-Instruct
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.2
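The card's quick-start section is still a placeholder. Going only by the front matter above (base_model: Qwen/Qwen2.5-VL-3B-Instruct, library_name: peft) and the adapter files in this commit, a minimal, hypothetical loading sketch could look like the following; the adapter repo id is a placeholder, and the class names assume a recent transformers release with Qwen2.5-VL support:

```python
# Hypothetical quick-start sketch; "your-username/your-adapter-repo" is a placeholder.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

base_id = "Qwen/Qwen2.5-VL-3B-Instruct"         # from base_model in the card
adapter_id = "your-username/your-adapter-repo"  # placeholder for this repository

processor = AutoProcessor.from_pretrained(base_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach adapter_model.safetensors

image = Image.open("example.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```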
adapter_config.json ADDED
@@ -0,0 +1,145 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "Qwen/Qwen2.5-VL-3B-Instruct",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 128,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.05,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 64,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "layers.31.mlp.up_proj",
28
+ "layers.2.mlp.gate_proj",
29
+ "layers.29.mlp.down_proj",
30
+ "layers.31.mlp.gate_proj",
31
+ "k_proj",
32
+ "layers.8.mlp.gate_proj",
33
+ "o_proj",
34
+ "layers.28.mlp.up_proj",
35
+ "32.mlp.up_proj",
36
+ "layers.0.mlp.down_proj",
37
+ "layers.19.mlp.gate_proj",
38
+ "layers.16.mlp.gate_proj",
39
+ "layers.24.mlp.up_proj",
40
+ "layers.14.mlp.gate_proj",
41
+ "layers.12.mlp.down_proj",
42
+ "layers.3.mlp.gate_proj",
43
+ "q_proj",
44
+ "layers.4.mlp.down_proj",
45
+ "layers.21.mlp.down_proj",
46
+ "layers.0.mlp.gate_proj",
47
+ "layers.7.mlp.up_proj",
48
+ "layers.27.mlp.up_proj",
49
+ "layers.29.mlp.gate_proj",
50
+ "layers.20.mlp.up_proj",
51
+ "layers.4.mlp.up_proj",
52
+ "layers.26.mlp.gate_proj",
53
+ "layers.9.mlp.down_proj",
54
+ "layers.1.mlp.down_proj",
55
+ "layers.5.mlp.up_proj",
56
+ "layers.16.mlp.up_proj",
57
+ "layers.19.mlp.up_proj",
58
+ "layers.23.mlp.up_proj",
59
+ "layers.29.mlp.up_proj",
60
+ "layers.16.mlp.down_proj",
61
+ "lm_head",
62
+ "layers.6.mlp.down_proj",
63
+ "33.mlp.up_proj",
64
+ "layers.12.mlp.gate_proj",
65
+ "layers.17.mlp.gate_proj",
66
+ "34.mlp.up_proj",
67
+ "layers.18.mlp.gate_proj",
68
+ "layers.20.mlp.down_proj",
69
+ "layers.28.mlp.gate_proj",
70
+ "layers.14.mlp.down_proj",
71
+ "layers.26.mlp.down_proj",
72
+ "layers.21.mlp.up_proj",
73
+ "layers.21.mlp.gate_proj",
74
+ "layers.1.mlp.up_proj",
75
+ "layers.30.mlp.up_proj",
76
+ "layers.27.mlp.gate_proj",
77
+ "layers.5.mlp.down_proj",
78
+ "layers.8.mlp.up_proj",
79
+ "layers.7.mlp.down_proj",
80
+ "layers.19.mlp.down_proj",
81
+ "33.mlp.gate_proj",
82
+ "layers.22.mlp.down_proj",
83
+ "layers.0.mlp.up_proj",
84
+ "35.mlp.gate_proj",
85
+ "layers.25.mlp.up_proj",
86
+ "layers.4.mlp.gate_proj",
87
+ "33.mlp.down_proj",
88
+ "layers.14.mlp.up_proj",
89
+ "layers.15.mlp.gate_proj",
90
+ "layers.8.mlp.down_proj",
91
+ "layers.31.mlp.down_proj",
92
+ "layers.22.mlp.gate_proj",
93
+ "layers.11.mlp.gate_proj",
94
+ "layers.24.mlp.down_proj",
95
+ "layers.11.mlp.up_proj",
96
+ "layers.2.mlp.up_proj",
97
+ "layers.24.mlp.gate_proj",
98
+ "layers.6.mlp.gate_proj",
99
+ "layers.7.mlp.gate_proj",
100
+ "layers.9.mlp.gate_proj",
101
+ "layers.10.mlp.down_proj",
102
+ "layers.20.mlp.gate_proj",
103
+ "layers.30.mlp.gate_proj",
104
+ "layers.13.mlp.gate_proj",
105
+ "layers.17.mlp.down_proj",
106
+ "32.mlp.down_proj",
107
+ "layers.18.mlp.up_proj",
108
+ "32.mlp.gate_proj",
109
+ "layers.3.mlp.down_proj",
110
+ "layers.11.mlp.down_proj",
111
+ "layers.17.mlp.up_proj",
112
+ "layers.10.mlp.up_proj",
113
+ "v_proj",
114
+ "35.mlp.down_proj",
115
+ "layers.1.mlp.gate_proj",
116
+ "layers.12.mlp.up_proj",
117
+ "layers.15.mlp.up_proj",
118
+ "34.mlp.gate_proj",
119
+ "layers.15.mlp.down_proj",
120
+ "layers.18.mlp.down_proj",
121
+ "34.mlp.down_proj",
122
+ "layers.23.mlp.down_proj",
123
+ "layers.27.mlp.down_proj",
124
+ "layers.10.mlp.gate_proj",
125
+ "layers.6.mlp.up_proj",
126
+ "layers.23.mlp.gate_proj",
127
+ "layers.13.mlp.down_proj",
128
+ "layers.26.mlp.up_proj",
129
+ "layers.9.mlp.up_proj",
130
+ "layers.25.mlp.gate_proj",
131
+ "layers.25.mlp.down_proj",
132
+ "layers.13.mlp.up_proj",
133
+ "layers.3.mlp.up_proj",
134
+ "layers.2.mlp.down_proj",
135
+ "35.mlp.up_proj",
136
+ "layers.28.mlp.down_proj",
137
+ "layers.30.mlp.down_proj",
138
+ "layers.5.mlp.gate_proj",
139
+ "layers.22.mlp.up_proj"
140
+ ],
141
+ "task_type": "CAUSAL_LM",
142
+ "trainable_token_indices": null,
143
+ "use_dora": false,
144
+ "use_rslora": false
145
+ }
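For reference, the adapter settings above describe a LoRA of rank 64 with alpha 128 (scaling factor 2), 5% dropout, no bias terms, and a large set of attention, per-layer MLP, and lm_head target modules. A rough peft-side equivalent, sketched here with an abbreviated target_modules list, might look like:

```python
# Sketch only: the adapter_config.json above lists ~113 target modules; a few are shown.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                   # "r": 64
    lora_alpha=128,         # "lora_alpha": 128 -> scaling = alpha / r = 2.0
    lora_dropout=0.05,      # "lora_dropout": 0.05
    bias="none",            # "bias": "none"
    task_type="CAUSAL_LM",  # "task_type": "CAUSAL_LM"
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj", "lm_head",
        "layers.0.mlp.gate_proj", "layers.0.mlp.up_proj", "layers.0.mlp.down_proj",
        # ... remaining per-layer mlp projections as listed in the JSON above
    ],
)
print(lora_config)
```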
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b13517ad060a2729d26c58754c186c5dc5596fd3e2d78f649e3a7a4783353099
3
+ size 881577392
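These three lines are a Git LFS pointer rather than the weights themselves: the safetensors blob is stored by LFS and identified by its SHA-256 digest and byte size. A small sketch to check a downloaded copy against the pointer:

```python
# Verify a downloaded adapter_model.safetensors against the LFS pointer above.
import hashlib
import os

path = "adapter_model.safetensors"
expected_sha256 = "b13517ad060a2729d26c58754c186c5dc5596fd3e2d78f649e3a7a4783353099"
expected_size = 881577392  # bytes, from the "size" line

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert os.path.getsize(path) == expected_size, "size mismatch"
assert digest.hexdigest() == expected_sha256, "sha256 mismatch"
print("file matches the LFS pointer")
```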
added_tokens.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
chat_template.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
3
+ }
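This is the standard Qwen2.5-VL Jinja chat template: it injects a default system turn when none is supplied and wraps each image or video content item in <|vision_start|>…<|vision_end|> placeholders. A minimal rendering sketch, assuming the base model's processor (this repo only ships the template string):

```python
# Render the chat template above for one image + text user turn.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What is shown here?"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)
# Per the template, this yields roughly:
# <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n
# <|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>What is shown here?<|im_end|>\n
# <|im_start|>assistant\n
```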
global_step200/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:481bb61df5e3b32db700046bc9492f8055e2dd43ef261aa13f013d82ea6ea463
3
+ size 777569296
global_step200/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:512c0b94301d6a472826d3044c56e2f476dcb14b4abfe4d7e5679aae3ef0ec8a
3
+ size 777564368
global_step200/mp_rank_00_model_states.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1360cdffd684fc9a4d58e724f92c38ef52a87b28b35400ad7128b5ed983ff31f
3
+ size 881924536
latest ADDED
@@ -0,0 +1 @@
1
+ global_step200
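The `latest` file is DeepSpeed's checkpoint tag: it names `global_step200` as the directory holding the ZeRO optimizer shards and module states uploaded above. If that layout is present locally, DeepSpeed's conversion helper can consolidate the shards; a hedged sketch (behaviour depends on the DeepSpeed version and on the checkpoint actually being a ZeRO checkpoint):

```python
# Consolidate the sharded DeepSpeed checkpoint referenced by `latest`.
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# "." assumes the repo root containing `latest` and `global_step200/`.
state_dict = get_fp32_state_dict_from_zero_checkpoint(".", tag="global_step200")
print(f"consolidated {len(state_dict)} tensors")
```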
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
preprocessor_config.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "do_convert_rgb": true,
3
+ "do_normalize": true,
4
+ "do_rescale": true,
5
+ "do_resize": true,
6
+ "image_mean": [
7
+ 0.48145466,
8
+ 0.4578275,
9
+ 0.40821073
10
+ ],
11
+ "image_processor_type": "Qwen2VLImageProcessor",
12
+ "image_std": [
13
+ 0.26862954,
14
+ 0.26130258,
15
+ 0.27577711
16
+ ],
17
+ "max_pixels": 12845056,
18
+ "merge_size": 2,
19
+ "min_pixels": 3136,
20
+ "patch_size": 14,
21
+ "processor_class": "Qwen2_5_VLProcessor",
22
+ "resample": 3,
23
+ "rescale_factor": 0.00392156862745098,
24
+ "size": {
25
+ "longest_edge": 12845056,
26
+ "shortest_edge": 3136
27
+ },
28
+ "temporal_patch_size": 2
29
+ }
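With these settings the image processor keeps each image's pixel count between min_pixels (3136 = 56×56) and max_pixels (12845056 = 3584×3584), cuts it into 14-pixel patches, and merges them 2×2 into visual tokens. A rough sketch of the resulting token count (the real implementation also rounds each side to a multiple of patch_size × merge_size):

```python
# Approximate visual-token count under the preprocessor settings above.
import math

def approx_visual_tokens(height, width, min_pixels=3136, max_pixels=12845056,
                         patch_size=14, merge_size=2):
    pixels = height * width
    # Scale so the total pixel count lands inside [min_pixels, max_pixels].
    if pixels > max_pixels:
        scale = math.sqrt(max_pixels / pixels)
    elif pixels < min_pixels:
        scale = math.sqrt(min_pixels / pixels)
    else:
        scale = 1.0
    cell = patch_size * merge_size          # one token covers a 28x28 pixel block
    return max(1, round(height * scale / cell)) * max(1, round(width * scale / cell))

print(approx_visual_tokens(1080, 1920))    # e.g. a Full-HD frame -> ~2700 tokens
```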
rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b70e8ccdc10215494e0c425708b960e8fa5b44650404f00877b72b073d94b178
3
+ size 14512
rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c0133f22fdd1fdcbce9e5106bc5b407a34b36866e3047076b1f5aee06230b5a9
3
+ size 14512
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
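The map above pins <|im_end|> as the end-of-sequence token and <|endoftext|> as padding, matching the IDs registered in added_tokens.json (151645 and 151643). A quick sketch to confirm this after loading the tokenizer, assuming this repo's tokenizer matches the base model's:

```python
# Check the special-token mapping against the IDs in added_tokens.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
assert tok.eos_token == "<|im_end|>" and tok.eos_token_id == 151645
assert tok.pad_token == "<|endoftext|>" and tok.pad_token_id == 151643
print("eos:", tok.eos_token_id, "pad:", tok.pad_token_id)
```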
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5eee858c5123a4279c3e1f7b81247343f356ac767940b2692a928ad929543214
3
+ size 11422063
tokenizer_config.json ADDED
@@ -0,0 +1,209 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}",
199
+ "clean_up_tokenization_spaces": false,
200
+ "eos_token": "<|im_end|>",
201
+ "errors": "replace",
202
+ "extra_special_tokens": {},
203
+ "model_max_length": 131072,
204
+ "pad_token": "<|endoftext|>",
205
+ "processor_class": "Qwen2_5_VLProcessor",
206
+ "split_special_tokens": false,
207
+ "tokenizer_class": "Qwen2Tokenizer",
208
+ "unk_token": null
209
+ }
trainer_state.json ADDED
@@ -0,0 +1,2833 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 0.0024896678783050343,
5
+ "eval_steps": 500,
6
+ "global_step": 200,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "clip_ratio": 0.0,
13
+ "completion_length": 98.46875,
14
+ "epoch": 1.244833939152517e-05,
15
+ "grad_norm": 0.7918580174446106,
16
+ "kl": 0.0,
17
+ "learning_rate": 9.999937758303043e-06,
18
+ "loss": 0.0,
19
+ "reward": 0.06212758191395551,
20
+ "reward_std": 0.17572333943098783,
21
+ "rewards/format_reward_rec": 0.03125,
22
+ "rewards/iou_reward": 0.03087758377660066,
23
+ "step": 1
24
+ },
25
+ {
26
+ "clip_ratio": 0.0,
27
+ "completion_length": 96.3125,
28
+ "epoch": 2.489667878305034e-05,
29
+ "grad_norm": 0.4728325307369232,
30
+ "kl": 0.001476287841796875,
31
+ "learning_rate": 9.999875516606086e-06,
32
+ "loss": 0.0001,
33
+ "reward": 0.05815719813108444,
34
+ "reward_std": 0.1644933968782425,
35
+ "rewards/format_reward_rec": 0.03125,
36
+ "rewards/iou_reward": 0.02690719999372959,
37
+ "step": 2
38
+ },
39
+ {
40
+ "clip_ratio": 0.0,
41
+ "completion_length": 104.34375,
42
+ "epoch": 3.7345018174575515e-05,
43
+ "grad_norm": 0.6685078740119934,
44
+ "kl": 0.0016937255859375,
45
+ "learning_rate": 9.999813274909128e-06,
46
+ "loss": 0.0001,
47
+ "reward": 0.2839890792965889,
48
+ "reward_std": 0.4464326649904251,
49
+ "rewards/format_reward_rec": 0.15625,
50
+ "rewards/iou_reward": 0.1277390792965889,
51
+ "step": 3
52
+ },
53
+ {
54
+ "clip_ratio": 0.0,
55
+ "completion_length": 92.71875,
56
+ "epoch": 4.979335756610068e-05,
57
+ "grad_norm": 0.7356534004211426,
58
+ "kl": 0.00652313232421875,
59
+ "learning_rate": 9.99975103321217e-06,
60
+ "loss": 0.0003,
61
+ "reward": 0.43574684858322144,
62
+ "reward_std": 0.5346276015043259,
63
+ "rewards/format_reward_rec": 0.28125,
64
+ "rewards/iou_reward": 0.15449684113264084,
65
+ "step": 4
66
+ },
67
+ {
68
+ "clip_ratio": 0.0,
69
+ "completion_length": 116.8125,
70
+ "epoch": 6.224169695762585e-05,
71
+ "grad_norm": 1.5218911170959473,
72
+ "kl": 0.01190185546875,
73
+ "learning_rate": 9.999688791515214e-06,
74
+ "loss": 0.0005,
75
+ "reward": 0.5201369971036911,
76
+ "reward_std": 0.7230685949325562,
77
+ "rewards/format_reward_rec": 0.34375,
78
+ "rewards/iou_reward": 0.1763869971036911,
79
+ "step": 5
80
+ },
81
+ {
82
+ "clip_ratio": 0.0,
83
+ "completion_length": 100.28125,
84
+ "epoch": 7.469003634915103e-05,
85
+ "grad_norm": 1.0370837450027466,
86
+ "kl": 0.03338623046875,
87
+ "learning_rate": 9.999626549818255e-06,
88
+ "loss": 0.0013,
89
+ "reward": 0.45021720230579376,
90
+ "reward_std": 0.6490348875522614,
91
+ "rewards/format_reward_rec": 0.21875,
92
+ "rewards/iou_reward": 0.23146723210811615,
93
+ "step": 6
94
+ },
95
+ {
96
+ "clip_ratio": 0.0,
97
+ "completion_length": 109.84375,
98
+ "epoch": 8.71383757406762e-05,
99
+ "grad_norm": 0.8345764875411987,
100
+ "kl": 0.03363037109375,
101
+ "learning_rate": 9.999564308121297e-06,
102
+ "loss": 0.0013,
103
+ "reward": 0.4857834577560425,
104
+ "reward_std": 0.7779734134674072,
105
+ "rewards/format_reward_rec": 0.3125,
106
+ "rewards/iou_reward": 0.17328345030546188,
107
+ "step": 7
108
+ },
109
+ {
110
+ "clip_ratio": 0.0,
111
+ "completion_length": 101.71875,
112
+ "epoch": 9.958671513220136e-05,
113
+ "grad_norm": 0.857598066329956,
114
+ "kl": 0.03662109375,
115
+ "learning_rate": 9.999502066424341e-06,
116
+ "loss": 0.0015,
117
+ "reward": 0.9932036697864532,
118
+ "reward_std": 0.8116889894008636,
119
+ "rewards/format_reward_rec": 0.5625,
120
+ "rewards/iou_reward": 0.43070371448993683,
121
+ "step": 8
122
+ },
123
+ {
124
+ "clip_ratio": 0.0,
125
+ "completion_length": 118.28125,
126
+ "epoch": 0.00011203505452372654,
127
+ "grad_norm": 0.6808584928512573,
128
+ "kl": 0.0491943359375,
129
+ "learning_rate": 9.999439824727383e-06,
130
+ "loss": 0.002,
131
+ "reward": 1.22513547539711,
132
+ "reward_std": 0.6850164234638214,
133
+ "rewards/format_reward_rec": 0.71875,
134
+ "rewards/iou_reward": 0.5063855350017548,
135
+ "step": 9
136
+ },
137
+ {
138
+ "clip_ratio": 0.0,
139
+ "completion_length": 130.65625,
140
+ "epoch": 0.0001244833939152517,
141
+ "grad_norm": 0.7755190134048462,
142
+ "kl": 0.047607421875,
143
+ "learning_rate": 9.999377583030425e-06,
144
+ "loss": 0.0019,
145
+ "reward": 1.080191969871521,
146
+ "reward_std": 0.8103933334350586,
147
+ "rewards/format_reward_rec": 0.6875,
148
+ "rewards/iou_reward": 0.3926919847726822,
149
+ "step": 10
150
+ },
151
+ {
152
+ "clip_ratio": 0.0,
153
+ "completion_length": 119.65625,
154
+ "epoch": 0.00013693173330677687,
155
+ "grad_norm": 0.8545856475830078,
156
+ "kl": 0.055908203125,
157
+ "learning_rate": 9.999315341333466e-06,
158
+ "loss": 0.0022,
159
+ "reward": 1.0447435975074768,
160
+ "reward_std": 0.7395115196704865,
161
+ "rewards/format_reward_rec": 0.71875,
162
+ "rewards/iou_reward": 0.3259935975074768,
163
+ "step": 11
164
+ },
165
+ {
166
+ "clip_ratio": 0.0,
167
+ "completion_length": 138.25,
168
+ "epoch": 0.00014938007269830206,
169
+ "grad_norm": 0.5796331167221069,
170
+ "kl": 0.059326171875,
171
+ "learning_rate": 9.999253099636508e-06,
172
+ "loss": 0.0024,
173
+ "reward": 1.3631539940834045,
174
+ "reward_std": 0.5613991022109985,
175
+ "rewards/format_reward_rec": 0.875,
176
+ "rewards/iou_reward": 0.48815396428108215,
177
+ "step": 12
178
+ },
179
+ {
180
+ "clip_ratio": 0.0,
181
+ "completion_length": 139.9375,
182
+ "epoch": 0.00016182841208982722,
183
+ "grad_norm": 0.7014759182929993,
184
+ "kl": 0.1033935546875,
185
+ "learning_rate": 9.999190857939552e-06,
186
+ "loss": 0.0041,
187
+ "reward": 1.2715444564819336,
188
+ "reward_std": 0.6332900673151016,
189
+ "rewards/format_reward_rec": 0.75,
190
+ "rewards/iou_reward": 0.5215444564819336,
191
+ "step": 13
192
+ },
193
+ {
194
+ "clip_ratio": 0.0,
195
+ "completion_length": 109.25,
196
+ "epoch": 0.0001742767514813524,
197
+ "grad_norm": 0.6986764073371887,
198
+ "kl": 0.08349609375,
199
+ "learning_rate": 9.999128616242594e-06,
200
+ "loss": 0.0033,
201
+ "reward": 1.4144143462181091,
202
+ "reward_std": 0.5491724014282227,
203
+ "rewards/format_reward_rec": 0.84375,
204
+ "rewards/iou_reward": 0.5706643462181091,
205
+ "step": 14
206
+ },
207
+ {
208
+ "clip_ratio": 0.0,
209
+ "completion_length": 138.71875,
210
+ "epoch": 0.00018672509087287755,
211
+ "grad_norm": 718.58984375,
212
+ "kl": 53.529296875,
213
+ "learning_rate": 9.999066374545636e-06,
214
+ "loss": 2.145,
215
+ "reward": 1.1624789834022522,
216
+ "reward_std": 0.586757242679596,
217
+ "rewards/format_reward_rec": 0.78125,
218
+ "rewards/iou_reward": 0.381228968501091,
219
+ "step": 15
220
+ },
221
+ {
222
+ "clip_ratio": 0.0,
223
+ "completion_length": 106.46875,
224
+ "epoch": 0.00019917343026440272,
225
+ "grad_norm": 0.6615357995033264,
226
+ "kl": 0.064208984375,
227
+ "learning_rate": 9.99900413284868e-06,
228
+ "loss": 0.0026,
229
+ "reward": 1.4353669881820679,
230
+ "reward_std": 0.5828511416912079,
231
+ "rewards/format_reward_rec": 0.84375,
232
+ "rewards/iou_reward": 0.5916168987751007,
233
+ "step": 16
234
+ },
235
+ {
236
+ "clip_ratio": 0.0,
237
+ "completion_length": 122.78125,
238
+ "epoch": 0.0002116217696559279,
239
+ "grad_norm": 0.719054102897644,
240
+ "kl": 0.081787109375,
241
+ "learning_rate": 9.998941891151721e-06,
242
+ "loss": 0.0033,
243
+ "reward": 1.6010385155677795,
244
+ "reward_std": 0.29098983108997345,
245
+ "rewards/format_reward_rec": 0.9375,
246
+ "rewards/iou_reward": 0.6635385453701019,
247
+ "step": 17
248
+ },
249
+ {
250
+ "clip_ratio": 0.0,
251
+ "completion_length": 109.125,
252
+ "epoch": 0.00022407010904745308,
253
+ "grad_norm": 0.6034934520721436,
254
+ "kl": 0.075927734375,
255
+ "learning_rate": 9.998879649454763e-06,
256
+ "loss": 0.003,
257
+ "reward": 1.747658133506775,
258
+ "reward_std": 0.19360525161027908,
259
+ "rewards/format_reward_rec": 1.0,
260
+ "rewards/iou_reward": 0.7476581633090973,
261
+ "step": 18
262
+ },
263
+ {
264
+ "clip_ratio": 0.0,
265
+ "completion_length": 94.21875,
266
+ "epoch": 0.00023651844843897824,
267
+ "grad_norm": 0.7481162548065186,
268
+ "kl": 0.1044921875,
269
+ "learning_rate": 9.998817407757807e-06,
270
+ "loss": 0.0042,
271
+ "reward": 1.5863367319107056,
272
+ "reward_std": 0.4242909550666809,
273
+ "rewards/format_reward_rec": 0.9375,
274
+ "rewards/iou_reward": 0.6488366723060608,
275
+ "step": 19
276
+ },
277
+ {
278
+ "clip_ratio": 0.0,
279
+ "completion_length": 104.75,
280
+ "epoch": 0.0002489667878305034,
281
+ "grad_norm": 0.643527626991272,
282
+ "kl": 0.07177734375,
283
+ "learning_rate": 9.998755166060848e-06,
284
+ "loss": 0.0029,
285
+ "reward": 1.3996891975402832,
286
+ "reward_std": 0.37390778958797455,
287
+ "rewards/format_reward_rec": 0.9375,
288
+ "rewards/iou_reward": 0.4621891975402832,
289
+ "step": 20
290
+ },
291
+ {
292
+ "clip_ratio": 0.0,
293
+ "completion_length": 107.625,
294
+ "epoch": 0.00026141512722202857,
295
+ "grad_norm": 0.7482960224151611,
296
+ "kl": 0.098388671875,
297
+ "learning_rate": 9.99869292436389e-06,
298
+ "loss": 0.0039,
299
+ "reward": 1.5653824210166931,
300
+ "reward_std": 0.34851640462875366,
301
+ "rewards/format_reward_rec": 0.9375,
302
+ "rewards/iou_reward": 0.6278825104236603,
303
+ "step": 21
304
+ },
305
+ {
306
+ "clip_ratio": 0.0,
307
+ "completion_length": 102.1875,
308
+ "epoch": 0.00027386346661355373,
309
+ "grad_norm": 0.6192396283149719,
310
+ "kl": 0.074462890625,
311
+ "learning_rate": 9.998630682666934e-06,
312
+ "loss": 0.003,
313
+ "reward": 1.8187173008918762,
314
+ "reward_std": 0.20407778769731522,
315
+ "rewards/format_reward_rec": 1.0,
316
+ "rewards/iou_reward": 0.8187173306941986,
317
+ "step": 22
318
+ },
319
+ {
320
+ "clip_ratio": 0.0,
321
+ "completion_length": 119.6875,
322
+ "epoch": 0.0002863118060050789,
323
+ "grad_norm": 0.6803566217422485,
324
+ "kl": 0.0623779296875,
325
+ "learning_rate": 9.998568440969976e-06,
326
+ "loss": 0.0025,
327
+ "reward": 1.60598224401474,
328
+ "reward_std": 0.34990501403808594,
329
+ "rewards/format_reward_rec": 0.96875,
330
+ "rewards/iou_reward": 0.63723224401474,
331
+ "step": 23
332
+ },
333
+ {
334
+ "clip_ratio": 0.0,
335
+ "completion_length": 95.78125,
336
+ "epoch": 0.0002987601453966041,
337
+ "grad_norm": 0.6281342506408691,
338
+ "kl": 0.092529296875,
339
+ "learning_rate": 9.998506199273018e-06,
340
+ "loss": 0.0037,
341
+ "reward": 1.672945261001587,
342
+ "reward_std": 0.4158570021390915,
343
+ "rewards/format_reward_rec": 0.9375,
344
+ "rewards/iou_reward": 0.7354452311992645,
345
+ "step": 24
346
+ },
347
+ {
348
+ "clip_ratio": 0.0,
349
+ "completion_length": 80.875,
350
+ "epoch": 0.0003112084847881293,
351
+ "grad_norm": 0.7759396433830261,
352
+ "kl": 0.1015625,
353
+ "learning_rate": 9.998443957576061e-06,
354
+ "loss": 0.0041,
355
+ "reward": 1.627230942249298,
356
+ "reward_std": 0.34825482964515686,
357
+ "rewards/format_reward_rec": 0.96875,
358
+ "rewards/iou_reward": 0.6584809124469757,
359
+ "step": 25
360
+ },
361
+ {
362
+ "clip_ratio": 0.0,
363
+ "completion_length": 97.6875,
364
+ "epoch": 0.00032365682417965445,
365
+ "grad_norm": 0.6152196526527405,
366
+ "kl": 0.103515625,
367
+ "learning_rate": 9.998381715879103e-06,
368
+ "loss": 0.0041,
369
+ "reward": 1.6838378310203552,
370
+ "reward_std": 0.11833954602479935,
371
+ "rewards/format_reward_rec": 1.0,
372
+ "rewards/iou_reward": 0.6838378608226776,
373
+ "step": 26
374
+ },
375
+ {
376
+ "clip_ratio": 0.0,
377
+ "completion_length": 80.78125,
378
+ "epoch": 0.0003361051635711796,
379
+ "grad_norm": 0.5609109997749329,
380
+ "kl": 0.1142578125,
381
+ "learning_rate": 9.998319474182145e-06,
382
+ "loss": 0.0046,
383
+ "reward": 1.8090254664421082,
384
+ "reward_std": 0.062307149171829224,
385
+ "rewards/format_reward_rec": 1.0,
386
+ "rewards/iou_reward": 0.8090254366397858,
387
+ "step": 27
388
+ },
389
+ {
390
+ "clip_ratio": 0.0,
391
+ "completion_length": 76.71875,
392
+ "epoch": 0.0003485535029627048,
393
+ "grad_norm": 0.6533063054084778,
394
+ "kl": 0.13818359375,
395
+ "learning_rate": 9.998257232485187e-06,
396
+ "loss": 0.0055,
397
+ "reward": 1.6686185598373413,
398
+ "reward_std": 0.1674296036362648,
399
+ "rewards/format_reward_rec": 1.0,
400
+ "rewards/iou_reward": 0.6686184704303741,
401
+ "step": 28
402
+ },
403
+ {
404
+ "clip_ratio": 0.0,
405
+ "completion_length": 85.15625,
406
+ "epoch": 0.00036100184235422994,
407
+ "grad_norm": 0.5266671180725098,
408
+ "kl": 0.09375,
409
+ "learning_rate": 9.998194990788229e-06,
410
+ "loss": 0.0037,
411
+ "reward": 1.7991711497306824,
412
+ "reward_std": 0.2697625942528248,
413
+ "rewards/format_reward_rec": 0.96875,
414
+ "rewards/iou_reward": 0.8304212093353271,
415
+ "step": 29
416
+ },
417
+ {
418
+ "clip_ratio": 0.0,
419
+ "completion_length": 74.15625,
420
+ "epoch": 0.0003734501817457551,
421
+ "grad_norm": 0.680327296257019,
422
+ "kl": 0.12109375,
423
+ "learning_rate": 9.998132749091272e-06,
424
+ "loss": 0.0048,
425
+ "reward": 1.719760000705719,
426
+ "reward_std": 0.18976478278636932,
427
+ "rewards/format_reward_rec": 0.96875,
428
+ "rewards/iou_reward": 0.7510099709033966,
429
+ "step": 30
430
+ },
431
+ {
432
+ "clip_ratio": 0.0,
433
+ "completion_length": 89.96875,
434
+ "epoch": 0.00038589852113728027,
435
+ "grad_norm": 0.5414130687713623,
436
+ "kl": 0.090087890625,
437
+ "learning_rate": 9.998070507394314e-06,
438
+ "loss": 0.0036,
439
+ "reward": 1.6798821687698364,
440
+ "reward_std": 0.21795213222503662,
441
+ "rewards/format_reward_rec": 0.96875,
442
+ "rewards/iou_reward": 0.7111322581768036,
443
+ "step": 31
444
+ },
445
+ {
446
+ "clip_ratio": 0.0,
447
+ "completion_length": 84.9375,
448
+ "epoch": 0.00039834686052880544,
449
+ "grad_norm": 0.46933692693710327,
450
+ "kl": 0.1025390625,
451
+ "learning_rate": 9.998008265697356e-06,
452
+ "loss": 0.0041,
453
+ "reward": 1.7964898943901062,
454
+ "reward_std": 0.22818857431411743,
455
+ "rewards/format_reward_rec": 0.96875,
456
+ "rewards/iou_reward": 0.827739804983139,
457
+ "step": 32
458
+ },
459
+ {
460
+ "clip_ratio": 0.0,
461
+ "completion_length": 79.21875,
462
+ "epoch": 0.0004107951999203306,
463
+ "grad_norm": 0.6017041802406311,
464
+ "kl": 0.13720703125,
465
+ "learning_rate": 9.9979460240004e-06,
466
+ "loss": 0.0055,
467
+ "reward": 1.6301453113555908,
468
+ "reward_std": 0.27136652171611786,
469
+ "rewards/format_reward_rec": 1.0,
470
+ "rewards/iou_reward": 0.6301453560590744,
471
+ "step": 33
472
+ },
473
+ {
474
+ "clip_ratio": 0.0,
475
+ "completion_length": 82.5,
476
+ "epoch": 0.0004232435393118558,
477
+ "grad_norm": 0.5675345659255981,
478
+ "kl": 0.125732421875,
479
+ "learning_rate": 9.997883782303441e-06,
480
+ "loss": 0.005,
481
+ "reward": 1.5771546363830566,
482
+ "reward_std": 0.1715424843132496,
483
+ "rewards/format_reward_rec": 1.0,
484
+ "rewards/iou_reward": 0.5771546363830566,
485
+ "step": 34
486
+ },
487
+ {
488
+ "clip_ratio": 0.0,
489
+ "completion_length": 95.15625,
490
+ "epoch": 0.000435691878703381,
491
+ "grad_norm": 0.6039296388626099,
492
+ "kl": 0.11669921875,
493
+ "learning_rate": 9.997821540606483e-06,
494
+ "loss": 0.0047,
495
+ "reward": 1.747186541557312,
496
+ "reward_std": 0.30833982676267624,
497
+ "rewards/format_reward_rec": 0.96875,
498
+ "rewards/iou_reward": 0.7784366011619568,
499
+ "step": 35
500
+ },
501
+ {
502
+ "clip_ratio": 0.0,
503
+ "completion_length": 79.28125,
504
+ "epoch": 0.00044814021809490615,
505
+ "grad_norm": 1.9511338472366333,
506
+ "kl": 0.32470703125,
507
+ "learning_rate": 9.997759298909527e-06,
508
+ "loss": 0.0129,
509
+ "reward": 1.6023600101470947,
510
+ "reward_std": 0.3212697505950928,
511
+ "rewards/format_reward_rec": 0.9375,
512
+ "rewards/iou_reward": 0.6648600399494171,
513
+ "step": 36
514
+ },
515
+ {
516
+ "clip_ratio": 0.0,
517
+ "completion_length": 91.96875,
518
+ "epoch": 0.0004605885574864313,
519
+ "grad_norm": 0.6094495058059692,
520
+ "kl": 0.1591796875,
521
+ "learning_rate": 9.997697057212569e-06,
522
+ "loss": 0.0064,
523
+ "reward": 1.5261580348014832,
524
+ "reward_std": 0.3047035411000252,
525
+ "rewards/format_reward_rec": 0.96875,
526
+ "rewards/iou_reward": 0.557407945394516,
527
+ "step": 37
528
+ },
529
+ {
530
+ "clip_ratio": 0.0,
531
+ "completion_length": 84.28125,
532
+ "epoch": 0.0004730368968779565,
533
+ "grad_norm": 0.5133370161056519,
534
+ "kl": 0.10693359375,
535
+ "learning_rate": 9.99763481551561e-06,
536
+ "loss": 0.0043,
537
+ "reward": 1.5371721386909485,
538
+ "reward_std": 0.2661096602678299,
539
+ "rewards/format_reward_rec": 0.9375,
540
+ "rewards/iou_reward": 0.5996721386909485,
541
+ "step": 38
542
+ },
543
+ {
544
+ "clip_ratio": 0.0,
545
+ "completion_length": 95.9375,
546
+ "epoch": 0.00048548523626948165,
547
+ "grad_norm": 0.47525426745414734,
548
+ "kl": 0.12255859375,
549
+ "learning_rate": 9.997572573818654e-06,
550
+ "loss": 0.0049,
551
+ "reward": 1.6892600059509277,
552
+ "reward_std": 0.1822982355952263,
553
+ "rewards/format_reward_rec": 1.0,
554
+ "rewards/iou_reward": 0.6892601251602173,
555
+ "step": 39
556
+ },
557
+ {
558
+ "clip_ratio": 0.0,
559
+ "completion_length": 90.0,
560
+ "epoch": 0.0004979335756610068,
561
+ "grad_norm": 0.5481269955635071,
562
+ "kl": 0.0859375,
563
+ "learning_rate": 9.997510332121696e-06,
564
+ "loss": 0.0034,
565
+ "reward": 1.669859528541565,
566
+ "reward_std": 0.16043629869818687,
567
+ "rewards/format_reward_rec": 1.0,
568
+ "rewards/iou_reward": 0.6698595285415649,
569
+ "step": 40
570
+ },
571
+ {
572
+ "clip_ratio": 0.0,
573
+ "completion_length": 87.34375,
574
+ "epoch": 0.000510381915052532,
575
+ "grad_norm": 0.574486494064331,
576
+ "kl": 0.110107421875,
577
+ "learning_rate": 9.997448090424738e-06,
578
+ "loss": 0.0044,
579
+ "reward": 1.6145691275596619,
580
+ "reward_std": 0.16043317317962646,
581
+ "rewards/format_reward_rec": 0.96875,
582
+ "rewards/iou_reward": 0.6458190977573395,
583
+ "step": 41
584
+ },
585
+ {
586
+ "clip_ratio": 0.0,
587
+ "completion_length": 105.21875,
588
+ "epoch": 0.0005228302544440571,
589
+ "grad_norm": 0.5472383499145508,
590
+ "kl": 0.091552734375,
591
+ "learning_rate": 9.997385848727781e-06,
592
+ "loss": 0.0037,
593
+ "reward": 1.6457700729370117,
594
+ "reward_std": 0.3335290104150772,
595
+ "rewards/format_reward_rec": 0.9375,
596
+ "rewards/iou_reward": 0.7082701325416565,
597
+ "step": 42
598
+ },
599
+ {
600
+ "clip_ratio": 0.0,
601
+ "completion_length": 83.125,
602
+ "epoch": 0.0005352785938355823,
603
+ "grad_norm": 0.5541640520095825,
604
+ "kl": 0.14306640625,
605
+ "learning_rate": 9.997323607030823e-06,
606
+ "loss": 0.0057,
607
+ "reward": 1.6855499744415283,
608
+ "reward_std": 0.2959246411919594,
609
+ "rewards/format_reward_rec": 0.96875,
610
+ "rewards/iou_reward": 0.7167999744415283,
611
+ "step": 43
612
+ },
613
+ {
614
+ "clip_ratio": 0.0,
615
+ "completion_length": 81.5625,
616
+ "epoch": 0.0005477269332271075,
617
+ "grad_norm": 0.5674740672111511,
618
+ "kl": 0.13134765625,
619
+ "learning_rate": 9.997261365333865e-06,
620
+ "loss": 0.0053,
621
+ "reward": 1.514305830001831,
622
+ "reward_std": 0.22089600563049316,
623
+ "rewards/format_reward_rec": 1.0,
624
+ "rewards/iou_reward": 0.5143058151006699,
625
+ "step": 44
626
+ },
627
+ {
628
+ "clip_ratio": 0.0,
629
+ "completion_length": 91.53125,
630
+ "epoch": 0.0005601752726186326,
631
+ "grad_norm": 0.5152494311332703,
632
+ "kl": 0.089599609375,
633
+ "learning_rate": 9.997199123636909e-06,
634
+ "loss": 0.0036,
635
+ "reward": 1.794326663017273,
636
+ "reward_std": 0.043453872203826904,
637
+ "rewards/format_reward_rec": 1.0,
638
+ "rewards/iou_reward": 0.7943267226219177,
639
+ "step": 45
640
+ },
641
+ {
642
+ "clip_ratio": 0.0,
643
+ "completion_length": 87.96875,
644
+ "epoch": 0.0005726236120101578,
645
+ "grad_norm": 0.5079981088638306,
646
+ "kl": 0.125,
647
+ "learning_rate": 9.99713688193995e-06,
648
+ "loss": 0.005,
649
+ "reward": 1.7662574648857117,
650
+ "reward_std": 0.17677413672208786,
651
+ "rewards/format_reward_rec": 1.0,
652
+ "rewards/iou_reward": 0.7662573754787445,
653
+ "step": 46
654
+ },
655
+ {
656
+ "clip_ratio": 0.0,
657
+ "completion_length": 84.0,
658
+ "epoch": 0.000585071951401683,
659
+ "grad_norm": 0.5033786296844482,
660
+ "kl": 0.1220703125,
661
+ "learning_rate": 9.997074640242992e-06,
662
+ "loss": 0.0049,
663
+ "reward": 1.4638007879257202,
664
+ "reward_std": 0.2003663182258606,
665
+ "rewards/format_reward_rec": 1.0,
666
+ "rewards/iou_reward": 0.4638008028268814,
667
+ "step": 47
668
+ },
669
+ {
670
+ "clip_ratio": 0.0,
671
+ "completion_length": 80.875,
672
+ "epoch": 0.0005975202907932082,
673
+ "grad_norm": 0.5780296325683594,
674
+ "kl": 0.12060546875,
675
+ "learning_rate": 9.997012398546034e-06,
676
+ "loss": 0.0048,
677
+ "reward": 1.7631506323814392,
678
+ "reward_std": 0.2995036095380783,
679
+ "rewards/format_reward_rec": 0.96875,
680
+ "rewards/iou_reward": 0.7944006323814392,
681
+ "step": 48
682
+ },
683
+ {
684
+ "clip_ratio": 0.0,
685
+ "completion_length": 89.125,
686
+ "epoch": 0.0006099686301847334,
687
+ "grad_norm": 0.523960530757904,
688
+ "kl": 0.13037109375,
689
+ "learning_rate": 9.996950156849076e-06,
690
+ "loss": 0.0052,
691
+ "reward": 1.5915122628211975,
692
+ "reward_std": 0.292996384203434,
693
+ "rewards/format_reward_rec": 1.0,
694
+ "rewards/iou_reward": 0.5915123075246811,
695
+ "step": 49
696
+ },
697
+ {
698
+ "clip_ratio": 0.0,
699
+ "completion_length": 84.875,
700
+ "epoch": 0.0006224169695762586,
701
+ "grad_norm": 0.5157907605171204,
702
+ "kl": 0.114013671875,
703
+ "learning_rate": 9.99688791515212e-06,
704
+ "loss": 0.0046,
705
+ "reward": 1.8732710480690002,
706
+ "reward_std": 0.039826540276408195,
707
+ "rewards/format_reward_rec": 1.0,
708
+ "rewards/iou_reward": 0.8732710778713226,
709
+ "step": 50
710
+ },
711
+ {
712
+ "clip_ratio": 0.0,
713
+ "completion_length": 88.9375,
714
+ "epoch": 0.0006348653089677837,
715
+ "grad_norm": 0.49286431074142456,
716
+ "kl": 0.13818359375,
717
+ "learning_rate": 9.996825673455162e-06,
718
+ "loss": 0.0055,
719
+ "reward": 1.7973515391349792,
720
+ "reward_std": 0.11208843067288399,
721
+ "rewards/format_reward_rec": 1.0,
722
+ "rewards/iou_reward": 0.7973516285419464,
723
+ "step": 51
724
+ },
725
+ {
726
+ "clip_ratio": 0.0,
727
+ "completion_length": 79.53125,
728
+ "epoch": 0.0006473136483593089,
729
+ "grad_norm": 0.5944603085517883,
730
+ "kl": 0.13671875,
731
+ "learning_rate": 9.996763431758203e-06,
732
+ "loss": 0.0054,
733
+ "reward": 1.6538516283035278,
734
+ "reward_std": 0.3369443193078041,
735
+ "rewards/format_reward_rec": 0.9375,
736
+ "rewards/iou_reward": 0.7163516879081726,
737
+ "step": 52
738
+ },
739
+ {
740
+ "clip_ratio": 0.0,
741
+ "completion_length": 86.0,
742
+ "epoch": 0.0006597619877508341,
743
+ "grad_norm": 0.5229175090789795,
744
+ "kl": 0.113525390625,
745
+ "learning_rate": 9.996701190061247e-06,
746
+ "loss": 0.0045,
747
+ "reward": 1.688271403312683,
748
+ "reward_std": 0.16166172549128532,
749
+ "rewards/format_reward_rec": 1.0,
750
+ "rewards/iou_reward": 0.6882712692022324,
751
+ "step": 53
752
+ },
753
+ {
754
+ "clip_ratio": 0.0,
755
+ "completion_length": 76.84375,
756
+ "epoch": 0.0006722103271423592,
757
+ "grad_norm": 0.5642473697662354,
758
+ "kl": 0.13916015625,
759
+ "learning_rate": 9.996638948364289e-06,
760
+ "loss": 0.0056,
761
+ "reward": 1.757770597934723,
762
+ "reward_std": 0.1410532221198082,
763
+ "rewards/format_reward_rec": 1.0,
764
+ "rewards/iou_reward": 0.7577706277370453,
765
+ "step": 54
766
+ },
767
+ {
768
+ "clip_ratio": 0.0,
769
+ "completion_length": 79.96875,
770
+ "epoch": 0.0006846586665338844,
771
+ "grad_norm": 0.6692479848861694,
772
+ "kl": 0.1494140625,
773
+ "learning_rate": 9.99657670666733e-06,
774
+ "loss": 0.006,
775
+ "reward": 1.5787354111671448,
776
+ "reward_std": 0.18042169511318207,
777
+ "rewards/format_reward_rec": 1.0,
778
+ "rewards/iou_reward": 0.5787354409694672,
779
+ "step": 55
780
+ },
781
+ {
782
+ "clip_ratio": 0.0,
783
+ "completion_length": 88.4375,
784
+ "epoch": 0.0006971070059254096,
785
+ "grad_norm": 0.6885099411010742,
786
+ "kl": 0.11669921875,
787
+ "learning_rate": 9.996514464970374e-06,
788
+ "loss": 0.0047,
789
+ "reward": 1.361689031124115,
790
+ "reward_std": 0.36215692572295666,
791
+ "rewards/format_reward_rec": 0.9375,
792
+ "rewards/iou_reward": 0.4241890572011471,
793
+ "step": 56
794
+ },
795
+ {
796
+ "clip_ratio": 0.0,
797
+ "completion_length": 89.75,
798
+ "epoch": 0.0007095553453169347,
799
+ "grad_norm": 0.474343478679657,
800
+ "kl": 0.1025390625,
801
+ "learning_rate": 9.996452223273416e-06,
802
+ "loss": 0.0041,
803
+ "reward": 1.723258912563324,
804
+ "reward_std": 0.2072291597723961,
805
+ "rewards/format_reward_rec": 1.0,
806
+ "rewards/iou_reward": 0.723258912563324,
807
+ "step": 57
808
+ },
809
+ {
810
+ "clip_ratio": 0.0,
811
+ "completion_length": 81.59375,
812
+ "epoch": 0.0007220036847084599,
813
+ "grad_norm": 0.5668300986289978,
814
+ "kl": 0.12841796875,
815
+ "learning_rate": 9.996389981576458e-06,
816
+ "loss": 0.0052,
817
+ "reward": 1.6465070843696594,
818
+ "reward_std": 0.09229656681418419,
819
+ "rewards/format_reward_rec": 1.0,
820
+ "rewards/iou_reward": 0.6465071439743042,
821
+ "step": 58
822
+ },
823
+ {
824
+ "clip_ratio": 0.0,
825
+ "completion_length": 86.59375,
826
+ "epoch": 0.000734452024099985,
827
+ "grad_norm": 0.8369623422622681,
828
+ "kl": 0.140625,
829
+ "learning_rate": 9.996327739879502e-06,
830
+ "loss": 0.0056,
831
+ "reward": 1.6136068105697632,
832
+ "reward_std": 0.3137252777814865,
833
+ "rewards/format_reward_rec": 0.9375,
834
+ "rewards/iou_reward": 0.6761067807674408,
835
+ "step": 59
836
+ },
837
+ {
838
+ "clip_ratio": 0.0,
839
+ "completion_length": 78.0625,
840
+ "epoch": 0.0007469003634915102,
841
+ "grad_norm": 0.5556425452232361,
842
+ "kl": 0.15625,
843
+ "learning_rate": 9.996265498182544e-06,
844
+ "loss": 0.0063,
845
+ "reward": 1.7592704892158508,
846
+ "reward_std": 0.14777341671288013,
847
+ "rewards/format_reward_rec": 1.0,
848
+ "rewards/iou_reward": 0.759270429611206,
849
+ "step": 60
850
+ },
851
+ {
852
+ "clip_ratio": 0.0,
853
+ "completion_length": 85.71875,
854
+ "epoch": 0.0007593487028830354,
855
+ "grad_norm": 0.5661209225654602,
856
+ "kl": 0.12939453125,
857
+ "learning_rate": 9.996203256485585e-06,
858
+ "loss": 0.0052,
859
+ "reward": 1.732073187828064,
860
+ "reward_std": 0.2148379534482956,
861
+ "rewards/format_reward_rec": 1.0,
862
+ "rewards/iou_reward": 0.7320732176303864,
863
+ "step": 61
864
+ },
865
+ {
866
+ "clip_ratio": 0.0,
867
+ "completion_length": 78.71875,
868
+ "epoch": 0.0007717970422745605,
869
+ "grad_norm": 0.6321675777435303,
870
+ "kl": 0.1748046875,
871
+ "learning_rate": 9.996141014788629e-06,
872
+ "loss": 0.007,
873
+ "reward": 1.6298596262931824,
874
+ "reward_std": 0.21304509788751602,
875
+ "rewards/format_reward_rec": 0.96875,
876
+ "rewards/iou_reward": 0.6611096560955048,
877
+ "step": 62
878
+ },
879
+ {
880
+ "clip_ratio": 0.0,
881
+ "completion_length": 93.71875,
882
+ "epoch": 0.0007842453816660857,
883
+ "grad_norm": 0.5161328911781311,
884
+ "kl": 0.1357421875,
885
+ "learning_rate": 9.996078773091671e-06,
886
+ "loss": 0.0054,
887
+ "reward": 1.7697656154632568,
888
+ "reward_std": 0.15015805140137672,
889
+ "rewards/format_reward_rec": 1.0,
890
+ "rewards/iou_reward": 0.769765704870224,
891
+ "step": 63
892
+ },
893
+ {
894
+ "clip_ratio": 0.0,
895
+ "completion_length": 78.28125,
896
+ "epoch": 0.0007966937210576109,
897
+ "grad_norm": 0.5633348822593689,
898
+ "kl": 0.1611328125,
899
+ "learning_rate": 9.996016531394713e-06,
900
+ "loss": 0.0064,
901
+ "reward": 1.4602358937263489,
902
+ "reward_std": 0.115456972271204,
903
+ "rewards/format_reward_rec": 1.0,
904
+ "rewards/iou_reward": 0.4602358639240265,
905
+ "step": 64
906
+ },
907
+ {
908
+ "clip_ratio": 0.0,
909
+ "completion_length": 87.46875,
910
+ "epoch": 0.000809142060449136,
911
+ "grad_norm": 0.6145336031913757,
912
+ "kl": 0.12646484375,
913
+ "learning_rate": 9.995954289697755e-06,
914
+ "loss": 0.0051,
915
+ "reward": 1.716072678565979,
916
+ "reward_std": 0.27911266684532166,
917
+ "rewards/format_reward_rec": 0.96875,
918
+ "rewards/iou_reward": 0.747322678565979,
919
+ "step": 65
920
+ },
921
+ {
922
+ "clip_ratio": 0.0,
923
+ "completion_length": 89.6875,
924
+ "epoch": 0.0008215903998406612,
925
+ "grad_norm": 0.5064393281936646,
926
+ "kl": 0.115966796875,
927
+ "learning_rate": 9.995892048000796e-06,
928
+ "loss": 0.0046,
929
+ "reward": 1.5941734313964844,
930
+ "reward_std": 0.1153794713318348,
931
+ "rewards/format_reward_rec": 1.0,
932
+ "rewards/iou_reward": 0.5941733419895172,
933
+ "step": 66
934
+ },
935
+ {
936
+ "clip_ratio": 0.0,
937
+ "completion_length": 80.34375,
938
+ "epoch": 0.0008340387392321865,
939
+ "grad_norm": 0.502927303314209,
940
+ "kl": 0.140625,
941
+ "learning_rate": 9.99582980630384e-06,
942
+ "loss": 0.0056,
943
+ "reward": 1.737270712852478,
944
+ "reward_std": 0.09697960317134857,
945
+ "rewards/format_reward_rec": 1.0,
946
+ "rewards/iou_reward": 0.7372707426548004,
947
+ "step": 67
948
+ },
949
+ {
950
+ "clip_ratio": 0.0,
951
+ "completion_length": 79.75,
952
+ "epoch": 0.0008464870786237116,
953
+ "grad_norm": 0.5995090007781982,
954
+ "kl": 0.1298828125,
955
+ "learning_rate": 9.995767564606882e-06,
956
+ "loss": 0.0052,
957
+ "reward": 1.6859315037727356,
958
+ "reward_std": 0.10032923426479101,
959
+ "rewards/format_reward_rec": 1.0,
960
+ "rewards/iou_reward": 0.6859315186738968,
961
+ "step": 68
962
+ },
963
+ {
964
+ "clip_ratio": 0.0,
965
+ "completion_length": 91.15625,
966
+ "epoch": 0.0008589354180152368,
967
+ "grad_norm": 0.4770706295967102,
968
+ "kl": 0.10986328125,
969
+ "learning_rate": 9.995705322909924e-06,
970
+ "loss": 0.0044,
971
+ "reward": 1.5923967361450195,
972
+ "reward_std": 0.17281262017786503,
973
+ "rewards/format_reward_rec": 1.0,
974
+ "rewards/iou_reward": 0.5923967510461807,
975
+ "step": 69
976
+ },
977
+ {
978
+ "clip_ratio": 0.0,
979
+ "completion_length": 84.3125,
980
+ "epoch": 0.000871383757406762,
981
+ "grad_norm": 0.5044773817062378,
982
+ "kl": 0.1328125,
983
+ "learning_rate": 9.995643081212967e-06,
984
+ "loss": 0.0053,
985
+ "reward": 1.5754359364509583,
986
+ "reward_std": 0.17663145437836647,
987
+ "rewards/format_reward_rec": 1.0,
988
+ "rewards/iou_reward": 0.5754359662532806,
989
+ "step": 70
990
+ },
991
+ {
992
+ "clip_ratio": 0.0,
993
+ "completion_length": 69.8125,
994
+ "epoch": 0.0008838320967982871,
995
+ "grad_norm": 0.5421180725097656,
996
+ "kl": 0.14794921875,
997
+ "learning_rate": 9.99558083951601e-06,
998
+ "loss": 0.0059,
999
+ "reward": 1.800459086894989,
1000
+ "reward_std": 0.07165651768445969,
1001
+ "rewards/format_reward_rec": 1.0,
1002
+ "rewards/iou_reward": 0.8004591166973114,
1003
+ "step": 71
1004
+ },
1005
+ {
1006
+ "clip_ratio": 0.0,
1007
+ "completion_length": 84.65625,
1008
+ "epoch": 0.0008962804361898123,
1009
+ "grad_norm": 0.5785159468650818,
1010
+ "kl": 0.1337890625,
1011
+ "learning_rate": 9.995518597819051e-06,
1012
+ "loss": 0.0054,
1013
+ "reward": 1.6627878546714783,
1014
+ "reward_std": 0.26127464324235916,
1015
+ "rewards/format_reward_rec": 1.0,
1016
+ "rewards/iou_reward": 0.6627878248691559,
1017
+ "step": 72
1018
+ },
1019
+ {
1020
+ "clip_ratio": 0.0,
1021
+ "completion_length": 73.625,
1022
+ "epoch": 0.0009087287755813375,
1023
+ "grad_norm": 0.5469045042991638,
1024
+ "kl": 0.11474609375,
1025
+ "learning_rate": 9.995456356122095e-06,
1026
+ "loss": 0.0046,
1027
+ "reward": 1.7520496249198914,
1028
+ "reward_std": 0.15470531303435564,
1029
+ "rewards/format_reward_rec": 1.0,
1030
+ "rewards/iou_reward": 0.7520496249198914,
1031
+ "step": 73
1032
+ },
1033
+ {
1034
+ "clip_ratio": 0.0,
1035
+ "completion_length": 72.15625,
1036
+ "epoch": 0.0009211771149728626,
1037
+ "grad_norm": 0.5782791972160339,
1038
+ "kl": 0.15185546875,
1039
+ "learning_rate": 9.995394114425137e-06,
1040
+ "loss": 0.0061,
1041
+ "reward": 1.6950392723083496,
1042
+ "reward_std": 0.20423278957605362,
1043
+ "rewards/format_reward_rec": 0.96875,
1044
+ "rewards/iou_reward": 0.726289302110672,
1045
+ "step": 74
1046
+ },
1047
+ {
1048
+ "clip_ratio": 0.0,
1049
+ "completion_length": 70.3125,
1050
+ "epoch": 0.0009336254543643878,
1051
+ "grad_norm": 0.5352646112442017,
1052
+ "kl": 0.1396484375,
1053
+ "learning_rate": 9.995331872728178e-06,
1054
+ "loss": 0.0056,
1055
+ "reward": 1.8100340962409973,
1056
+ "reward_std": 0.07194224745035172,
1057
+ "rewards/format_reward_rec": 1.0,
1058
+ "rewards/iou_reward": 0.8100341260433197,
1059
+ "step": 75
1060
+ },
1061
+ {
1062
+ "clip_ratio": 0.0,
1063
+ "completion_length": 73.625,
1064
+ "epoch": 0.000946073793755913,
1065
+ "grad_norm": 0.5901323556900024,
1066
+ "kl": 0.1396484375,
1067
+ "learning_rate": 9.995269631031222e-06,
1068
+ "loss": 0.0056,
1069
+ "reward": 1.9025427103042603,
1070
+ "reward_std": 0.09781728684902191,
1071
+ "rewards/format_reward_rec": 1.0,
1072
+ "rewards/iou_reward": 0.902542769908905,
1073
+ "step": 76
1074
+ },
1075
+ {
1076
+ "clip_ratio": 0.0,
1077
+ "completion_length": 76.0625,
1078
+ "epoch": 0.0009585221331474381,
1079
+ "grad_norm": 0.5916361212730408,
1080
+ "kl": 0.14111328125,
1081
+ "learning_rate": 9.995207389334264e-06,
1082
+ "loss": 0.0056,
1083
+ "reward": 1.637428343296051,
1084
+ "reward_std": 0.2861202023923397,
1085
+ "rewards/format_reward_rec": 1.0,
1086
+ "rewards/iou_reward": 0.6374283134937286,
1087
+ "step": 77
1088
+ },
1089
+ {
1090
+ "clip_ratio": 0.0,
1091
+ "completion_length": 78.0,
1092
+ "epoch": 0.0009709704725389633,
1093
+ "grad_norm": 0.534542441368103,
1094
+ "kl": 0.12890625,
1095
+ "learning_rate": 9.995145147637306e-06,
1096
+ "loss": 0.0052,
1097
+ "reward": 1.5602301359176636,
1098
+ "reward_std": 0.26418060809373856,
1099
+ "rewards/format_reward_rec": 1.0,
1100
+ "rewards/iou_reward": 0.560230165719986,
1101
+ "step": 78
1102
+ },
1103
+ {
1104
+ "clip_ratio": 0.0,
1105
+ "completion_length": 80.78125,
1106
+ "epoch": 0.0009834188119304886,
1107
+ "grad_norm": 0.5377347469329834,
1108
+ "kl": 0.1337890625,
1109
+ "learning_rate": 9.99508290594035e-06,
1110
+ "loss": 0.0054,
1111
+ "reward": 1.7412559986114502,
1112
+ "reward_std": 0.19341769814491272,
1113
+ "rewards/format_reward_rec": 1.0,
1114
+ "rewards/iou_reward": 0.7412559390068054,
1115
+ "step": 79
1116
+ },
1117
+ {
1118
+ "clip_ratio": 0.0,
1119
+ "completion_length": 83.09375,
1120
+ "epoch": 0.0009958671513220136,
1121
+ "grad_norm": 0.5326669812202454,
1122
+ "kl": 0.1416015625,
1123
+ "learning_rate": 9.995020664243391e-06,
1124
+ "loss": 0.0057,
1125
+ "reward": 1.6354157328605652,
1126
+ "reward_std": 0.09996325895190239,
1127
+ "rewards/format_reward_rec": 1.0,
1128
+ "rewards/iou_reward": 0.6354157030582428,
1129
+ "step": 80
1130
+ },
1131
+ {
1132
+ "clip_ratio": 0.0,
1133
+ "completion_length": 74.375,
1134
+ "epoch": 0.001008315490713539,
1135
+ "grad_norm": 0.5222757458686829,
1136
+ "kl": 0.14599609375,
1137
+ "learning_rate": 9.994958422546433e-06,
1138
+ "loss": 0.0058,
1139
+ "reward": 1.8184998631477356,
1140
+ "reward_std": 0.21548935770988464,
1141
+ "rewards/format_reward_rec": 1.0,
1142
+ "rewards/iou_reward": 0.8184998333454132,
1143
+ "step": 81
1144
+ },
1145
+ {
1146
+ "clip_ratio": 0.0,
1147
+ "completion_length": 76.90625,
1148
+ "epoch": 0.001020763830105064,
1149
+ "grad_norm": 0.5388038158416748,
1150
+ "kl": 0.158203125,
1151
+ "learning_rate": 9.994896180849477e-06,
1152
+ "loss": 0.0063,
1153
+ "reward": 1.760369062423706,
1154
+ "reward_std": 0.20370811223983765,
1155
+ "rewards/format_reward_rec": 1.0,
1156
+ "rewards/iou_reward": 0.7603690922260284,
1157
+ "step": 82
1158
+ },
1159
+ {
1160
+ "clip_ratio": 0.0,
1161
+ "completion_length": 77.96875,
1162
+ "epoch": 0.0010332121694965892,
1163
+ "grad_norm": 0.6683233380317688,
1164
+ "kl": 0.14599609375,
1165
+ "learning_rate": 9.994833939152517e-06,
1166
+ "loss": 0.0058,
1167
+ "reward": 1.6503952145576477,
1168
+ "reward_std": 0.28510017693042755,
1169
+ "rewards/format_reward_rec": 0.96875,
1170
+ "rewards/iou_reward": 0.6816451847553253,
1171
+ "step": 83
1172
+ },
1173
+ {
1174
+ "clip_ratio": 0.0,
1175
+ "completion_length": 76.5,
1176
+ "epoch": 0.0010456605088881143,
1177
+ "grad_norm": 0.8178681135177612,
1178
+ "kl": 0.17626953125,
1179
+ "learning_rate": 9.99477169745556e-06,
1180
+ "loss": 0.007,
1181
+ "reward": 1.8033937811851501,
1182
+ "reward_std": 0.11837778985500336,
1183
+ "rewards/format_reward_rec": 1.0,
1184
+ "rewards/iou_reward": 0.8033937513828278,
1185
+ "step": 84
1186
+ },
1187
+ {
1188
+ "clip_ratio": 0.0,
1189
+ "completion_length": 75.4375,
1190
+ "epoch": 0.0010581088482796396,
1191
+ "grad_norm": 0.6322414875030518,
1192
+ "kl": 0.1943359375,
1193
+ "learning_rate": 9.994709455758602e-06,
1194
+ "loss": 0.0078,
1195
+ "reward": 1.789516270160675,
1196
+ "reward_std": 0.1459340425208211,
1197
+ "rewards/format_reward_rec": 0.96875,
1198
+ "rewards/iou_reward": 0.8207662999629974,
1199
+ "step": 85
1200
+ },
1201
+ {
1202
+ "clip_ratio": 0.0,
1203
+ "completion_length": 73.625,
1204
+ "epoch": 0.0010705571876711646,
1205
+ "grad_norm": 0.7803683876991272,
1206
+ "kl": 0.1806640625,
1207
+ "learning_rate": 9.994647214061644e-06,
1208
+ "loss": 0.0072,
1209
+ "reward": 1.580109417438507,
1210
+ "reward_std": 0.17563226073980331,
1211
+ "rewards/format_reward_rec": 1.0,
1212
+ "rewards/iou_reward": 0.5801093876361847,
1213
+ "step": 86
1214
+ },
1215
+ {
1216
+ "clip_ratio": 0.0,
1217
+ "completion_length": 68.375,
1218
+ "epoch": 0.0010830055270626899,
1219
+ "grad_norm": 0.497260719537735,
1220
+ "kl": 0.14697265625,
1221
+ "learning_rate": 9.994584972364688e-06,
1222
+ "loss": 0.0059,
1223
+ "reward": 1.6402252912521362,
1224
+ "reward_std": 0.11211172491312027,
1225
+ "rewards/format_reward_rec": 1.0,
1226
+ "rewards/iou_reward": 0.6402253359556198,
1227
+ "step": 87
1228
+ },
1229
+ {
1230
+ "clip_ratio": 0.0,
1231
+ "completion_length": 76.375,
1232
+ "epoch": 0.001095453866454215,
1233
+ "grad_norm": 0.5990146398544312,
1234
+ "kl": 0.17041015625,
1235
+ "learning_rate": 9.99452273066773e-06,
1236
+ "loss": 0.0068,
1237
+ "reward": 1.7737177610397339,
1238
+ "reward_std": 0.11195245012640953,
1239
+ "rewards/format_reward_rec": 1.0,
1240
+ "rewards/iou_reward": 0.7737177312374115,
1241
+ "step": 88
1242
+ },
1243
+ {
1244
+ "clip_ratio": 0.0,
1245
+ "completion_length": 90.3125,
1246
+ "epoch": 0.0011079022058457402,
1247
+ "grad_norm": 0.579544186592102,
1248
+ "kl": 0.14794921875,
1249
+ "learning_rate": 9.994460488970771e-06,
1250
+ "loss": 0.0059,
1251
+ "reward": 1.4342612028121948,
1252
+ "reward_std": 0.22582250833511353,
1253
+ "rewards/format_reward_rec": 0.96875,
1254
+ "rewards/iou_reward": 0.465511217713356,
1255
+ "step": 89
1256
+ },
1257
+ {
1258
+ "clip_ratio": 0.0,
1259
+ "completion_length": 71.5625,
1260
+ "epoch": 0.0011203505452372653,
1261
+ "grad_norm": 0.626300573348999,
1262
+ "kl": 0.15185546875,
1263
+ "learning_rate": 9.994398247273815e-06,
1264
+ "loss": 0.0061,
1265
+ "reward": 1.830142080783844,
1266
+ "reward_std": 0.09023091197013855,
1267
+ "rewards/format_reward_rec": 1.0,
1268
+ "rewards/iou_reward": 0.830142080783844,
1269
+ "step": 90
1270
+ },
1271
+ {
1272
+ "clip_ratio": 0.0,
1273
+ "completion_length": 70.4375,
1274
+ "epoch": 0.0011327988846287905,
1275
+ "grad_norm": 0.530746340751648,
1276
+ "kl": 0.162109375,
1277
+ "learning_rate": 9.994336005576857e-06,
1278
+ "loss": 0.0065,
1279
+ "reward": 1.7541658282279968,
1280
+ "reward_std": 0.22513454407453537,
1281
+ "rewards/format_reward_rec": 1.0,
1282
+ "rewards/iou_reward": 0.754165768623352,
1283
+ "step": 91
1284
+ },
1285
+ {
1286
+ "clip_ratio": 0.0,
1287
+ "completion_length": 80.75,
1288
+ "epoch": 0.0011452472240203156,
1289
+ "grad_norm": 0.729465126991272,
1290
+ "kl": 0.1552734375,
1291
+ "learning_rate": 9.994273763879899e-06,
1292
+ "loss": 0.0062,
1293
+ "reward": 1.4567736983299255,
1294
+ "reward_std": 0.3039446175098419,
1295
+ "rewards/format_reward_rec": 0.96875,
1296
+ "rewards/iou_reward": 0.48802371323108673,
1297
+ "step": 92
1298
+ },
1299
+ {
1300
+ "clip_ratio": 0.0,
1301
+ "completion_length": 77.03125,
1302
+ "epoch": 0.0011576955634118409,
1303
+ "grad_norm": 0.747829020023346,
1304
+ "kl": 0.19482421875,
1305
+ "learning_rate": 9.994211522182942e-06,
1306
+ "loss": 0.0078,
1307
+ "reward": 1.6674769520759583,
1308
+ "reward_std": 0.36814084649086,
1309
+ "rewards/format_reward_rec": 0.9375,
1310
+ "rewards/iou_reward": 0.7299769520759583,
1311
+ "step": 93
1312
+ },
1313
+ {
1314
+ "clip_ratio": 0.0,
1315
+ "completion_length": 73.1875,
1316
+ "epoch": 0.001170143902803366,
1317
+ "grad_norm": 0.7084731459617615,
1318
+ "kl": 0.14501953125,
1319
+ "learning_rate": 9.994149280485984e-06,
1320
+ "loss": 0.0058,
1321
+ "reward": 1.8480735421180725,
1322
+ "reward_std": 0.19576817378401756,
1323
+ "rewards/format_reward_rec": 0.96875,
1324
+ "rewards/iou_reward": 0.8793235719203949,
1325
+ "step": 94
1326
+ },
1327
+ {
1328
+ "clip_ratio": 0.0,
1329
+ "completion_length": 80.0,
1330
+ "epoch": 0.0011825922421948912,
1331
+ "grad_norm": 0.6167282462120056,
1332
+ "kl": 0.14404296875,
1333
+ "learning_rate": 9.994087038789026e-06,
1334
+ "loss": 0.0058,
1335
+ "reward": 1.7454318404197693,
1336
+ "reward_std": 0.0950808608904481,
1337
+ "rewards/format_reward_rec": 1.0,
1338
+ "rewards/iou_reward": 0.7454318404197693,
1339
+ "step": 95
1340
+ },
1341
+ {
1342
+ "clip_ratio": 0.0,
1343
+ "completion_length": 85.375,
1344
+ "epoch": 0.0011950405815864165,
1345
+ "grad_norm": 0.6630386114120483,
1346
+ "kl": 0.1318359375,
1347
+ "learning_rate": 9.99402479709207e-06,
1348
+ "loss": 0.0053,
1349
+ "reward": 1.800285816192627,
1350
+ "reward_std": 0.22147530317306519,
1351
+ "rewards/format_reward_rec": 1.0,
1352
+ "rewards/iou_reward": 0.8002857863903046,
1353
+ "step": 96
1354
+ },
1355
+ {
1356
+ "clip_ratio": 0.0,
1357
+ "completion_length": 76.5,
1358
+ "epoch": 0.0012074889209779415,
1359
+ "grad_norm": 0.6420366168022156,
1360
+ "kl": 0.166015625,
1361
+ "learning_rate": 9.993962555395111e-06,
1362
+ "loss": 0.0066,
1363
+ "reward": 1.8380752205848694,
1364
+ "reward_std": 0.15962522476911545,
1365
+ "rewards/format_reward_rec": 1.0,
1366
+ "rewards/iou_reward": 0.8380752801895142,
1367
+ "step": 97
1368
+ },
1369
+ {
1370
+ "clip_ratio": 0.0,
1371
+ "completion_length": 81.28125,
1372
+ "epoch": 0.0012199372603694668,
1373
+ "grad_norm": 0.5716676115989685,
1374
+ "kl": 0.140625,
1375
+ "learning_rate": 9.993900313698153e-06,
1376
+ "loss": 0.0056,
1377
+ "reward": 1.7347316145896912,
1378
+ "reward_std": 0.14502982422709465,
1379
+ "rewards/format_reward_rec": 1.0,
1380
+ "rewards/iou_reward": 0.7347316443920135,
1381
+ "step": 98
1382
+ },
1383
+ {
1384
+ "clip_ratio": 0.0,
1385
+ "completion_length": 77.40625,
1386
+ "epoch": 0.0012323855997609919,
1387
+ "grad_norm": 0.6021421551704407,
1388
+ "kl": 0.13916015625,
1389
+ "learning_rate": 9.993838072001197e-06,
1390
+ "loss": 0.0056,
1391
+ "reward": 1.6757659316062927,
1392
+ "reward_std": 0.21641412377357483,
1393
+ "rewards/format_reward_rec": 1.0,
1394
+ "rewards/iou_reward": 0.6757659018039703,
1395
+ "step": 99
1396
+ },
1397
+ {
1398
+ "clip_ratio": 0.0,
1399
+ "completion_length": 73.84375,
1400
+ "epoch": 0.0012448339391525171,
1401
+ "grad_norm": 0.5074762105941772,
1402
+ "kl": 0.1884765625,
1403
+ "learning_rate": 9.993775830304239e-06,
1404
+ "loss": 0.0075,
1405
+ "reward": 1.7514841556549072,
1406
+ "reward_std": 0.17918928153812885,
1407
+ "rewards/format_reward_rec": 0.96875,
1408
+ "rewards/iou_reward": 0.7827341556549072,
1409
+ "step": 100
1410
+ },
1411
+ {
1412
+ "clip_ratio": 0.0,
1413
+ "completion_length": 73.125,
1414
+ "epoch": 0.0012572822785440422,
1415
+ "grad_norm": 0.5177865624427795,
1416
+ "kl": 0.128662109375,
1417
+ "learning_rate": 9.99371358860728e-06,
1418
+ "loss": 0.0051,
1419
+ "reward": 1.786689281463623,
1420
+ "reward_std": 0.14574366062879562,
1421
+ "rewards/format_reward_rec": 1.0,
1422
+ "rewards/iou_reward": 0.7866893112659454,
1423
+ "step": 101
1424
+ },
1425
+ {
1426
+ "clip_ratio": 0.0,
1427
+ "completion_length": 75.65625,
1428
+ "epoch": 0.0012697306179355675,
1429
+ "grad_norm": 0.5310884714126587,
1430
+ "kl": 0.1455078125,
1431
+ "learning_rate": 9.993651346910322e-06,
1432
+ "loss": 0.0058,
1433
+ "reward": 1.7609416246414185,
1434
+ "reward_std": 0.16141251474618912,
1435
+ "rewards/format_reward_rec": 1.0,
1436
+ "rewards/iou_reward": 0.7609416246414185,
1437
+ "step": 102
1438
+ },
1439
+ {
1440
+ "clip_ratio": 0.0,
1441
+ "completion_length": 74.28125,
1442
+ "epoch": 0.0012821789573270925,
1443
+ "grad_norm": 0.6390679478645325,
1444
+ "kl": 0.15283203125,
1445
+ "learning_rate": 9.993589105213364e-06,
1446
+ "loss": 0.0061,
1447
+ "reward": 1.894408941268921,
1448
+ "reward_std": 0.09809810575097799,
1449
+ "rewards/format_reward_rec": 1.0,
1450
+ "rewards/iou_reward": 0.8944090008735657,
1451
+ "step": 103
1452
+ },
1453
+ {
1454
+ "clip_ratio": 0.0,
1455
+ "completion_length": 78.46875,
1456
+ "epoch": 0.0012946272967186178,
1457
+ "grad_norm": 0.6070464849472046,
1458
+ "kl": 0.173828125,
1459
+ "learning_rate": 9.993526863516408e-06,
1460
+ "loss": 0.007,
1461
+ "reward": 1.7859740853309631,
1462
+ "reward_std": 0.11464018002152443,
1463
+ "rewards/format_reward_rec": 1.0,
1464
+ "rewards/iou_reward": 0.7859741151332855,
1465
+ "step": 104
1466
+ },
1467
+ {
1468
+ "clip_ratio": 0.0,
1469
+ "completion_length": 87.875,
1470
+ "epoch": 0.0013070756361101428,
1471
+ "grad_norm": 0.5264771580696106,
1472
+ "kl": 0.116943359375,
1473
+ "learning_rate": 9.99346462181945e-06,
1474
+ "loss": 0.0047,
1475
+ "reward": 1.7116284370422363,
1476
+ "reward_std": 0.19027460366487503,
1477
+ "rewards/format_reward_rec": 1.0,
1478
+ "rewards/iou_reward": 0.7116283774375916,
1479
+ "step": 105
1480
+ },
1481
+ {
1482
+ "clip_ratio": 0.0,
1483
+ "completion_length": 75.46875,
1484
+ "epoch": 0.0013195239755016681,
1485
+ "grad_norm": 0.5975978970527649,
1486
+ "kl": 0.15576171875,
1487
+ "learning_rate": 9.993402380122492e-06,
1488
+ "loss": 0.0062,
1489
+ "reward": 1.693192720413208,
1490
+ "reward_std": 0.23123470693826675,
1491
+ "rewards/format_reward_rec": 1.0,
1492
+ "rewards/iou_reward": 0.6931926906108856,
1493
+ "step": 106
1494
+ },
1495
+ {
1496
+ "clip_ratio": 0.0,
1497
+ "completion_length": 83.96875,
1498
+ "epoch": 0.0013319723148931932,
1499
+ "grad_norm": 0.49909281730651855,
1500
+ "kl": 0.150390625,
1501
+ "learning_rate": 9.993340138425535e-06,
1502
+ "loss": 0.006,
1503
+ "reward": 1.4939433932304382,
1504
+ "reward_std": 0.23037827014923096,
1505
+ "rewards/format_reward_rec": 1.0,
1506
+ "rewards/iou_reward": 0.49394336342811584,
1507
+ "step": 107
1508
+ },
1509
+ {
1510
+ "clip_ratio": 0.0,
1511
+ "completion_length": 77.71875,
1512
+ "epoch": 0.0013444206542847185,
1513
+ "grad_norm": 0.5423364639282227,
1514
+ "kl": 0.15478515625,
1515
+ "learning_rate": 9.993277896728577e-06,
1516
+ "loss": 0.0062,
1517
+ "reward": 1.6659827828407288,
1518
+ "reward_std": 0.1452256366610527,
1519
+ "rewards/format_reward_rec": 1.0,
1520
+ "rewards/iou_reward": 0.6659828424453735,
1521
+ "step": 108
1522
+ },
1523
+ {
1524
+ "clip_ratio": 0.0,
1525
+ "completion_length": 80.03125,
1526
+ "epoch": 0.0013568689936762435,
1527
+ "grad_norm": 0.4949978291988373,
1528
+ "kl": 0.12939453125,
1529
+ "learning_rate": 9.993215655031619e-06,
1530
+ "loss": 0.0052,
1531
+ "reward": 1.6230355501174927,
1532
+ "reward_std": 0.18467745929956436,
1533
+ "rewards/format_reward_rec": 1.0,
1534
+ "rewards/iou_reward": 0.6230354905128479,
1535
+ "step": 109
1536
+ },
1537
+ {
1538
+ "clip_ratio": 0.0,
1539
+ "completion_length": 77.25,
1540
+ "epoch": 0.0013693173330677688,
1541
+ "grad_norm": 0.5784613490104675,
1542
+ "kl": 0.11962890625,
1543
+ "learning_rate": 9.993153413334662e-06,
1544
+ "loss": 0.0048,
1545
+ "reward": 1.7517021894454956,
1546
+ "reward_std": 0.15935295075178146,
1547
+ "rewards/format_reward_rec": 1.0,
1548
+ "rewards/iou_reward": 0.7517021894454956,
1549
+ "step": 110
1550
+ },
1551
+ {
1552
+ "clip_ratio": 0.0,
1553
+ "completion_length": 85.4375,
1554
+ "epoch": 0.0013817656724592938,
1555
+ "grad_norm": 0.4976905286312103,
1556
+ "kl": 0.154296875,
1557
+ "learning_rate": 9.993091171637704e-06,
1558
+ "loss": 0.0062,
1559
+ "reward": 1.6756169199943542,
1560
+ "reward_std": 0.16474511474370956,
1561
+ "rewards/format_reward_rec": 0.96875,
1562
+ "rewards/iou_reward": 0.7068668603897095,
1563
+ "step": 111
1564
+ },
1565
+ {
1566
+ "clip_ratio": 0.0,
1567
+ "completion_length": 86.40625,
1568
+ "epoch": 0.0013942140118508191,
1569
+ "grad_norm": 0.4809585511684418,
1570
+ "kl": 0.139404296875,
1571
+ "learning_rate": 9.993028929940746e-06,
1572
+ "loss": 0.0056,
1573
+ "reward": 1.6836587190628052,
1574
+ "reward_std": 0.16580590046942234,
1575
+ "rewards/format_reward_rec": 1.0,
1576
+ "rewards/iou_reward": 0.6836587190628052,
1577
+ "step": 112
1578
+ },
1579
+ {
1580
+ "clip_ratio": 0.0,
1581
+ "completion_length": 84.84375,
1582
+ "epoch": 0.0014066623512423442,
1583
+ "grad_norm": 0.5611812472343445,
1584
+ "kl": 0.12060546875,
1585
+ "learning_rate": 9.99296668824379e-06,
1586
+ "loss": 0.0048,
1587
+ "reward": 1.707568645477295,
1588
+ "reward_std": 0.4278259873390198,
1589
+ "rewards/format_reward_rec": 0.96875,
1590
+ "rewards/iou_reward": 0.7388187050819397,
1591
+ "step": 113
1592
+ },
1593
+ {
1594
+ "clip_ratio": 0.0,
1595
+ "completion_length": 81.625,
1596
+ "epoch": 0.0014191106906338694,
1597
+ "grad_norm": 0.5960983633995056,
1598
+ "kl": 0.16552734375,
1599
+ "learning_rate": 9.992904446546832e-06,
1600
+ "loss": 0.0066,
1601
+ "reward": 1.8632564544677734,
1602
+ "reward_std": 0.08219528943300247,
1603
+ "rewards/format_reward_rec": 1.0,
1604
+ "rewards/iou_reward": 0.8632563948631287,
1605
+ "step": 114
1606
+ },
1607
+ {
1608
+ "clip_ratio": 0.0,
1609
+ "completion_length": 77.46875,
1610
+ "epoch": 0.0014315590300253947,
1611
+ "grad_norm": 0.6228426098823547,
1612
+ "kl": 0.17138671875,
1613
+ "learning_rate": 9.992842204849874e-06,
1614
+ "loss": 0.0069,
1615
+ "reward": 1.8939569592475891,
1616
+ "reward_std": 0.05686322785913944,
1617
+ "rewards/format_reward_rec": 1.0,
1618
+ "rewards/iou_reward": 0.8939569890499115,
1619
+ "step": 115
1620
+ },
1621
+ {
1622
+ "clip_ratio": 0.0,
1623
+ "completion_length": 79.4375,
1624
+ "epoch": 0.0014440073694169198,
1625
+ "grad_norm": 0.5188966989517212,
1626
+ "kl": 0.15625,
1627
+ "learning_rate": 9.992779963152917e-06,
1628
+ "loss": 0.0063,
1629
+ "reward": 1.8855347633361816,
1630
+ "reward_std": 0.08866238407790661,
1631
+ "rewards/format_reward_rec": 1.0,
1632
+ "rewards/iou_reward": 0.8855347335338593,
1633
+ "step": 116
1634
+ },
1635
+ {
1636
+ "clip_ratio": 0.0,
1637
+ "completion_length": 76.25,
1638
+ "epoch": 0.001456455708808445,
1639
+ "grad_norm": 0.5806636214256287,
1640
+ "kl": 0.16650390625,
1641
+ "learning_rate": 9.992717721455959e-06,
1642
+ "loss": 0.0067,
1643
+ "reward": 1.6434407830238342,
1644
+ "reward_std": 0.1430330127477646,
1645
+ "rewards/format_reward_rec": 1.0,
1646
+ "rewards/iou_reward": 0.6434408128261566,
1647
+ "step": 117
1648
+ },
1649
+ {
1650
+ "clip_ratio": 0.0,
1651
+ "completion_length": 80.625,
1652
+ "epoch": 0.00146890404819997,
1653
+ "grad_norm": 0.5318806767463684,
1654
+ "kl": 0.15380859375,
1655
+ "learning_rate": 9.992655479759e-06,
1656
+ "loss": 0.0061,
1657
+ "reward": 1.5443456172943115,
1658
+ "reward_std": 0.2518259510397911,
1659
+ "rewards/format_reward_rec": 1.0,
1660
+ "rewards/iou_reward": 0.5443456172943115,
1661
+ "step": 118
1662
+ },
1663
+ {
1664
+ "clip_ratio": 0.0,
1665
+ "completion_length": 86.0,
1666
+ "epoch": 0.0014813523875914954,
1667
+ "grad_norm": 0.4761650264263153,
1668
+ "kl": 0.1474609375,
1669
+ "learning_rate": 9.992593238062044e-06,
1670
+ "loss": 0.0059,
1671
+ "reward": 1.7887638807296753,
1672
+ "reward_std": 0.11934101954102516,
1673
+ "rewards/format_reward_rec": 1.0,
1674
+ "rewards/iou_reward": 0.7887638509273529,
1675
+ "step": 119
1676
+ },
1677
+ {
1678
+ "clip_ratio": 0.0,
1679
+ "completion_length": 87.875,
1680
+ "epoch": 0.0014938007269830204,
1681
+ "grad_norm": 0.5443623661994934,
1682
+ "kl": 0.15478515625,
1683
+ "learning_rate": 9.992530996365085e-06,
1684
+ "loss": 0.0062,
1685
+ "reward": 1.6878383159637451,
1686
+ "reward_std": 0.29080618917942047,
1687
+ "rewards/format_reward_rec": 0.96875,
1688
+ "rewards/iou_reward": 0.7190883457660675,
1689
+ "step": 120
1690
+ },
1691
+ {
1692
+ "clip_ratio": 0.0,
1693
+ "completion_length": 75.0625,
1694
+ "epoch": 0.0015062490663745457,
1695
+ "grad_norm": 0.5305638313293457,
1696
+ "kl": 0.1748046875,
1697
+ "learning_rate": 9.992468754668128e-06,
1698
+ "loss": 0.007,
1699
+ "reward": 1.6585879921913147,
1700
+ "reward_std": 0.08734836708754301,
1701
+ "rewards/format_reward_rec": 1.0,
1702
+ "rewards/iou_reward": 0.6585879623889923,
1703
+ "step": 121
1704
+ },
1705
+ {
1706
+ "clip_ratio": 0.0,
1707
+ "completion_length": 81.84375,
1708
+ "epoch": 0.0015186974057660708,
1709
+ "grad_norm": 0.5158679485321045,
1710
+ "kl": 0.1240234375,
1711
+ "learning_rate": 9.99240651297117e-06,
1712
+ "loss": 0.0049,
1713
+ "reward": 1.8771040439605713,
1714
+ "reward_std": 0.07788556441664696,
1715
+ "rewards/format_reward_rec": 1.0,
1716
+ "rewards/iou_reward": 0.8771041035652161,
1717
+ "step": 122
1718
+ },
1719
+ {
1720
+ "clip_ratio": 0.0,
1721
+ "completion_length": 89.03125,
1722
+ "epoch": 0.001531145745157596,
1723
+ "grad_norm": 0.5314344167709351,
1724
+ "kl": 0.11328125,
1725
+ "learning_rate": 9.992344271274212e-06,
1726
+ "loss": 0.0045,
1727
+ "reward": 1.717368245124817,
1728
+ "reward_std": 0.12311140447854996,
1729
+ "rewards/format_reward_rec": 1.0,
1730
+ "rewards/iou_reward": 0.7173681855201721,
1731
+ "step": 123
1732
+ },
1733
+ {
1734
+ "clip_ratio": 0.0,
1735
+ "completion_length": 83.53125,
1736
+ "epoch": 0.001543594084549121,
1737
+ "grad_norm": 0.4630969762802124,
1738
+ "kl": 0.1318359375,
1739
+ "learning_rate": 9.992282029577255e-06,
1740
+ "loss": 0.0053,
1741
+ "reward": 1.5514342188835144,
1742
+ "reward_std": 0.2621281296014786,
1743
+ "rewards/format_reward_rec": 0.96875,
1744
+ "rewards/iou_reward": 0.5826842486858368,
1745
+ "step": 124
1746
+ },
1747
+ {
1748
+ "clip_ratio": 0.0,
1749
+ "completion_length": 82.25,
1750
+ "epoch": 0.0015560424239406464,
1751
+ "grad_norm": 0.5010308623313904,
1752
+ "kl": 0.11767578125,
1753
+ "learning_rate": 9.992219787880297e-06,
1754
+ "loss": 0.0047,
1755
+ "reward": 1.5915331840515137,
1756
+ "reward_std": 0.1749160811305046,
1757
+ "rewards/format_reward_rec": 1.0,
1758
+ "rewards/iou_reward": 0.5915331840515137,
1759
+ "step": 125
1760
+ },
1761
+ {
1762
+ "clip_ratio": 0.0,
1763
+ "completion_length": 73.125,
1764
+ "epoch": 0.0015684907633321714,
1765
+ "grad_norm": 0.4550737738609314,
1766
+ "kl": 0.139892578125,
1767
+ "learning_rate": 9.99215754618334e-06,
1768
+ "loss": 0.0056,
1769
+ "reward": 1.7983530759811401,
1770
+ "reward_std": 0.15214556828141212,
1771
+ "rewards/format_reward_rec": 1.0,
1772
+ "rewards/iou_reward": 0.7983531057834625,
1773
+ "step": 126
1774
+ },
1775
+ {
1776
+ "clip_ratio": 0.0,
1777
+ "completion_length": 80.53125,
1778
+ "epoch": 0.0015809391027236967,
1779
+ "grad_norm": 0.47265562415122986,
1780
+ "kl": 0.166015625,
1781
+ "learning_rate": 9.992095304486383e-06,
1782
+ "loss": 0.0066,
1783
+ "reward": 1.623586893081665,
1784
+ "reward_std": 0.0915629081428051,
1785
+ "rewards/format_reward_rec": 1.0,
1786
+ "rewards/iou_reward": 0.6235868632793427,
1787
+ "step": 127
1788
+ },
1789
+ {
1790
+ "clip_ratio": 0.0,
1791
+ "completion_length": 71.96875,
1792
+ "epoch": 0.0015933874421152217,
1793
+ "grad_norm": 0.5316770076751709,
1794
+ "kl": 0.12255859375,
1795
+ "learning_rate": 9.992033062789425e-06,
1796
+ "loss": 0.0049,
1797
+ "reward": 1.9262670278549194,
1798
+ "reward_std": 0.019816839136183262,
1799
+ "rewards/format_reward_rec": 1.0,
1800
+ "rewards/iou_reward": 0.926266998052597,
1801
+ "step": 128
1802
+ },
1803
+ {
1804
+ "clip_ratio": 0.0,
1805
+ "completion_length": 78.90625,
1806
+ "epoch": 0.001605835781506747,
1807
+ "grad_norm": 0.4828510582447052,
1808
+ "kl": 0.131103515625,
1809
+ "learning_rate": 9.991970821092466e-06,
1810
+ "loss": 0.0052,
1811
+ "reward": 1.5956175923347473,
1812
+ "reward_std": 0.11756857857108116,
1813
+ "rewards/format_reward_rec": 1.0,
1814
+ "rewards/iou_reward": 0.5956175774335861,
1815
+ "step": 129
1816
+ },
1817
+ {
1818
+ "clip_ratio": 0.0,
1819
+ "completion_length": 92.4375,
1820
+ "epoch": 0.001618284120898272,
1821
+ "grad_norm": 0.46744272112846375,
1822
+ "kl": 0.110107421875,
1823
+ "learning_rate": 9.99190857939551e-06,
1824
+ "loss": 0.0044,
1825
+ "reward": 1.9298859238624573,
1826
+ "reward_std": 0.036631692200899124,
1827
+ "rewards/format_reward_rec": 1.0,
1828
+ "rewards/iou_reward": 0.9298859238624573,
1829
+ "step": 130
1830
+ },
1831
+ {
1832
+ "clip_ratio": 0.0,
1833
+ "completion_length": 76.8125,
1834
+ "epoch": 0.0016307324602897974,
1835
+ "grad_norm": 0.5099858641624451,
1836
+ "kl": 0.150390625,
1837
+ "learning_rate": 9.991846337698552e-06,
1838
+ "loss": 0.006,
1839
+ "reward": 1.8364751935005188,
1840
+ "reward_std": 0.0698028914630413,
1841
+ "rewards/format_reward_rec": 1.0,
1842
+ "rewards/iou_reward": 0.8364751935005188,
1843
+ "step": 131
1844
+ },
1845
+ {
1846
+ "clip_ratio": 0.0,
1847
+ "completion_length": 83.0,
1848
+ "epoch": 0.0016431807996813224,
1849
+ "grad_norm": 0.5306047797203064,
1850
+ "kl": 0.151123046875,
1851
+ "learning_rate": 9.991784096001594e-06,
1852
+ "loss": 0.006,
1853
+ "reward": 1.7881332039833069,
1854
+ "reward_std": 0.25462284684181213,
1855
+ "rewards/format_reward_rec": 1.0,
1856
+ "rewards/iou_reward": 0.7881332635879517,
1857
+ "step": 132
1858
+ },
1859
+ {
1860
+ "clip_ratio": 0.0,
1861
+ "completion_length": 71.03125,
1862
+ "epoch": 0.0016556291390728477,
1863
+ "grad_norm": 0.5165503621101379,
1864
+ "kl": 0.1376953125,
1865
+ "learning_rate": 9.991721854304637e-06,
1866
+ "loss": 0.0055,
1867
+ "reward": 1.8669943809509277,
1868
+ "reward_std": 0.10201262310147285,
1869
+ "rewards/format_reward_rec": 1.0,
1870
+ "rewards/iou_reward": 0.8669944405555725,
1871
+ "step": 133
1872
+ },
1873
+ {
1874
+ "clip_ratio": 0.0,
1875
+ "completion_length": 80.46875,
1876
+ "epoch": 0.001668077478464373,
1877
+ "grad_norm": 0.5238383412361145,
1878
+ "kl": 0.125,
1879
+ "learning_rate": 9.99165961260768e-06,
1880
+ "loss": 0.005,
1881
+ "reward": 1.8666821718215942,
1882
+ "reward_std": 0.16252516955137253,
1883
+ "rewards/format_reward_rec": 0.96875,
1884
+ "rewards/iou_reward": 0.897932231426239,
1885
+ "step": 134
1886
+ },
1887
+ {
1888
+ "clip_ratio": 0.0,
1889
+ "completion_length": 86.96875,
1890
+ "epoch": 0.001680525817855898,
1891
+ "grad_norm": 3.0103187561035156,
1892
+ "kl": 0.369140625,
1893
+ "learning_rate": 9.991597370910721e-06,
1894
+ "loss": 0.0148,
1895
+ "reward": 1.6562628746032715,
1896
+ "reward_std": 0.061989203095436096,
1897
+ "rewards/format_reward_rec": 0.96875,
1898
+ "rewards/iou_reward": 0.6875128149986267,
1899
+ "step": 135
1900
+ },
1901
+ {
1902
+ "clip_ratio": 0.0,
1903
+ "completion_length": 77.96875,
1904
+ "epoch": 0.0016929741572474233,
1905
+ "grad_norm": 0.5688140988349915,
1906
+ "kl": 0.11865234375,
1907
+ "learning_rate": 9.991535129213765e-06,
1908
+ "loss": 0.0047,
1909
+ "reward": 1.6850191354751587,
1910
+ "reward_std": 0.1395672708749771,
1911
+ "rewards/format_reward_rec": 1.0,
1912
+ "rewards/iou_reward": 0.6850191354751587,
1913
+ "step": 136
1914
+ },
1915
+ {
1916
+ "clip_ratio": 0.0,
1917
+ "completion_length": 76.40625,
1918
+ "epoch": 0.0017054224966389483,
1919
+ "grad_norm": 0.5144866704940796,
1920
+ "kl": 0.1123046875,
1921
+ "learning_rate": 9.991472887516807e-06,
1922
+ "loss": 0.0045,
1923
+ "reward": 1.8065414428710938,
1924
+ "reward_std": 0.11821011267602444,
1925
+ "rewards/format_reward_rec": 1.0,
1926
+ "rewards/iou_reward": 0.8065415024757385,
1927
+ "step": 137
1928
+ },
1929
+ {
1930
+ "clip_ratio": 0.0,
1931
+ "completion_length": 78.59375,
1932
+ "epoch": 0.0017178708360304736,
1933
+ "grad_norm": 0.5437494516372681,
1934
+ "kl": 0.12744140625,
1935
+ "learning_rate": 9.991410645819848e-06,
1936
+ "loss": 0.0051,
1937
+ "reward": 1.751151978969574,
1938
+ "reward_std": 0.278888332657516,
1939
+ "rewards/format_reward_rec": 0.9375,
1940
+ "rewards/iou_reward": 0.813651978969574,
1941
+ "step": 138
1942
+ },
1943
+ {
1944
+ "clip_ratio": 0.0,
1945
+ "completion_length": 77.8125,
1946
+ "epoch": 0.0017303191754219987,
1947
+ "grad_norm": 0.5118846297264099,
1948
+ "kl": 0.13037109375,
1949
+ "learning_rate": 9.99134840412289e-06,
1950
+ "loss": 0.0052,
1951
+ "reward": 1.6839856505393982,
1952
+ "reward_std": 0.10992167890071869,
1953
+ "rewards/format_reward_rec": 0.96875,
1954
+ "rewards/iou_reward": 0.715235635638237,
1955
+ "step": 139
1956
+ },
1957
+ {
1958
+ "clip_ratio": 0.0,
1959
+ "completion_length": 92.78125,
1960
+ "epoch": 0.001742767514813524,
1961
+ "grad_norm": 0.6501355171203613,
1962
+ "kl": 0.1220703125,
1963
+ "learning_rate": 9.991286162425932e-06,
1964
+ "loss": 0.0049,
1965
+ "reward": 1.5675995349884033,
1966
+ "reward_std": 0.43148788809776306,
1967
+ "rewards/format_reward_rec": 0.9375,
1968
+ "rewards/iou_reward": 0.6300995051860809,
1969
+ "step": 140
1970
+ },
1971
+ {
1972
+ "clip_ratio": 0.0,
1973
+ "completion_length": 84.28125,
1974
+ "epoch": 0.001755215854205049,
1975
+ "grad_norm": 0.5102176666259766,
1976
+ "kl": 0.12353515625,
1977
+ "learning_rate": 9.991223920728976e-06,
1978
+ "loss": 0.0049,
1979
+ "reward": 1.8060396313667297,
1980
+ "reward_std": 0.142087172716856,
1981
+ "rewards/format_reward_rec": 1.0,
1982
+ "rewards/iou_reward": 0.8060396909713745,
1983
+ "step": 141
1984
+ },
1985
+ {
1986
+ "clip_ratio": 0.0,
1987
+ "completion_length": 87.34375,
1988
+ "epoch": 0.0017676641935965743,
1989
+ "grad_norm": 0.561132550239563,
1990
+ "kl": 0.138916015625,
1991
+ "learning_rate": 9.991161679032018e-06,
1992
+ "loss": 0.0055,
1993
+ "reward": 1.6661274433135986,
1994
+ "reward_std": 0.3482854291796684,
1995
+ "rewards/format_reward_rec": 0.96875,
1996
+ "rewards/iou_reward": 0.6973774135112762,
1997
+ "step": 142
1998
+ },
1999
+ {
2000
+ "clip_ratio": 0.0,
2001
+ "completion_length": 82.90625,
2002
+ "epoch": 0.0017801125329880993,
2003
+ "grad_norm": 0.555660605430603,
2004
+ "kl": 0.12939453125,
2005
+ "learning_rate": 9.99109943733506e-06,
2006
+ "loss": 0.0052,
2007
+ "reward": 1.7078312635421753,
2008
+ "reward_std": 0.2983357608318329,
2009
+ "rewards/format_reward_rec": 0.96875,
2010
+ "rewards/iou_reward": 0.7390812933444977,
2011
+ "step": 143
2012
+ },
2013
+ {
2014
+ "clip_ratio": 0.0,
2015
+ "completion_length": 86.15625,
2016
+ "epoch": 0.0017925608723796246,
2017
+ "grad_norm": 0.5917304754257202,
2018
+ "kl": 0.115966796875,
2019
+ "learning_rate": 9.991037195638103e-06,
2020
+ "loss": 0.0046,
2021
+ "reward": 1.6995146870613098,
2022
+ "reward_std": 0.176687341183424,
2023
+ "rewards/format_reward_rec": 1.0,
2024
+ "rewards/iou_reward": 0.6995146870613098,
2025
+ "step": 144
2026
+ },
2027
+ {
2028
+ "clip_ratio": 0.0,
2029
+ "completion_length": 76.5625,
2030
+ "epoch": 0.0018050092117711497,
2031
+ "grad_norm": 0.49189749360084534,
2032
+ "kl": 0.122314453125,
2033
+ "learning_rate": 9.990974953941145e-06,
2034
+ "loss": 0.0049,
2035
+ "reward": 1.8386791944503784,
2036
+ "reward_std": 0.11668524146080017,
2037
+ "rewards/format_reward_rec": 1.0,
2038
+ "rewards/iou_reward": 0.8386792242527008,
2039
+ "step": 145
2040
+ },
2041
+ {
2042
+ "clip_ratio": 0.0,
2043
+ "completion_length": 72.03125,
2044
+ "epoch": 0.001817457551162675,
2045
+ "grad_norm": 0.5640706419944763,
2046
+ "kl": 0.101318359375,
2047
+ "learning_rate": 9.990912712244187e-06,
2048
+ "loss": 0.0041,
2049
+ "reward": 1.6799483895301819,
2050
+ "reward_std": 0.19784042239189148,
2051
+ "rewards/format_reward_rec": 0.96875,
2052
+ "rewards/iou_reward": 0.7111983299255371,
2053
+ "step": 146
2054
+ },
2055
+ {
2056
+ "clip_ratio": 0.0,
2057
+ "completion_length": 82.75,
2058
+ "epoch": 0.0018299058905542,
2059
+ "grad_norm": 0.6259125471115112,
2060
+ "kl": 0.131103515625,
2061
+ "learning_rate": 9.99085047054723e-06,
2062
+ "loss": 0.0052,
2063
+ "reward": 1.6741862893104553,
2064
+ "reward_std": 0.05047000013291836,
2065
+ "rewards/format_reward_rec": 1.0,
2066
+ "rewards/iou_reward": 0.6741862595081329,
2067
+ "step": 147
2068
+ },
2069
+ {
2070
+ "clip_ratio": 0.0,
2071
+ "completion_length": 81.375,
2072
+ "epoch": 0.0018423542299457253,
2073
+ "grad_norm": 0.6438843011856079,
2074
+ "kl": 0.16552734375,
2075
+ "learning_rate": 9.990788228850272e-06,
2076
+ "loss": 0.0066,
2077
+ "reward": 1.6975774765014648,
2078
+ "reward_std": 0.3462005481123924,
2079
+ "rewards/format_reward_rec": 0.96875,
2080
+ "rewards/iou_reward": 0.7288274765014648,
2081
+ "step": 148
2082
+ },
2083
+ {
2084
+ "clip_ratio": 0.0,
2085
+ "completion_length": 82.5,
2086
+ "epoch": 0.0018548025693372503,
2087
+ "grad_norm": 0.5777101516723633,
2088
+ "kl": 0.12451171875,
2089
+ "learning_rate": 9.990725987153314e-06,
2090
+ "loss": 0.005,
2091
+ "reward": 1.7958465814590454,
2092
+ "reward_std": 0.29284024983644485,
2093
+ "rewards/format_reward_rec": 0.9375,
2094
+ "rewards/iou_reward": 0.8583464920520782,
2095
+ "step": 149
2096
+ },
2097
+ {
2098
+ "clip_ratio": 0.0,
2099
+ "completion_length": 77.9375,
2100
+ "epoch": 0.0018672509087287756,
2101
+ "grad_norm": 0.5476696491241455,
2102
+ "kl": 0.128662109375,
2103
+ "learning_rate": 9.990663745456358e-06,
2104
+ "loss": 0.0051,
2105
+ "reward": 1.7851470112800598,
2106
+ "reward_std": 0.15798081643879414,
2107
+ "rewards/format_reward_rec": 0.96875,
2108
+ "rewards/iou_reward": 0.8163970410823822,
2109
+ "step": 150
2110
+ },
2111
+ {
2112
+ "clip_ratio": 0.0,
2113
+ "completion_length": 80.25,
2114
+ "epoch": 0.0018796992481203006,
2115
+ "grad_norm": 0.48750340938568115,
2116
+ "kl": 0.122802734375,
2117
+ "learning_rate": 9.9906015037594e-06,
2118
+ "loss": 0.0049,
2119
+ "reward": 1.724117934703827,
2120
+ "reward_std": 0.17644132114946842,
2121
+ "rewards/format_reward_rec": 1.0,
2122
+ "rewards/iou_reward": 0.7241179645061493,
2123
+ "step": 151
2124
+ },
2125
+ {
2126
+ "clip_ratio": 0.0,
2127
+ "completion_length": 87.5,
2128
+ "epoch": 0.001892147587511826,
2129
+ "grad_norm": 0.6843100786209106,
2130
+ "kl": 0.1123046875,
2131
+ "learning_rate": 9.990539262062441e-06,
2132
+ "loss": 0.0045,
2133
+ "reward": 1.801917552947998,
2134
+ "reward_std": 0.1792924776673317,
2135
+ "rewards/format_reward_rec": 0.96875,
2136
+ "rewards/iou_reward": 0.8331675827503204,
2137
+ "step": 152
2138
+ },
2139
+ {
2140
+ "clip_ratio": 0.0,
2141
+ "completion_length": 89.0625,
2142
+ "epoch": 0.0019045959269033512,
2143
+ "grad_norm": 0.5321158766746521,
2144
+ "kl": 0.11376953125,
2145
+ "learning_rate": 9.990477020365485e-06,
2146
+ "loss": 0.0045,
2147
+ "reward": 1.631728708744049,
2148
+ "reward_std": 0.2040170580148697,
2149
+ "rewards/format_reward_rec": 0.96875,
2150
+ "rewards/iou_reward": 0.6629787236452103,
2151
+ "step": 153
2152
+ },
2153
+ {
2154
+ "clip_ratio": 0.0,
2155
+ "completion_length": 90.53125,
2156
+ "epoch": 0.0019170442662948763,
2157
+ "grad_norm": 0.5559011697769165,
2158
+ "kl": 0.100341796875,
2159
+ "learning_rate": 9.990414778668527e-06,
2160
+ "loss": 0.004,
2161
+ "reward": 1.7384355068206787,
2162
+ "reward_std": 0.3657010346651077,
2163
+ "rewards/format_reward_rec": 0.9375,
2164
+ "rewards/iou_reward": 0.8009354472160339,
2165
+ "step": 154
2166
+ },
2167
+ {
2168
+ "clip_ratio": 0.0,
2169
+ "completion_length": 80.8125,
2170
+ "epoch": 0.0019294926056864015,
2171
+ "grad_norm": 0.5985516905784607,
2172
+ "kl": 0.109619140625,
2173
+ "learning_rate": 9.990352536971569e-06,
2174
+ "loss": 0.0044,
2175
+ "reward": 1.7658506035804749,
2176
+ "reward_std": 0.1951538361608982,
2177
+ "rewards/format_reward_rec": 0.96875,
2178
+ "rewards/iou_reward": 0.7971006333827972,
2179
+ "step": 155
2180
+ },
2181
+ {
2182
+ "clip_ratio": 0.0,
2183
+ "completion_length": 91.375,
2184
+ "epoch": 0.0019419409450779266,
2185
+ "grad_norm": 0.5207005739212036,
2186
+ "kl": 0.124267578125,
2187
+ "learning_rate": 9.990290295274612e-06,
2188
+ "loss": 0.005,
2189
+ "reward": 1.6580237746238708,
2190
+ "reward_std": 0.20409102737903595,
2191
+ "rewards/format_reward_rec": 0.96875,
2192
+ "rewards/iou_reward": 0.6892738342285156,
2193
+ "step": 156
2194
+ },
2195
+ {
2196
+ "clip_ratio": 0.0,
2197
+ "completion_length": 86.40625,
2198
+ "epoch": 0.001954389284469452,
2199
+ "grad_norm": 0.614124059677124,
2200
+ "kl": 0.11474609375,
2201
+ "learning_rate": 9.990228053577652e-06,
2202
+ "loss": 0.0046,
2203
+ "reward": 1.6734708547592163,
2204
+ "reward_std": 0.24323870986700058,
2205
+ "rewards/format_reward_rec": 1.0,
2206
+ "rewards/iou_reward": 0.6734707951545715,
2207
+ "step": 157
2208
+ },
2209
+ {
2210
+ "clip_ratio": 0.0,
2211
+ "completion_length": 99.1875,
2212
+ "epoch": 0.001966837623860977,
2213
+ "grad_norm": 0.5424925684928894,
2214
+ "kl": 0.098388671875,
2215
+ "learning_rate": 9.990165811880696e-06,
2216
+ "loss": 0.0039,
2217
+ "reward": 1.6296205520629883,
2218
+ "reward_std": 0.20492691174149513,
2219
+ "rewards/format_reward_rec": 0.96875,
2220
+ "rewards/iou_reward": 0.6608706414699554,
2221
+ "step": 158
2222
+ },
2223
+ {
2224
+ "clip_ratio": 0.0,
2225
+ "completion_length": 84.25,
2226
+ "epoch": 0.001979285963252502,
2227
+ "grad_norm": 0.5724992156028748,
2228
+ "kl": 0.1376953125,
2229
+ "learning_rate": 9.990103570183738e-06,
2230
+ "loss": 0.0055,
2231
+ "reward": 1.5222105383872986,
2232
+ "reward_std": 0.28245383501052856,
2233
+ "rewards/format_reward_rec": 0.96875,
2234
+ "rewards/iou_reward": 0.5534605383872986,
2235
+ "step": 159
2236
+ },
2237
+ {
2238
+ "clip_ratio": 0.0,
2239
+ "completion_length": 76.09375,
2240
+ "epoch": 0.0019917343026440272,
2241
+ "grad_norm": 0.5227129459381104,
2242
+ "kl": 0.146484375,
2243
+ "learning_rate": 9.99004132848678e-06,
2244
+ "loss": 0.0059,
2245
+ "reward": 1.8685023188591003,
2246
+ "reward_std": 0.0524298120290041,
2247
+ "rewards/format_reward_rec": 1.0,
2248
+ "rewards/iou_reward": 0.8685023784637451,
2249
+ "step": 160
2250
+ },
2251
+ {
2252
+ "clip_ratio": 0.0,
2253
+ "completion_length": 73.875,
2254
+ "epoch": 0.0020041826420355525,
2255
+ "grad_norm": 0.6109131574630737,
2256
+ "kl": 0.1123046875,
2257
+ "learning_rate": 9.989979086789823e-06,
2258
+ "loss": 0.0045,
2259
+ "reward": 1.871379554271698,
2260
+ "reward_std": 0.09691573679447174,
2261
+ "rewards/format_reward_rec": 1.0,
2262
+ "rewards/iou_reward": 0.8713796138763428,
2263
+ "step": 161
2264
+ },
2265
+ {
2266
+ "clip_ratio": 0.0,
2267
+ "completion_length": 75.34375,
2268
+ "epoch": 0.002016630981427078,
2269
+ "grad_norm": 0.5628411769866943,
2270
+ "kl": 0.139892578125,
2271
+ "learning_rate": 9.989916845092865e-06,
2272
+ "loss": 0.0056,
2273
+ "reward": 1.856031894683838,
2274
+ "reward_std": 0.05577436415478587,
2275
+ "rewards/format_reward_rec": 1.0,
2276
+ "rewards/iou_reward": 0.8560318946838379,
2277
+ "step": 162
2278
+ },
2279
+ {
2280
+ "clip_ratio": 0.0,
2281
+ "completion_length": 83.625,
2282
+ "epoch": 0.0020290793208186026,
2283
+ "grad_norm": 0.5170895457267761,
2284
+ "kl": 0.1171875,
2285
+ "learning_rate": 9.989854603395907e-06,
2286
+ "loss": 0.0047,
2287
+ "reward": 1.7793619632720947,
2288
+ "reward_std": 0.1213589683175087,
2289
+ "rewards/format_reward_rec": 1.0,
2290
+ "rewards/iou_reward": 0.7793619930744171,
2291
+ "step": 163
2292
+ },
2293
+ {
2294
+ "clip_ratio": 0.0,
2295
+ "completion_length": 90.5625,
2296
+ "epoch": 0.002041527660210128,
2297
+ "grad_norm": 0.49487820267677307,
2298
+ "kl": 0.111328125,
2299
+ "learning_rate": 9.98979236169895e-06,
2300
+ "loss": 0.0045,
2301
+ "reward": 1.5895143151283264,
2302
+ "reward_std": 0.15016086027026176,
2303
+ "rewards/format_reward_rec": 1.0,
2304
+ "rewards/iou_reward": 0.5895143449306488,
2305
+ "step": 164
2306
+ },
2307
+ {
2308
+ "clip_ratio": 0.0,
2309
+ "completion_length": 78.875,
2310
+ "epoch": 0.002053975999601653,
2311
+ "grad_norm": 0.6349457502365112,
2312
+ "kl": 0.1298828125,
2313
+ "learning_rate": 9.989730120001992e-06,
2314
+ "loss": 0.0052,
2315
+ "reward": 1.5769969820976257,
2316
+ "reward_std": 0.23748912662267685,
2317
+ "rewards/format_reward_rec": 1.0,
2318
+ "rewards/iou_reward": 0.5769969820976257,
2319
+ "step": 165
2320
+ },
2321
+ {
2322
+ "clip_ratio": 0.0,
2323
+ "completion_length": 92.125,
2324
+ "epoch": 0.0020664243389931784,
2325
+ "grad_norm": 0.7995075583457947,
2326
+ "kl": 0.17919921875,
2327
+ "learning_rate": 9.989667878305034e-06,
2328
+ "loss": 0.0072,
2329
+ "reward": 1.7468191981315613,
2330
+ "reward_std": 0.37003619968891144,
2331
+ "rewards/format_reward_rec": 0.96875,
2332
+ "rewards/iou_reward": 0.7780691385269165,
2333
+ "step": 166
2334
+ },
2335
+ {
2336
+ "clip_ratio": 0.0,
2337
+ "completion_length": 80.09375,
2338
+ "epoch": 0.0020788726783847033,
2339
+ "grad_norm": 0.5599526762962341,
2340
+ "kl": 0.1416015625,
2341
+ "learning_rate": 9.989605636608078e-06,
2342
+ "loss": 0.0057,
2343
+ "reward": 1.8237193822860718,
2344
+ "reward_std": 0.10361789353191853,
2345
+ "rewards/format_reward_rec": 1.0,
2346
+ "rewards/iou_reward": 0.8237193524837494,
2347
+ "step": 167
2348
+ },
2349
+ {
2350
+ "clip_ratio": 0.0,
2351
+ "completion_length": 74.15625,
2352
+ "epoch": 0.0020913210177762286,
2353
+ "grad_norm": 0.5949897170066833,
2354
+ "kl": 0.1611328125,
2355
+ "learning_rate": 9.98954339491112e-06,
2356
+ "loss": 0.0065,
2357
+ "reward": 1.7989261150360107,
2358
+ "reward_std": 0.13571097142994404,
2359
+ "rewards/format_reward_rec": 1.0,
2360
+ "rewards/iou_reward": 0.7989261448383331,
2361
+ "step": 168
2362
+ },
2363
+ {
2364
+ "clip_ratio": 0.0,
2365
+ "completion_length": 78.625,
2366
+ "epoch": 0.002103769357167754,
2367
+ "grad_norm": 0.5042181611061096,
2368
+ "kl": 0.10107421875,
2369
+ "learning_rate": 9.989481153214162e-06,
2370
+ "loss": 0.004,
2371
+ "reward": 1.9031283855438232,
2372
+ "reward_std": 0.029928937554359436,
2373
+ "rewards/format_reward_rec": 1.0,
2374
+ "rewards/iou_reward": 0.9031283855438232,
2375
+ "step": 169
2376
+ },
2377
+ {
2378
+ "clip_ratio": 0.0,
2379
+ "completion_length": 80.96875,
2380
+ "epoch": 0.002116217696559279,
2381
+ "grad_norm": 0.49940019845962524,
2382
+ "kl": 0.1201171875,
2383
+ "learning_rate": 9.989418911517205e-06,
2384
+ "loss": 0.0048,
2385
+ "reward": 1.9246177077293396,
2386
+ "reward_std": 0.02288174256682396,
2387
+ "rewards/format_reward_rec": 1.0,
2388
+ "rewards/iou_reward": 0.924617737531662,
2389
+ "step": 170
2390
+ },
2391
+ {
2392
+ "clip_ratio": 0.0,
2393
+ "completion_length": 78.8125,
2394
+ "epoch": 0.0021286660359508044,
2395
+ "grad_norm": 0.6179351806640625,
2396
+ "kl": 0.1689453125,
2397
+ "learning_rate": 9.989356669820247e-06,
2398
+ "loss": 0.0068,
2399
+ "reward": 1.7295928597450256,
2400
+ "reward_std": 0.11133578047156334,
2401
+ "rewards/format_reward_rec": 1.0,
2402
+ "rewards/iou_reward": 0.7295928299427032,
2403
+ "step": 171
2404
+ },
2405
+ {
2406
+ "clip_ratio": 0.0,
2407
+ "completion_length": 78.625,
2408
+ "epoch": 0.0021411143753423292,
2409
+ "grad_norm": 0.5767012238502502,
2410
+ "kl": 0.13427734375,
2411
+ "learning_rate": 9.989294428123289e-06,
2412
+ "loss": 0.0054,
2413
+ "reward": 1.733388900756836,
2414
+ "reward_std": 0.09509897604584694,
2415
+ "rewards/format_reward_rec": 1.0,
2416
+ "rewards/iou_reward": 0.7333889603614807,
2417
+ "step": 172
2418
+ },
2419
+ {
2420
+ "clip_ratio": 0.0,
2421
+ "completion_length": 84.5625,
2422
+ "epoch": 0.0021535627147338545,
2423
+ "grad_norm": 0.6296212077140808,
2424
+ "kl": 0.110107421875,
2425
+ "learning_rate": 9.989232186426333e-06,
2426
+ "loss": 0.0044,
2427
+ "reward": 1.734093189239502,
2428
+ "reward_std": 0.1716211587190628,
2429
+ "rewards/format_reward_rec": 1.0,
2430
+ "rewards/iou_reward": 0.7340930998325348,
2431
+ "step": 173
2432
+ },
2433
+ {
2434
+ "clip_ratio": 0.0,
2435
+ "completion_length": 78.875,
2436
+ "epoch": 0.0021660110541253798,
2437
+ "grad_norm": 0.6160764098167419,
2438
+ "kl": 0.134765625,
2439
+ "learning_rate": 9.989169944729374e-06,
2440
+ "loss": 0.0054,
2441
+ "reward": 1.6963000297546387,
2442
+ "reward_std": 0.155071921646595,
2443
+ "rewards/format_reward_rec": 0.96875,
2444
+ "rewards/iou_reward": 0.7275499999523163,
2445
+ "step": 174
2446
+ },
2447
+ {
2448
+ "clip_ratio": 0.0,
2449
+ "completion_length": 75.625,
2450
+ "epoch": 0.002178459393516905,
2451
+ "grad_norm": 0.7459288239479065,
2452
+ "kl": 0.16015625,
2453
+ "learning_rate": 9.989107703032416e-06,
2454
+ "loss": 0.0064,
2455
+ "reward": 1.748288869857788,
2456
+ "reward_std": 0.06110543105751276,
2457
+ "rewards/format_reward_rec": 1.0,
2458
+ "rewards/iou_reward": 0.7482888698577881,
2459
+ "step": 175
2460
+ },
2461
+ {
2462
+ "clip_ratio": 0.0,
2463
+ "completion_length": 85.4375,
2464
+ "epoch": 0.00219090773290843,
2465
+ "grad_norm": 0.5298272371292114,
2466
+ "kl": 0.11181640625,
2467
+ "learning_rate": 9.989045461335458e-06,
2468
+ "loss": 0.0045,
2469
+ "reward": 1.9230260848999023,
2470
+ "reward_std": 0.017507225275039673,
2471
+ "rewards/format_reward_rec": 1.0,
2472
+ "rewards/iou_reward": 0.92302605509758,
2473
+ "step": 176
2474
+ },
2475
+ {
2476
+ "clip_ratio": 0.0,
2477
+ "completion_length": 89.75,
2478
+ "epoch": 0.002203356072299955,
2479
+ "grad_norm": 0.5735234022140503,
2480
+ "kl": 0.12841796875,
2481
+ "learning_rate": 9.9889832196385e-06,
2482
+ "loss": 0.0051,
2483
+ "reward": 1.8215650916099548,
2484
+ "reward_std": 0.08769259601831436,
2485
+ "rewards/format_reward_rec": 1.0,
2486
+ "rewards/iou_reward": 0.8215650618076324,
2487
+ "step": 177
2488
+ },
2489
+ {
2490
+ "clip_ratio": 0.0,
2491
+ "completion_length": 97.625,
2492
+ "epoch": 0.0022158044116914804,
2493
+ "grad_norm": 0.7775987982749939,
2494
+ "kl": 0.11865234375,
2495
+ "learning_rate": 9.988920977941544e-06,
2496
+ "loss": 0.0047,
2497
+ "reward": 1.8203516602516174,
2498
+ "reward_std": 0.22550363093614578,
2499
+ "rewards/format_reward_rec": 0.96875,
2500
+ "rewards/iou_reward": 0.851601630449295,
2501
+ "step": 178
2502
+ },
2503
+ {
2504
+ "clip_ratio": 0.0,
2505
+ "completion_length": 84.375,
2506
+ "epoch": 0.0022282527510830057,
2507
+ "grad_norm": 0.5245223045349121,
2508
+ "kl": 0.12744140625,
2509
+ "learning_rate": 9.988858736244585e-06,
2510
+ "loss": 0.0051,
2511
+ "reward": 1.8911280632019043,
2512
+ "reward_std": 0.02437590528279543,
2513
+ "rewards/format_reward_rec": 1.0,
2514
+ "rewards/iou_reward": 0.8911280632019043,
2515
+ "step": 179
2516
+ },
2517
+ {
2518
+ "clip_ratio": 0.0,
2519
+ "completion_length": 83.4375,
2520
+ "epoch": 0.0022407010904745305,
2521
+ "grad_norm": 0.518882155418396,
2522
+ "kl": 0.1708984375,
2523
+ "learning_rate": 9.988796494547627e-06,
2524
+ "loss": 0.0069,
2525
+ "reward": 1.7126802206039429,
2526
+ "reward_std": 0.05515829473733902,
2527
+ "rewards/format_reward_rec": 1.0,
2528
+ "rewards/iou_reward": 0.7126802206039429,
2529
+ "step": 180
2530
+ },
2531
+ {
2532
+ "clip_ratio": 0.0,
2533
+ "completion_length": 87.21875,
2534
+ "epoch": 0.002253149429866056,
2535
+ "grad_norm": 0.5248196125030518,
2536
+ "kl": 0.133544921875,
2537
+ "learning_rate": 9.988734252850671e-06,
2538
+ "loss": 0.0053,
2539
+ "reward": 1.7711573839187622,
2540
+ "reward_std": 0.11591519042849541,
2541
+ "rewards/format_reward_rec": 0.96875,
2542
+ "rewards/iou_reward": 0.8024073839187622,
2543
+ "step": 181
2544
+ },
2545
+ {
2546
+ "clip_ratio": 0.0,
2547
+ "completion_length": 83.78125,
2548
+ "epoch": 0.002265597769257581,
2549
+ "grad_norm": 0.5835661292076111,
2550
+ "kl": 0.11669921875,
2551
+ "learning_rate": 9.988672011153713e-06,
2552
+ "loss": 0.0047,
2553
+ "reward": 1.8540168404579163,
2554
+ "reward_std": 0.07147128880023956,
2555
+ "rewards/format_reward_rec": 1.0,
2556
+ "rewards/iou_reward": 0.8540168106555939,
2557
+ "step": 182
2558
+ },
2559
+ {
2560
+ "clip_ratio": 0.0,
2561
+ "completion_length": 81.75,
2562
+ "epoch": 0.0022780461086491064,
2563
+ "grad_norm": 0.6301136016845703,
2564
+ "kl": 0.17919921875,
2565
+ "learning_rate": 9.988609769456755e-06,
2566
+ "loss": 0.0072,
2567
+ "reward": 1.529570460319519,
2568
+ "reward_std": 0.40445713698863983,
2569
+ "rewards/format_reward_rec": 0.96875,
2570
+ "rewards/iou_reward": 0.560820534825325,
2571
+ "step": 183
2572
+ },
2573
+ {
2574
+ "clip_ratio": 0.0,
2575
+ "completion_length": 83.5,
2576
+ "epoch": 0.002290494448040631,
2577
+ "grad_norm": 0.6207419037818909,
2578
+ "kl": 0.14453125,
2579
+ "learning_rate": 9.988547527759798e-06,
2580
+ "loss": 0.0058,
2581
+ "reward": 1.7111443877220154,
2582
+ "reward_std": 0.23457887768745422,
2583
+ "rewards/format_reward_rec": 1.0,
2584
+ "rewards/iou_reward": 0.7111444473266602,
2585
+ "step": 184
2586
+ },
2587
+ {
2588
+ "clip_ratio": 0.0,
2589
+ "completion_length": 86.1875,
2590
+ "epoch": 0.0023029427874321565,
2591
+ "grad_norm": 0.5986655950546265,
2592
+ "kl": 0.14892578125,
2593
+ "learning_rate": 9.98848528606284e-06,
2594
+ "loss": 0.006,
2595
+ "reward": 1.9187002182006836,
2596
+ "reward_std": 0.06093704979866743,
2597
+ "rewards/format_reward_rec": 1.0,
2598
+ "rewards/iou_reward": 0.9187001585960388,
2599
+ "step": 185
2600
+ },
2601
+ {
2602
+ "clip_ratio": 0.0,
2603
+ "completion_length": 87.375,
2604
+ "epoch": 0.0023153911268236817,
2605
+ "grad_norm": 0.6483145952224731,
2606
+ "kl": 0.139892578125,
2607
+ "learning_rate": 9.988423044365882e-06,
2608
+ "loss": 0.0056,
2609
+ "reward": 1.8686339855194092,
2610
+ "reward_std": 0.0916680209338665,
2611
+ "rewards/format_reward_rec": 1.0,
2612
+ "rewards/iou_reward": 0.8686340153217316,
2613
+ "step": 186
2614
+ },
2615
+ {
2616
+ "clip_ratio": 0.0,
2617
+ "completion_length": 82.71875,
2618
+ "epoch": 0.002327839466215207,
2619
+ "grad_norm": 0.6316834688186646,
2620
+ "kl": 0.142578125,
2621
+ "learning_rate": 9.988360802668925e-06,
2622
+ "loss": 0.0057,
2623
+ "reward": 1.5443827509880066,
2624
+ "reward_std": 0.23182823695242405,
2625
+ "rewards/format_reward_rec": 1.0,
2626
+ "rewards/iou_reward": 0.5443826913833618,
2627
+ "step": 187
2628
+ },
2629
+ {
2630
+ "clip_ratio": 0.0,
2631
+ "completion_length": 87.9375,
2632
+ "epoch": 0.002340287805606732,
2633
+ "grad_norm": 0.5704943537712097,
2634
+ "kl": 0.1201171875,
2635
+ "learning_rate": 9.988298560971967e-06,
2636
+ "loss": 0.0048,
2637
+ "reward": 1.7290632724761963,
2638
+ "reward_std": 0.10309558361768723,
2639
+ "rewards/format_reward_rec": 1.0,
2640
+ "rewards/iou_reward": 0.7290633618831635,
2641
+ "step": 188
2642
+ },
2643
+ {
2644
+ "clip_ratio": 0.0,
2645
+ "completion_length": 93.0,
2646
+ "epoch": 0.002352736144998257,
2647
+ "grad_norm": 0.5360406637191772,
2648
+ "kl": 0.12646484375,
2649
+ "learning_rate": 9.98823631927501e-06,
2650
+ "loss": 0.0051,
2651
+ "reward": 1.6119696497917175,
2652
+ "reward_std": 0.27103206515312195,
2653
+ "rewards/format_reward_rec": 0.96875,
2654
+ "rewards/iou_reward": 0.6432196497917175,
2655
+ "step": 189
2656
+ },
2657
+ {
2658
+ "clip_ratio": 0.0,
2659
+ "completion_length": 93.71875,
2660
+ "epoch": 0.0023651844843897824,
2661
+ "grad_norm": 0.5768948197364807,
2662
+ "kl": 0.1376953125,
2663
+ "learning_rate": 9.988174077578053e-06,
2664
+ "loss": 0.0055,
2665
+ "reward": 1.7519559860229492,
2666
+ "reward_std": 0.19511063676327467,
2667
+ "rewards/format_reward_rec": 0.96875,
2668
+ "rewards/iou_reward": 0.783206045627594,
2669
+ "step": 190
2670
+ },
2671
+ {
2672
+ "clip_ratio": 0.0,
2673
+ "completion_length": 87.46875,
2674
+ "epoch": 0.0023776328237813077,
2675
+ "grad_norm": 0.5476570129394531,
2676
+ "kl": 0.1240234375,
2677
+ "learning_rate": 9.988111835881095e-06,
2678
+ "loss": 0.005,
2679
+ "reward": 1.8128422498703003,
2680
+ "reward_std": 0.15785422176122665,
2681
+ "rewards/format_reward_rec": 1.0,
2682
+ "rewards/iou_reward": 0.8128422796726227,
2683
+ "step": 191
2684
+ },
2685
+ {
2686
+ "clip_ratio": 0.0,
2687
+ "completion_length": 84.8125,
2688
+ "epoch": 0.002390081163172833,
2689
+ "grad_norm": 0.6113950610160828,
2690
+ "kl": 0.10986328125,
2691
+ "learning_rate": 9.988049594184137e-06,
2692
+ "loss": 0.0044,
2693
+ "reward": 1.7782840132713318,
2694
+ "reward_std": 0.10723626054823399,
2695
+ "rewards/format_reward_rec": 1.0,
2696
+ "rewards/iou_reward": 0.7782839834690094,
2697
+ "step": 192
2698
+ },
2699
+ {
2700
+ "clip_ratio": 0.0,
2701
+ "completion_length": 103.40625,
2702
+ "epoch": 0.002402529502564358,
2703
+ "grad_norm": 0.5610492825508118,
2704
+ "kl": 0.129150390625,
2705
+ "learning_rate": 9.98798735248718e-06,
2706
+ "loss": 0.0052,
2707
+ "reward": 1.8901705145835876,
2708
+ "reward_std": 0.04007100500166416,
2709
+ "rewards/format_reward_rec": 1.0,
2710
+ "rewards/iou_reward": 0.89017054438591,
2711
+ "step": 193
2712
+ },
2713
+ {
2714
+ "clip_ratio": 0.0,
2715
+ "completion_length": 101.625,
2716
+ "epoch": 0.002414977841955883,
2717
+ "grad_norm": 0.7355772256851196,
2718
+ "kl": 0.113525390625,
2719
+ "learning_rate": 9.98792511079022e-06,
2720
+ "loss": 0.0046,
2721
+ "reward": 1.762471079826355,
2722
+ "reward_std": 0.02271357737481594,
2723
+ "rewards/format_reward_rec": 1.0,
2724
+ "rewards/iou_reward": 0.762471079826355,
2725
+ "step": 194
2726
+ },
2727
+ {
2728
+ "clip_ratio": 0.0,
2729
+ "completion_length": 90.8125,
2730
+ "epoch": 0.0024274261813474083,
2731
+ "grad_norm": 0.6883395314216614,
2732
+ "kl": 0.09521484375,
2733
+ "learning_rate": 9.987862869093264e-06,
2734
+ "loss": 0.0038,
2735
+ "reward": 1.8147326707839966,
2736
+ "reward_std": 0.20457583293318748,
2737
+ "rewards/format_reward_rec": 0.96875,
2738
+ "rewards/iou_reward": 0.8459826409816742,
2739
+ "step": 195
2740
+ },
2741
+ {
2742
+ "clip_ratio": 0.0,
2743
+ "completion_length": 92.875,
2744
+ "epoch": 0.0024398745207389336,
2745
+ "grad_norm": 0.604902446269989,
2746
+ "kl": 0.15380859375,
2747
+ "learning_rate": 9.987800627396306e-06,
2748
+ "loss": 0.0061,
2749
+ "reward": 1.2101359963417053,
2750
+ "reward_std": 0.16694805398583412,
2751
+ "rewards/format_reward_rec": 1.0,
2752
+ "rewards/iou_reward": 0.21013597398996353,
2753
+ "step": 196
2754
+ },
2755
+ {
2756
+ "clip_ratio": 0.0,
2757
+ "completion_length": 100.5625,
2758
+ "epoch": 0.0024523228601304584,
2759
+ "grad_norm": 0.5739537477493286,
2760
+ "kl": 0.1357421875,
2761
+ "learning_rate": 9.987738385699348e-06,
2762
+ "loss": 0.0054,
2763
+ "reward": 1.738525927066803,
2764
+ "reward_std": 0.10700106248259544,
2765
+ "rewards/format_reward_rec": 1.0,
2766
+ "rewards/iou_reward": 0.7385258674621582,
2767
+ "step": 197
2768
+ },
2769
+ {
2770
+ "clip_ratio": 0.0,
2771
+ "completion_length": 85.90625,
2772
+ "epoch": 0.0024647711995219837,
2773
+ "grad_norm": 0.5847993493080139,
2774
+ "kl": 0.1318359375,
2775
+ "learning_rate": 9.987676144002391e-06,
2776
+ "loss": 0.0053,
2777
+ "reward": 1.9518784284591675,
2778
+ "reward_std": 0.01839788258075714,
2779
+ "rewards/format_reward_rec": 1.0,
2780
+ "rewards/iou_reward": 0.9518784284591675,
2781
+ "step": 198
2782
+ },
2783
+ {
2784
+ "clip_ratio": 0.0,
2785
+ "completion_length": 87.5,
2786
+ "epoch": 0.002477219538913509,
2787
+ "grad_norm": 0.6541993021965027,
2788
+ "kl": 0.100341796875,
2789
+ "learning_rate": 9.987613902305433e-06,
2790
+ "loss": 0.004,
2791
+ "reward": 1.848372757434845,
2792
+ "reward_std": 0.1364253256469965,
2793
+ "rewards/format_reward_rec": 1.0,
2794
+ "rewards/iou_reward": 0.8483727872371674,
2795
+ "step": 199
2796
+ },
2797
+ {
2798
+ "clip_ratio": 0.0,
2799
+ "completion_length": 87.0625,
2800
+ "epoch": 0.0024896678783050343,
2801
+ "grad_norm": 0.7986785173416138,
2802
+ "kl": 0.12353515625,
2803
+ "learning_rate": 9.987551660608475e-06,
2804
+ "loss": 0.005,
2805
+ "reward": 1.69829922914505,
2806
+ "reward_std": 0.1897860188037157,
2807
+ "rewards/format_reward_rec": 0.96875,
2808
+ "rewards/iou_reward": 0.7295492589473724,
2809
+ "step": 200
2810
+ }
2811
+ ],
2812
+ "logging_steps": 1.0,
2813
+ "max_steps": 160664,
2814
+ "num_input_tokens_seen": 0,
2815
+ "num_train_epochs": 2,
2816
+ "save_steps": 100,
2817
+ "stateful_callbacks": {
2818
+ "TrainerControl": {
2819
+ "args": {
2820
+ "should_epoch_stop": false,
2821
+ "should_evaluate": false,
2822
+ "should_log": false,
2823
+ "should_save": true,
2824
+ "should_training_stop": false
2825
+ },
2826
+ "attributes": {}
2827
+ }
2828
+ },
2829
+ "total_flos": 0.0,
2830
+ "train_batch_size": 8,
2831
+ "trial_name": null,
2832
+ "trial_params": null
2833
+ }
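The log entries above are per-step training metrics in the Hugging Face `Trainer` state format. A minimal sketch for inspecting them offline, assuming this JSON is the standard `trainer_state.json` written next to the checkpoint and that the entries above live under its usual `log_history` key (the path below is a placeholder, not part of this commit listing):

```python
import json

# Placeholder path; point this at the checkpoint's trainer_state.json.
with open("checkpoint-200/trainer_state.json") as f:
    state = json.load(f)

# Each dict in log_history is one logged step; collect the IoU reward curve.
history = state["log_history"]
steps = [e["step"] for e in history if "rewards/iou_reward" in e]
iou = [e["rewards/iou_reward"] for e in history if "rewards/iou_reward" in e]

print(f"{len(steps)} logged steps, last rewards/iou_reward = {iou[-1]:.3f}")
```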
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:551863c0d96a4b9d1205af0398c8578c24624641a94db62faea5bfdbd4a427be
3
+ size 8312
vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
zero_to_fp32.py ADDED
@@ -0,0 +1,674 @@
1
+ #!/usr/bin/env python
2
+
3
+ # Copyright (c) Microsoft Corporation.
4
+ # SPDX-License-Identifier: Apache-2.0
5
+
6
+ # DeepSpeed Team
7
+
8
+ # This script extracts fp32 consolidated weights from a zero 1, 2 and 3 DeepSpeed checkpoints. It gets
9
+ # copied into the top level checkpoint dir, so the user can easily do the conversion at any point in
10
+ # the future. Once extracted, the weights don't require DeepSpeed and can be used in any
11
+ # application.
12
+ #
13
+ # example:
14
+ # python zero_to_fp32.py . output_dir/
15
+ # or
16
+ # python zero_to_fp32.py . output_dir/ --safe_serialization
17
+
18
+ import argparse
19
+ import torch
20
+ import glob
21
+ import math
22
+ import os
23
+ import re
24
+ import json
25
+ from tqdm import tqdm
26
+ from collections import OrderedDict
27
+ from dataclasses import dataclass
28
+
29
+ # while this script doesn't use deepspeed to recover data, since the checkpoints are pickled with
30
+ # DeepSpeed data structures it has to be available in the current python environment.
31
+ from deepspeed.utils import logger
32
+ from deepspeed.checkpoint.constants import (DS_VERSION, OPTIMIZER_STATE_DICT, SINGLE_PARTITION_OF_FP32_GROUPS,
33
+ FP32_FLAT_GROUPS, ZERO_STAGE, PARTITION_COUNT, PARAM_SHAPES, BUFFER_NAMES,
34
+ FROZEN_PARAM_SHAPES, FROZEN_PARAM_FRAGMENTS)
35
+
36
+
37
+ @dataclass
38
+ class zero_model_state:
39
+ buffers: dict()
40
+ param_shapes: dict()
41
+ shared_params: list
42
+ ds_version: int
43
+ frozen_param_shapes: dict()
44
+ frozen_param_fragments: dict()
45
+
46
+
47
+ debug = 0
48
+
49
+ # load to cpu
50
+ device = torch.device('cpu')
51
+
52
+
53
+ def atoi(text):
54
+ return int(text) if text.isdigit() else text
55
+
56
+
57
+ def natural_keys(text):
58
+ '''
59
+ alist.sort(key=natural_keys) sorts in human order
60
+ http://nedbatchelder.com/blog/200712/human_sorting.html
61
+ (See Toothy's implementation in the comments)
62
+ '''
63
+ return [atoi(c) for c in re.split(r'(\d+)', text)]
64
+
65
+
66
+ def get_model_state_file(checkpoint_dir, zero_stage):
67
+ if not os.path.isdir(checkpoint_dir):
68
+ raise FileNotFoundError(f"Directory '{checkpoint_dir}' doesn't exist")
69
+
70
+ # there should be only one file
71
+ if zero_stage <= 2:
72
+ file = os.path.join(checkpoint_dir, "mp_rank_00_model_states.pt")
73
+ elif zero_stage == 3:
74
+ file = os.path.join(checkpoint_dir, "zero_pp_rank_0_mp_rank_00_model_states.pt")
75
+
76
+ if not os.path.exists(file):
77
+ raise FileNotFoundError(f"can't find model states file at '{file}'")
78
+
79
+ return file
80
+
81
+
82
+ def get_checkpoint_files(checkpoint_dir, glob_pattern):
83
+ # XXX: need to test that this simple glob rule works for multi-node setup too
84
+ ckpt_files = sorted(glob.glob(os.path.join(checkpoint_dir, glob_pattern)), key=natural_keys)
85
+
86
+ if len(ckpt_files) == 0:
87
+ raise FileNotFoundError(f"can't find {glob_pattern} files in directory '{checkpoint_dir}'")
88
+
89
+ return ckpt_files
90
+
91
+
92
+ def get_optim_files(checkpoint_dir):
93
+ return get_checkpoint_files(checkpoint_dir, "*_optim_states.pt")
94
+
95
+
96
+ def get_model_state_files(checkpoint_dir):
97
+ return get_checkpoint_files(checkpoint_dir, "*_model_states.pt")
98
+
99
+
100
+ def parse_model_states(files):
101
+ zero_model_states = []
102
+ for file in files:
103
+ state_dict = torch.load(file, map_location=device)
104
+
105
+ if BUFFER_NAMES not in state_dict:
106
+ raise ValueError(f"{file} is not a model state checkpoint")
107
+ buffer_names = state_dict[BUFFER_NAMES]
108
+ if debug:
109
+ print("Found buffers:", buffer_names)
110
+
111
+ # recover just the buffers while restoring them to fp32 if they were saved in fp16
112
+ buffers = {k: v.float() for k, v in state_dict["module"].items() if k in buffer_names}
113
+ param_shapes = state_dict[PARAM_SHAPES]
114
+
115
+ # collect parameters that are included in param_shapes
116
+ param_names = []
117
+ for s in param_shapes:
118
+ for name in s.keys():
119
+ param_names.append(name)
120
+
121
+ # update with frozen parameters
122
+ frozen_param_shapes = state_dict.get(FROZEN_PARAM_SHAPES, None)
123
+ if frozen_param_shapes is not None:
124
+ if debug:
125
+ print(f"Found frozen_param_shapes: {frozen_param_shapes}")
126
+ param_names += list(frozen_param_shapes.keys())
127
+
128
+ # handle shared params
129
+ shared_params = [[k, v] for k, v in state_dict["shared_params"].items()]
130
+
131
+ ds_version = state_dict.get(DS_VERSION, None)
132
+
133
+ frozen_param_fragments = state_dict.get(FROZEN_PARAM_FRAGMENTS, None)
134
+
135
+ z_model_state = zero_model_state(buffers=buffers,
136
+ param_shapes=param_shapes,
137
+ shared_params=shared_params,
138
+ ds_version=ds_version,
139
+ frozen_param_shapes=frozen_param_shapes,
140
+ frozen_param_fragments=frozen_param_fragments)
141
+ zero_model_states.append(z_model_state)
142
+
143
+ return zero_model_states
144
+
145
+
146
+ def parse_optim_states(files, ds_checkpoint_dir):
147
+ total_files = len(files)
148
+ state_dicts = []
149
+ for f in files:
150
+ state_dict = torch.load(f, map_location=device)
151
+ # immediately discard the potentially huge 2 optimizer states as we only care for fp32 master weights
152
+ # and also handle the case where it was already removed by another helper script
153
+ state_dict["optimizer_state_dict"].pop("optimizer_state_dict", None)
154
+ state_dicts.append(state_dict)
155
+
156
+ if not ZERO_STAGE in state_dicts[0][OPTIMIZER_STATE_DICT]:
157
+ raise ValueError(f"{files[0]} is not a zero checkpoint")
158
+ zero_stage = state_dicts[0][OPTIMIZER_STATE_DICT][ZERO_STAGE]
159
+ world_size = state_dicts[0][OPTIMIZER_STATE_DICT][PARTITION_COUNT]
160
+
161
+ # For ZeRO-2 each param group can have different partition_count as data parallelism for expert
162
+ # parameters can be different from data parallelism for non-expert parameters. So we can just
163
+ # use the max of the partition_count to get the dp world_size.
164
+
165
+ if type(world_size) is list:
166
+ world_size = max(world_size)
167
+
168
+ if world_size != total_files:
169
+ raise ValueError(
170
+ f"Expected {world_size} of '*_optim_states.pt' under '{ds_checkpoint_dir}' but found {total_files} files. "
171
+ "Possibly due to an overwrite of an old checkpoint, or a checkpoint didn't get saved by one or more processes."
172
+ )
173
+
174
+ # the groups are named differently in each stage
175
+ if zero_stage <= 2:
176
+ fp32_groups_key = SINGLE_PARTITION_OF_FP32_GROUPS
177
+ elif zero_stage == 3:
178
+ fp32_groups_key = FP32_FLAT_GROUPS
179
+ else:
180
+ raise ValueError(f"unknown zero stage {zero_stage}")
181
+
182
+ if zero_stage <= 2:
183
+ fp32_flat_groups = [state_dicts[i][OPTIMIZER_STATE_DICT][fp32_groups_key] for i in range(len(state_dicts))]
184
+ elif zero_stage == 3:
185
+ # if there is more than one param group, there will be multiple flattened tensors - one
186
+ # flattened tensor per group - for simplicity merge them into a single tensor
187
+ #
188
+ # XXX: could make the script more memory efficient for when there are multiple groups - it
189
+ # will require matching the sub-lists of param_shapes for each param group flattened tensor
190
+
191
+ fp32_flat_groups = [
192
+ torch.cat(state_dicts[i][OPTIMIZER_STATE_DICT][fp32_groups_key], 0) for i in range(len(state_dicts))
193
+ ]
194
+
195
+ return zero_stage, world_size, fp32_flat_groups
196
+
197
+
198
+ def _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir, exclude_frozen_parameters):
199
+ """
200
+ Returns fp32 state_dict reconstructed from ds checkpoint
201
+
202
+ Args:
203
+ - ``ds_checkpoint_dir``: path to the deepspeed checkpoint folder (where the optimizer files are)
204
+
205
+ """
206
+ print(f"Processing zero checkpoint '{ds_checkpoint_dir}'")
207
+
208
+ optim_files = get_optim_files(ds_checkpoint_dir)
209
+ zero_stage, world_size, fp32_flat_groups = parse_optim_states(optim_files, ds_checkpoint_dir)
210
+ print(f"Detected checkpoint of type zero stage {zero_stage}, world_size: {world_size}")
211
+
212
+ model_files = get_model_state_files(ds_checkpoint_dir)
213
+
214
+ zero_model_states = parse_model_states(model_files)
215
+ print(f'Parsing checkpoint created by deepspeed=={zero_model_states[0].ds_version}')
216
+
217
+ if zero_stage <= 2:
218
+ return _get_fp32_state_dict_from_zero2_checkpoint(world_size, fp32_flat_groups, zero_model_states,
219
+ exclude_frozen_parameters)
220
+ elif zero_stage == 3:
221
+ return _get_fp32_state_dict_from_zero3_checkpoint(world_size, fp32_flat_groups, zero_model_states,
222
+ exclude_frozen_parameters)
223
+
224
+
225
+ def _zero2_merge_frozen_params(state_dict, zero_model_states):
226
+ if zero_model_states[0].frozen_param_shapes is None or len(zero_model_states[0].frozen_param_shapes) == 0:
227
+ return
228
+
229
+ frozen_param_shapes = zero_model_states[0].frozen_param_shapes
230
+ frozen_param_fragments = zero_model_states[0].frozen_param_fragments
231
+
232
+ if debug:
233
+ num_elem = sum(s.numel() for s in frozen_param_shapes.values())
234
+ print(f'rank 0: {FROZEN_PARAM_SHAPES}.numel = {num_elem}')
235
+
236
+ wanted_params = len(frozen_param_shapes)
237
+ wanted_numel = sum(s.numel() for s in frozen_param_shapes.values())
238
+ avail_numel = sum([p.numel() for p in frozen_param_fragments.values()])
239
+ print(f'Frozen params: Have {avail_numel} numels to process.')
240
+ print(f'Frozen params: Need {wanted_numel} numels in {wanted_params} params')
241
+
242
+ total_params = 0
243
+ total_numel = 0
244
+ for name, shape in frozen_param_shapes.items():
245
+ total_params += 1
246
+ unpartitioned_numel = shape.numel()
247
+ total_numel += unpartitioned_numel
248
+
249
+ state_dict[name] = frozen_param_fragments[name]
250
+
251
+ if debug:
252
+ print(f"{name} full shape: {shape} unpartitioned numel {unpartitioned_numel} ")
253
+
254
+ print(f"Reconstructed Frozen fp32 state dict with {total_params} params {total_numel} elements")
255
+
256
+
257
+ def _has_callable(obj, fn):
258
+ attr = getattr(obj, fn, None)
259
+ return callable(attr)
260
+
261
+
262
+ def _zero2_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states):
263
+ param_shapes = zero_model_states[0].param_shapes
264
+
265
+ # Reconstruction protocol:
266
+ #
267
+ # XXX: document this
268
+
269
+ if debug:
270
+ for i in range(world_size):
271
+ for j in range(len(fp32_flat_groups[0])):
272
+ print(f"{FP32_FLAT_GROUPS}[{i}][{j}].shape={fp32_flat_groups[i][j].shape}")
273
+
274
+ # XXX: memory usage doubles here (zero2)
275
+ num_param_groups = len(fp32_flat_groups[0])
276
+ merged_single_partition_of_fp32_groups = []
277
+ for i in range(num_param_groups):
278
+ merged_partitions = [sd[i] for sd in fp32_flat_groups]
279
+ full_single_fp32_vector = torch.cat(merged_partitions, 0)
280
+ merged_single_partition_of_fp32_groups.append(full_single_fp32_vector)
281
+ avail_numel = sum(
282
+ [full_single_fp32_vector.numel() for full_single_fp32_vector in merged_single_partition_of_fp32_groups])
283
+
284
+ if debug:
285
+ wanted_params = sum([len(shapes) for shapes in param_shapes])
286
+ wanted_numel = sum([sum(shape.numel() for shape in shapes.values()) for shapes in param_shapes])
287
+ # not asserting if there is a mismatch due to possible padding
288
+ print(f"Have {avail_numel} numels to process.")
289
+ print(f"Need {wanted_numel} numels in {wanted_params} params.")
290
+
291
+ # params
292
+ # XXX: for huge models that can't fit into the host's RAM we will have to recode this to support
293
+ # out-of-core computing solution
294
+ total_numel = 0
295
+ total_params = 0
296
+ for shapes, full_single_fp32_vector in zip(param_shapes, merged_single_partition_of_fp32_groups):
297
+ offset = 0
298
+ avail_numel = full_single_fp32_vector.numel()
299
+ for name, shape in shapes.items():
300
+
301
+ unpartitioned_numel = shape.numel() if _has_callable(shape, 'numel') else math.prod(shape)
302
+ total_numel += unpartitioned_numel
303
+ total_params += 1
304
+
305
+ if debug:
306
+ print(f"{name} full shape: {shape} unpartitioned numel {unpartitioned_numel} ")
307
+ state_dict[name] = full_single_fp32_vector.narrow(0, offset, unpartitioned_numel).view(shape)
308
+ offset += unpartitioned_numel
309
+
310
+ # Z2 started to align to 2*world_size to improve nccl performance. Therefore both offset and
311
+ # avail_numel can differ by anywhere between 0..2*world_size. Due to two unrelated complex
312
+ # paddings performed in the code it's almost impossible to predict the exact numbers w/o the
313
+ # live optimizer object, so we are checking that the numbers are within the right range
314
+ align_to = 2 * world_size
315
+
316
+ def zero2_align(x):
317
+ return align_to * math.ceil(x / align_to)
318
+
319
+ if debug:
320
+ print(f"original offset={offset}, avail_numel={avail_numel}")
321
+
322
+ offset = zero2_align(offset)
323
+ avail_numel = zero2_align(avail_numel)
324
+
325
+ if debug:
326
+ print(f"aligned offset={offset}, avail_numel={avail_numel}")
327
+
328
+ # Sanity check
329
+ if offset != avail_numel:
330
+ raise ValueError(f"consumed {offset} numels out of {avail_numel} - something is wrong")
331
+
332
+ print(f"Reconstructed fp32 state dict with {total_params} params {total_numel} elements")
333
+
334
+
335
+ def _get_fp32_state_dict_from_zero2_checkpoint(world_size, fp32_flat_groups, zero_model_states,
336
+ exclude_frozen_parameters):
337
+ state_dict = OrderedDict()
338
+
339
+ # buffers
340
+ buffers = zero_model_states[0].buffers
341
+ state_dict.update(buffers)
342
+ if debug:
343
+ print(f"added {len(buffers)} buffers")
344
+
345
+ if not exclude_frozen_parameters:
346
+ _zero2_merge_frozen_params(state_dict, zero_model_states)
347
+
348
+ _zero2_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states)
349
+
350
+ # recover shared parameters
351
+ for pair in zero_model_states[0].shared_params:
352
+ if pair[1] in state_dict:
353
+ state_dict[pair[0]] = state_dict[pair[1]]
354
+
355
+ return state_dict
356
+
357
+
358
+ def zero3_partitioned_param_info(unpartitioned_numel, world_size):
359
+ remainder = unpartitioned_numel % world_size
360
+ padding_numel = (world_size - remainder) if remainder else 0
361
+ partitioned_numel = math.ceil(unpartitioned_numel / world_size)
362
+ return partitioned_numel, padding_numel
363
+
364
+
365
+ def _zero3_merge_frozen_params(state_dict, world_size, zero_model_states):
366
+ if zero_model_states[0].frozen_param_shapes is None or len(zero_model_states[0].frozen_param_shapes) == 0:
367
+ return
368
+
369
+ if debug:
370
+ for i in range(world_size):
371
+ num_elem = sum(s.numel() for s in zero_model_states[i].frozen_param_fragments.values())
372
+ print(f'rank {i}: {FROZEN_PARAM_SHAPES}.numel = {num_elem}')
373
+
374
+ frozen_param_shapes = zero_model_states[0].frozen_param_shapes
375
+ wanted_params = len(frozen_param_shapes)
376
+ wanted_numel = sum(s.numel() for s in frozen_param_shapes.values())
377
+ avail_numel = sum([p.numel() for p in zero_model_states[0].frozen_param_fragments.values()]) * world_size
378
+ print(f'Frozen params: Have {avail_numel} numels to process.')
379
+ print(f'Frozen params: Need {wanted_numel} numels in {wanted_params} params')
380
+
381
+ total_params = 0
382
+ total_numel = 0
383
+ for name, shape in zero_model_states[0].frozen_param_shapes.items():
384
+ total_params += 1
385
+ unpartitioned_numel = shape.numel()
386
+ total_numel += unpartitioned_numel
387
+
388
+ param_frags = tuple(model_state.frozen_param_fragments[name] for model_state in zero_model_states)
389
+ state_dict[name] = torch.cat(param_frags, 0).narrow(0, 0, unpartitioned_numel).view(shape)
390
+
391
+ partitioned_numel, partitioned_padding_numel = zero3_partitioned_param_info(unpartitioned_numel, world_size)
392
+
393
+ if debug:
394
+ print(
395
+ f"Frozen params: {total_params} {name} full shape: {shape} partition0 numel={partitioned_numel} partitioned_padding_numel={partitioned_padding_numel}"
396
+ )
397
+
398
+ print(f"Reconstructed Frozen fp32 state dict with {total_params} params {total_numel} elements")
399
+
400
+
401
+ def _zero3_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states):
402
+ param_shapes = zero_model_states[0].param_shapes
403
+ avail_numel = fp32_flat_groups[0].numel() * world_size
404
+ # Reconstruction protocol: For zero3 we need to zip the partitions together at boundary of each
405
+ # param, re-consolidating each param, while dealing with padding if any
406
+
407
+ # merge list of dicts, preserving order
408
+ param_shapes = {k: v for d in param_shapes for k, v in d.items()}
409
+
410
+ if debug:
411
+ for i in range(world_size):
412
+ print(f"{FP32_FLAT_GROUPS}[{i}].shape={fp32_flat_groups[i].shape}")
413
+
414
+ wanted_params = len(param_shapes)
415
+ wanted_numel = sum(shape.numel() for shape in param_shapes.values())
416
+ # not asserting if there is a mismatch due to possible padding
417
+ avail_numel = fp32_flat_groups[0].numel() * world_size
418
+ print(f"Trainable params: Have {avail_numel} numels to process.")
419
+ print(f"Trainable params: Need {wanted_numel} numels in {wanted_params} params.")
420
+
421
+ # params
422
+ # XXX: for huge models that can't fit into the host's RAM we will have to recode this to support
423
+ # out-of-core computing solution
424
+ offset = 0
425
+ total_numel = 0
426
+ total_params = 0
427
+ for name, shape in tqdm(param_shapes.items(), desc='Gathering Sharded Weights'):
428
+ unpartitioned_numel = shape.numel()
429
+ total_numel += unpartitioned_numel
430
+ total_params += 1
431
+ partitioned_numel, partitioned_padding_numel = zero3_partitioned_param_info(unpartitioned_numel, world_size)
432
+
433
+ if debug:
434
+ print(
435
+ f"Trainable params: {total_params} {name} full shape: {shape} partition0 numel={partitioned_numel} partitioned_padding_numel={partitioned_padding_numel}"
436
+ )
437
+
438
+ # XXX: memory usage doubles here
439
+ state_dict[name] = torch.cat(
440
+ tuple(fp32_flat_groups[i].narrow(0, offset, partitioned_numel) for i in range(world_size)),
441
+ 0).narrow(0, 0, unpartitioned_numel).view(shape)
442
+ offset += partitioned_numel
443
+
444
+ offset *= world_size
445
+
446
+ # Sanity check
447
+ if offset != avail_numel:
448
+ raise ValueError(f"consumed {offset} numels out of {avail_numel} - something is wrong")
449
+
450
+ print(f"Reconstructed Trainable fp32 state dict with {total_params} params {total_numel} elements")
451
+
452
+
453
+ def _get_fp32_state_dict_from_zero3_checkpoint(world_size, fp32_flat_groups, zero_model_states,
454
+ exclude_frozen_parameters):
455
+ state_dict = OrderedDict()
456
+
457
+ # buffers
458
+ buffers = zero_model_states[0].buffers
459
+ state_dict.update(buffers)
460
+ if debug:
461
+ print(f"added {len(buffers)} buffers")
462
+
463
+ if not exclude_frozen_parameters:
464
+ _zero3_merge_frozen_params(state_dict, world_size, zero_model_states)
465
+
466
+ _zero3_merge_trainable_params(state_dict, world_size, fp32_flat_groups, zero_model_states)
467
+
468
+ # recover shared parameters
469
+ for pair in zero_model_states[0].shared_params:
470
+ if pair[1] in state_dict:
471
+ state_dict[pair[0]] = state_dict[pair[1]]
472
+
473
+ return state_dict
474
+
475
+
476
+ def get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag=None, exclude_frozen_parameters=False):
477
+ """
478
+ Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated state_dict that can be loaded with
479
+ ``load_state_dict()`` and used for training without DeepSpeed or shared with others, for example
480
+ via a model hub.
481
+
482
+ Args:
483
+ - ``checkpoint_dir``: path to the desired checkpoint folder
484
+ - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in 'latest' file. e.g., ``global_step14``
485
+ - ``exclude_frozen_parameters``: exclude frozen parameters
486
+
487
+ Returns:
488
+ - pytorch ``state_dict``
489
+
490
+ Note: this approach may not work if your application doesn't have sufficient free CPU memory and
491
+ you may need to use the offline approach using the ``zero_to_fp32.py`` script that is saved with
492
+ the checkpoint.
493
+
494
+ A typical usage might be ::
495
+
496
+ from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
497
+ # do the training and checkpoint saving
498
+ state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir) # already on cpu
499
+ model = model.cpu() # move to cpu
500
+ model.load_state_dict(state_dict)
501
+ # submit to model hub or save the model to share with others
502
+
503
+ In this example the ``model`` will no longer be usable in the deepspeed context of the same
504
+ application. i.e. you will need to re-initialize the deepspeed engine, since
505
+ ``model.load_state_dict(state_dict)`` will remove all the deepspeed magic from it.
506
+
507
+ If you want it all done for you, use ``load_state_dict_from_zero_checkpoint`` instead.
508
+
509
+ """
510
+ if tag is None:
511
+ latest_path = os.path.join(checkpoint_dir, 'latest')
512
+ if os.path.isfile(latest_path):
513
+ with open(latest_path, 'r') as fd:
514
+ tag = fd.read().strip()
515
+ else:
516
+ raise ValueError(f"Unable to find 'latest' file at {latest_path}")
517
+
518
+ ds_checkpoint_dir = os.path.join(checkpoint_dir, tag)
519
+
520
+ if not os.path.isdir(ds_checkpoint_dir):
521
+ raise FileNotFoundError(f"Directory '{ds_checkpoint_dir}' doesn't exist")
522
+
523
+ return _get_fp32_state_dict_from_zero_checkpoint(ds_checkpoint_dir, exclude_frozen_parameters)
524
+
525
+
526
+ def convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir,
527
+ output_dir,
528
+ max_shard_size="5GB",
529
+ safe_serialization=False,
530
+ tag=None,
531
+ exclude_frozen_parameters=False):
532
+ """
533
+ Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict`` file that can be
534
+ loaded with ``torch.load(file)`` + ``load_state_dict()`` and used for training without DeepSpeed.
535
+
536
+ Args:
537
+ - ``checkpoint_dir``: path to the desired checkpoint folder. (one that contains the tag-folder, like ``global_step14``)
538
+ - ``output_dir``: directory to the pytorch fp32 state_dict output files
539
+ - ``max_shard_size``: the maximum size for a checkpoint before being sharded, default value is 5GB
540
+ - ``safe_serialization``: whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).
541
+ - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``
542
+ - ``exclude_frozen_parameters``: exclude frozen parameters
543
+ """
544
+ # Dependency pre-check
545
+ if safe_serialization:
546
+ try:
547
+ from safetensors.torch import save_file
548
+ except ImportError:
549
+ print('If you want to use `safe_serialization`, please `pip install safetensors`')
550
+ raise
551
+ if max_shard_size is not None:
552
+ try:
553
+ from huggingface_hub import split_torch_state_dict_into_shards
554
+ except ImportError:
555
+ print('If you want to use `max_shard_size`, please `pip install huggingface_hub`')
556
+ raise
557
+
558
+ # Convert zero checkpoint to state_dict
559
+ state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag, exclude_frozen_parameters)
560
+
561
+ # Shard the model if it is too big.
562
+ weights_name = "model.safetensors" if safe_serialization else "pytorch_model.bin"
563
+ if max_shard_size is not None:
564
+ filename_pattern = weights_name.replace(".bin", "{suffix}.bin").replace(".safetensors", "{suffix}.safetensors")
565
+ state_dict_split = split_torch_state_dict_into_shards(state_dict,
566
+ filename_pattern=filename_pattern,
567
+ max_shard_size=max_shard_size)
568
+ else:
569
+ from collections import namedtuple
570
+ StateDictSplit = namedtuple("StateDictSplit", ["is_sharded", "filename_to_tensors"])
571
+ state_dict_split = StateDictSplit(is_sharded=False,
572
+ filename_to_tensors={weights_name: list(state_dict.keys())})
573
+
574
+ # Save the model
575
+ filename_to_tensors = state_dict_split.filename_to_tensors.items()
576
+ for shard_file, tensors in tqdm(filename_to_tensors, desc="Saving checkpoint shards"):
577
+ shard = {tensor: state_dict[tensor].contiguous() for tensor in tensors}
578
+ output_path = os.path.join(output_dir, shard_file)
579
+ if safe_serialization:
580
+ save_file(shard, output_path, metadata={"format": "pt"})
581
+ else:
582
+ torch.save(shard, output_path)
583
+
584
+ # Save index if sharded
585
+ if state_dict_split.is_sharded:
586
+ index = {
587
+ "metadata": state_dict_split.metadata,
588
+ "weight_map": state_dict_split.tensor_to_filename,
589
+ }
590
+ save_index_file = "model.safetensors.index.json" if safe_serialization else "pytorch_model.bin.index.json"
591
+ save_index_file = os.path.join(output_dir, save_index_file)
592
+ with open(save_index_file, "w", encoding="utf-8") as f:
593
+ content = json.dumps(index, indent=2, sort_keys=True) + "\n"
594
+ f.write(content)
595
+
596
+
597
+ def load_state_dict_from_zero_checkpoint(model, checkpoint_dir, tag=None):
598
+ """
599
+ 1. Put the provided model to cpu
600
+ 2. Convert ZeRO 2 or 3 checkpoint into a single fp32 consolidated ``state_dict``
601
+ 3. Load it into the provided model
602
+
603
+ Args:
604
+ - ``model``: the model object to update
605
+ - ``checkpoint_dir``: path to the desired checkpoint folder. (one that contains the tag-folder, like ``global_step14``)
606
+ - ``tag``: checkpoint tag used as a unique identifier for checkpoint. If not provided will attempt to load tag in the file named ``latest`` in the checkpoint folder, e.g., ``global_step14``
607
+
608
+ Returns:
609
+ - ``model``: modified model
610
+
611
+ Make sure you have plenty of CPU memory available before you call this function. If you don't
612
+ have enough use the ``zero_to_fp32.py`` utility to do the conversion. You will find it
613
+ conveniently placed for you in the checkpoint folder.
614
+
615
+ A typical usage might be ::
616
+
617
+ from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint
618
+ model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)
619
+ # submit to model hub or save the model to share with others
620
+
621
+ Note, that once this was run, the ``model`` will no longer be usable in the deepspeed context
622
+ of the same application. i.e. you will need to re-initialize the deepspeed engine, since
623
+ ``model.load_state_dict(state_dict)`` will remove all the deepspeed magic from it.
624
+
625
+ """
626
+ logger.info(f"Extracting fp32 weights")
627
+ state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag)
628
+
629
+ logger.info(f"Overwriting model with fp32 weights")
630
+ model = model.cpu()
631
+ model.load_state_dict(state_dict, strict=False)
632
+
633
+ return model
634
+
635
+
636
+ if __name__ == "__main__":
637
+ parser = argparse.ArgumentParser()
638
+ parser.add_argument("checkpoint_dir",
639
+ type=str,
640
+ help="path to the desired checkpoint folder, e.g., path/checkpoint-12")
641
+ parser.add_argument("output_dir",
642
+ type=str,
643
+ help="directory to the pytorch fp32 state_dict output files"
644
+ "(e.g. path/checkpoint-12-output/)")
645
+ parser.add_argument(
646
+ "--max_shard_size",
647
+ type=str,
648
+ default="5GB",
649
+ help="The maximum size for a checkpoint before being sharded. Checkpoints shard will then be each of size"
650
+ "lower than this size. If expressed as a string, needs to be digits followed by a unit (like `5MB`"
651
+ "We default it to 5GB in order for models to be able to run easily on free-tier google colab instances"
652
+ "without CPU OOM issues.")
653
+ parser.add_argument(
654
+ "--safe_serialization",
655
+ default=False,
656
+ action='store_true',
657
+ help="Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).")
658
+ parser.add_argument("-t",
659
+ "--tag",
660
+ type=str,
661
+ default=None,
662
+ help="checkpoint tag used as a unique identifier for checkpoint. e.g., global_step1")
663
+ parser.add_argument("--exclude_frozen_parameters", action='store_true', help="exclude frozen parameters")
664
+ parser.add_argument("-d", "--debug", action='store_true', help="enable debug")
665
+ args = parser.parse_args()
666
+
667
+ debug = args.debug
668
+
669
+ convert_zero_checkpoint_to_fp32_state_dict(args.checkpoint_dir,
670
+ args.output_dir,
671
+ max_shard_size=args.max_shard_size,
672
+ safe_serialization=args.safe_serialization,
673
+ tag=args.tag,
674
+ exclude_frozen_parameters=args.exclude_frozen_parameters)
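As a usage note, the conversion helper defined above can also be called programmatically rather than through the CLI in the `__main__` block. A minimal sketch, assuming the script is importable from the checkpoint directory and using placeholder paths; `safe_serialization=True` additionally requires the `safetensors` package, as the dependency pre-check in the function enforces:

```python
import os
from zero_to_fp32 import convert_zero_checkpoint_to_fp32_state_dict

# Placeholder paths: the checkpoint folder containing the global_step* tag dir,
# and an output folder that must exist before the shards are written into it.
ckpt_dir = "checkpoint-200"
out_dir = "checkpoint-200-fp32"
os.makedirs(out_dir, exist_ok=True)

# Consolidate the ZeRO-partitioned states into fp32 weights, saved as
# safetensors shards (requires the safetensors package).
convert_zero_checkpoint_to_fp32_state_dict(ckpt_dir, out_dir, safe_serialization=True)
```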