moyixiao committed 19 days ago

Commit 5bf22ef (verified) · 1 parent: 0bd8f78

Training in progress, step 2280, checkpoint

Files changed (17)

.gitattributes +1 -0
checkpoint-2280/README.md +202 -0
checkpoint-2280/adapter_config.json +37 -0
checkpoint-2280/adapter_model.safetensors +3 -0
checkpoint-2280/added_tokens.json +28 -0
checkpoint-2280/merges.txt +0 -0
checkpoint-2280/optimizer.pt +3 -0
checkpoint-2280/rng_state_0.pth +3 -0
checkpoint-2280/rng_state_1.pth +3 -0
checkpoint-2280/scaler.pt +3 -0
checkpoint-2280/scheduler.pt +3 -0
checkpoint-2280/special_tokens_map.json +31 -0
checkpoint-2280/tokenizer.json +3 -0
checkpoint-2280/tokenizer_config.json +241 -0
checkpoint-2280/trainer_state.json +1630 -0
checkpoint-2280/training_args.bin +3 -0
checkpoint-2280/vocab.json +0 -0

.gitattributes CHANGED

@@ -71,3 +71,4 @@ checkpoint-2040/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 checkpoint-2100/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 checkpoint-2160/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 checkpoint-2220/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-2280/tokenizer.json filter=lfs diff=lfs merge=lfs -text

checkpoint-2280/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen3-0.6B-Base
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.14.0
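
The README above leaves "How to Get Started with the Model" unfilled. The following is a minimal sketch of how a checkpoint like this one is typically loaded with PEFT. It is not taken from the repository. It assumes that `transformers` and `peft` are installed and that the `checkpoint-2280` directory has been downloaded locally; the function name `load_checkpoint_adapter` and the default path are hypothetical.

```python
def load_checkpoint_adapter(adapter_dir: str = "checkpoint-2280"):
    """Load the base model and attach the LoRA adapter found in adapter_dir.

    adapter_dir is a hypothetical local path to the checkpoint folder
    that contains adapter_config.json and adapter_model.safetensors.
    """
    # Imported lazily so this module can be imported without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Base model name as recorded in adapter_config.json.
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")
    # The checkpoint folder also ships the tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    # Wrap the base model with the saved LoRA weights.
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return model, tokenizer
```

Calling `load_checkpoint_adapter()` downloads the base model on first use, so it needs network access and enough memory for a 0.6B-parameter model.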

checkpoint-2280/adapter_config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen3-0.6B-Base",
+  "bias": "none",
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "v_proj",
+    "o_proj",
+    "gate_proj",
+    "up_proj",
+    "q_proj",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
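
As a rough sanity check on this config, the sketch below computes the LoRA scaling factor and the number of trainable adapter parameters. The values of `r`, `lora_alpha`, and the target modules come from the file above. The layer shapes (hidden size 1024, intermediate size 3072, 28 layers, 16 query heads and 8 key/value heads of dimension 128) are assumptions about Qwen3-0.6B and are not stated anywhere in this commit.

```python
R = 32           # "r" in adapter_config.json
LORA_ALPHA = 64  # "lora_alpha" in adapter_config.json

# Assumed Qwen3-0.6B dimensions (not part of this commit).
HIDDEN, INTERMEDIATE, LAYERS = 1024, 3072, 28
Q_OUT = 16 * 128   # query heads * head dimension
KV_OUT = 8 * 128   # key/value heads * head dimension

# (in_features, out_features) for each entry in "target_modules".
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    # LoRA adds A (r x in_features) and B (out_features x r) per module.
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in SHAPES.values())
total = per_layer * LAYERS
# With "use_rslora": false the update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / R

print(f"scaling={scaling}, per_layer={per_layer}, total={total}")
print(f"float32 bytes: {total * 4}")
```

Under these assumptions the adapter has 20,185,088 parameters, or 80,740,352 bytes in float32, which is close to the 80,792,456-byte size recorded for `adapter_model.safetensors`; the remainder would be consistent with file header overhead.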

checkpoint-2280/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b35176ffaee6fd113efb8ee8fa20aeadcb29307ea6dbea8cea9862d6f41aa2ee
+size 80792456
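
The three added lines are a Git LFS pointer, not the weights themselves: each line is a key, a space, and a value. A small illustrative parser, using the pointer text from this diff, could look like the following; the helper name `parse_lfs_pointer` is made up for this example.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:b35176ffaee6fd113efb8ee8fa20aeadcb29307ea6dbea8cea9862d6f41aa2ee
size 80792456
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size"])
```

The same format applies to the other LFS-tracked files in this commit, such as `optimizer.pt` and `tokenizer.json`.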

checkpoint-2280/added_tokens.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-2280/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-2280/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e6224292300d56428c49e86fb02862adba759f60328dbac993ca12cfbed0cbd
+size 161815978

checkpoint-2280/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:34bcae41c589c7e4cab7b2ef263b878c90c2741404a6af11994dc31537b2319b
+size 14512

checkpoint-2280/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d05dc84075e8f7dd1191c36f3be9dda12073208e12f7d2cef433c38d6336774a
+size 14512

checkpoint-2280/scaler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6db25aa5b3ece5d09f8ed6812128ecb610207f2d9d04c00c249e6b1d40c51b60
+size 988

checkpoint-2280/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a58edb243943fb6c038d425414b65c495368db468d356039b61f63a4b3e88b61
+size 1064

checkpoint-2280/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-2280/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654

checkpoint-2280/tokenizer_config.json ADDED

	@@ -0,0 +1,241 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ 'System: ' + system_message + '<|endoftext|>' + '\n' }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ 'Human: ' + content + '<|endoftext|>' + '\nAssistant:' }}{% elif message['role'] == 'assistant' %}{{ content + '<|endoftext|>' + '\n' }}{% endif %}{% endfor %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
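
The `chat_template` above is a Jinja template that formats a conversation as `System:` / `Human:` / `Assistant:` turns, each terminated by `<|endoftext|>`. To make its behavior easier to follow, here is a plain-Python re-implementation sketch. It is not the code `transformers` runs; the function name `render_chat` is hypothetical, and unlike the template it tolerates an empty message list.

```python
EOS = "<|endoftext|>"  # eos_token from tokenizer_config.json


def render_chat(messages: list) -> str:
    """Mirror the chat_template logic in plain Python (illustrative only)."""
    parts = []
    # An optional leading system message is rendered once, up front.
    if messages and messages[0]["role"] == "system":
        parts.append("System: " + messages[0]["content"] + EOS + "\n")
        loop_messages = messages[1:]
    else:
        loop_messages = messages
    for message in loop_messages:
        content = message["content"]
        if message["role"] == "user":
            # A user turn ends by opening the assistant turn.
            parts.append("Human: " + content + EOS + "\nAssistant:")
        elif message["role"] == "assistant":
            parts.append(content + EOS + "\n")
        # As in the template, any other role is silently skipped.
    return "".join(parts)


prompt = render_chat([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
])
print(repr(prompt))
```

Because every user turn already ends with `\nAssistant:`, the rendered prompt is ready for generation without a separate generation-prompt flag.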

checkpoint-2280/trainer_state.json ADDED

	@@ -0,0 +1,1630 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.681951793062904,
+  "eval_steps": 500,
+  "global_step": 2280,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.011757789535567314,
+      "grad_norm": 2.1373069286346436,
+      "learning_rate": 0.00010384615384615383,
+      "loss": 3.253,
+      "step": 10
+    },
+    {
+      "epoch": 0.023515579071134628,
+      "grad_norm": 1.1001005172729492,
+      "learning_rate": 0.0002192307692307692,
+      "loss": 2.2285,
+      "step": 20
+    },
+    {
+      "epoch": 0.03527336860670194,
+      "grad_norm": 1.0475778579711914,
+      "learning_rate": 0.00029999895425857407,
+      "loss": 1.4226,
+      "step": 30
+    },
+    {
+      "epoch": 0.047031158142269255,
+      "grad_norm": 0.604124903678894,
+      "learning_rate": 0.0002999803637055168,
+      "loss": 0.9904,
+      "step": 40
+    },
+    {
+      "epoch": 0.058788947677836566,
+      "grad_norm": 0.30270475149154663,
+      "learning_rate": 0.0002999385377692348,
+      "loss": 0.8118,
+      "step": 50
+    },
+    {
+      "epoch": 0.07054673721340388,
+      "grad_norm": 0.2848896384239197,
+      "learning_rate": 0.0002998734829295208,
+      "loss": 0.7526,
+      "step": 60
+    },
+    {
+      "epoch": 0.0823045267489712,
+      "grad_norm": 0.14966866374015808,
+      "learning_rate": 0.00029978520926485496,
+      "loss": 0.7037,
+      "step": 70
+    },
+    {
+      "epoch": 0.09406231628453851,
+      "grad_norm": 0.12227073311805725,
+      "learning_rate": 0.0002996737304508438,
+      "loss": 0.6816,
+      "step": 80
+    },
+    {
+      "epoch": 0.10582010582010581,
+      "grad_norm": 0.16820231080055237,
+      "learning_rate": 0.00029953906375810115,
+      "loss": 0.6703,
+      "step": 90
+    },
+    {
+      "epoch": 0.11757789535567313,
+      "grad_norm": 0.12418051064014435,
+      "learning_rate": 0.000299381230049573,
+      "loss": 0.6573,
+      "step": 100
+    },
+    {
+      "epoch": 0.12933568489124045,
+      "grad_norm": 0.17970608174800873,
+      "learning_rate": 0.0002992002537773051,
+      "loss": 0.6416,
+      "step": 110
+    },
+    {
+      "epoch": 0.14109347442680775,
+      "grad_norm": 0.15002180635929108,
+      "learning_rate": 0.00029899616297865466,
+      "loss": 0.6373,
+      "step": 120
+    },
+    {
+      "epoch": 0.15285126396237508,
+      "grad_norm": 0.2321266084909439,
+      "learning_rate": 0.00029876898927194684,
+      "loss": 0.6318,
+      "step": 130
+    },
+    {
+      "epoch": 0.1646090534979424,
+      "grad_norm": 0.15638886392116547,
+      "learning_rate": 0.0002985187678515765,
+      "loss": 0.6278,
+      "step": 140
+    },
+    {
+      "epoch": 0.1763668430335097,
+      "grad_norm": 0.1360541135072708,
+      "learning_rate": 0.0002982455374825557,
+      "loss": 0.6316,
+      "step": 150
+    },
+    {
+      "epoch": 0.18812463256907702,
+      "grad_norm": 0.1484975814819336,
+      "learning_rate": 0.0002979493404945078,
+      "loss": 0.6248,
+      "step": 160
+    },
+    {
+      "epoch": 0.19988242210464433,
+      "grad_norm": 0.15455132722854614,
+      "learning_rate": 0.00029763022277511016,
+      "loss": 0.6149,
+      "step": 170
+    },
+    {
+      "epoch": 0.21164021164021163,
+      "grad_norm": 0.14312204718589783,
+      "learning_rate": 0.00029728823376298476,
+      "loss": 0.6122,
+      "step": 180
+    },
+    {
+      "epoch": 0.22339800117577896,
+      "grad_norm": 0.20580118894577026,
+      "learning_rate": 0.00029692342644003914,
+      "loss": 0.6177,
+      "step": 190
+    },
+    {
+      "epoch": 0.23515579071134626,
+      "grad_norm": 0.1568194329738617,
+      "learning_rate": 0.0002965358573232581,
+      "loss": 0.6023,
+      "step": 200
+    },
+    {
+      "epoch": 0.24691358024691357,
+      "grad_norm": 0.1828833371400833,
+      "learning_rate": 0.00029612558645594826,
+      "loss": 0.6107,
+      "step": 210
+    },
+    {
+      "epoch": 0.2586713697824809,
+      "grad_norm": 0.24727077782154083,
+      "learning_rate": 0.0002956926773984357,
+      "loss": 0.6052,
+      "step": 220
+    },
+    {
+      "epoch": 0.27042915931804823,
+      "grad_norm": 0.17706172168254852,
+      "learning_rate": 0.00029523719721821914,
+      "loss": 0.602,
+      "step": 230
+    },
+    {
+      "epoch": 0.2821869488536155,
+      "grad_norm": 0.22355277836322784,
+      "learning_rate": 0.00029475921647957967,
+      "loss": 0.6062,
+      "step": 240
+    },
+    {
+      "epoch": 0.29394473838918284,
+      "grad_norm": 0.1888413280248642,
+      "learning_rate": 0.00029489696943632825,
+      "loss": 0.6036,
+      "step": 250
+    },
+    {
+      "epoch": 0.30570252792475017,
+      "grad_norm": 0.15985628962516785,
+      "learning_rate": 0.0002944000922130167,
+      "loss": 0.6024,
+      "step": 260
+    },
+    {
+      "epoch": 0.31746031746031744,
+      "grad_norm": 0.15196776390075684,
+      "learning_rate": 0.00029388059386971724,
+      "loss": 0.5936,
+      "step": 270
+    },
+    {
+      "epoch": 0.3292181069958848,
+      "grad_norm": 0.18247170746326447,
+      "learning_rate": 0.0002933385557888875,
+      "loss": 0.5987,
+      "step": 280
+    },
+    {
+      "epoch": 0.3409758965314521,
+      "grad_norm": 0.162806436419487,
+      "learning_rate": 0.00029277406288396663,
+      "loss": 0.5955,
+      "step": 290
+    },
+    {
+      "epoch": 0.3527336860670194,
+      "grad_norm": 0.16122598946094513,
+      "learning_rate": 0.00029218720358607363,
+      "loss": 0.5973,
+      "step": 300
+    },
+    {
+      "epoch": 0.3644914756025867,
+      "grad_norm": 0.15314878523349762,
+      "learning_rate": 0.00029157806983015394,
+      "loss": 0.5897,
+      "step": 310
+    },
+    {
+      "epoch": 0.37624926513815404,
+      "grad_norm": 0.13723793625831604,
+      "learning_rate": 0.00029094675704057724,
+      "loss": 0.5936,
+      "step": 320
+    },
+    {
+      "epoch": 0.3880070546737213,
+      "grad_norm": 0.1514938473701477,
+      "learning_rate": 0.00029029336411618865,
+      "loss": 0.5876,
+      "step": 330
+    },
+    {
+      "epoch": 0.39976484420928865,
+      "grad_norm": 0.1737629473209381,
+      "learning_rate": 0.0002896179934148158,
+      "loss": 0.5884,
+      "step": 340
+    },
+    {
+      "epoch": 0.411522633744856,
+      "grad_norm": 0.1483820527791977,
+      "learning_rate": 0.0002889207507372337,
+      "loss": 0.5846,
+      "step": 350
+    },
+    {
+      "epoch": 0.42328042328042326,
+      "grad_norm": 0.14795410633087158,
+      "learning_rate": 0.0002882017453105906,
+      "loss": 0.5862,
+      "step": 360
+    },
+    {
+      "epoch": 0.4350382128159906,
+      "grad_norm": 0.16455809772014618,
+      "learning_rate": 0.0002865363126582549,
+      "loss": 0.5724,
+      "step": 370
+    },
+    {
+      "epoch": 0.4467960023515579,
+      "grad_norm": 0.21681314706802368,
+      "learning_rate": 0.00028575266221296395,
+      "loss": 0.589,
+      "step": 380
+    },
+    {
+      "epoch": 0.4585537918871252,
+      "grad_norm": 0.1644076406955719,
+      "learning_rate": 0.00028494798058030713,
+      "loss": 0.5887,
+      "step": 390
+    },
+    {
+      "epoch": 0.4703115814226925,
+      "grad_norm": 0.1598517894744873,
+      "learning_rate": 0.0002841223924238447,
+      "loss": 0.5764,
+      "step": 400
+    },
+    {
+      "epoch": 0.48206937095825986,
+      "grad_norm": 0.1925736665725708,
+      "learning_rate": 0.0002832760256460349,
+      "loss": 0.5808,
+      "step": 410
+    },
+    {
+      "epoch": 0.49382716049382713,
+      "grad_norm": 0.2404128462076187,
+      "learning_rate": 0.00028240901136841886,
+      "loss": 0.5733,
+      "step": 420
+    },
+    {
+      "epoch": 0.5055849500293945,
+      "grad_norm": 0.16819190979003906,
+      "learning_rate": 0.00028152148391130693,
+      "loss": 0.5747,
+      "step": 430
+    },
+    {
+      "epoch": 0.5173427395649618,
+      "grad_norm": 0.17305436730384827,
+      "learning_rate": 0.00028061358077296946,
+      "loss": 0.5918,
+      "step": 440
+    },
+    {
+      "epoch": 0.5291005291005291,
+      "grad_norm": 0.21177902817726135,
+      "learning_rate": 0.00027968544260833497,
+      "loss": 0.5692,
+      "step": 450
+    },
+    {
+      "epoch": 0.5408583186360965,
+      "grad_norm": 0.16747154295444489,
+      "learning_rate": 0.0002787372132071998,
+      "loss": 0.5717,
+      "step": 460
+    },
+    {
+      "epoch": 0.5526161081716637,
+      "grad_norm": 0.16251486539840698,
+      "learning_rate": 0.00027776903947195156,
+      "loss": 0.5756,
+      "step": 470
+    },
+    {
+      "epoch": 0.564373897707231,
+      "grad_norm": 0.1750066876411438,
+      "learning_rate": 0.00027678107139481056,
+      "loss": 0.5702,
+      "step": 480
+    },
+    {
+      "epoch": 0.5761316872427984,
+      "grad_norm": 0.20399513840675354,
+      "learning_rate": 0.00027693008651389484,
+      "loss": 0.575,
+      "step": 490
+    },
+    {
+      "epoch": 0.5878894767783657,
+      "grad_norm": 0.16991858184337616,
+      "learning_rate": 0.00027591974622649115,
+      "loss": 0.571,
+      "step": 500
+    },
+    {
+      "epoch": 0.599647266313933,
+      "grad_norm": 0.15292410552501678,
+      "learning_rate": 0.00027488967987351106,
+      "loss": 0.5791,
+      "step": 510
+    },
+    {
+      "epoch": 0.6114050558495003,
+      "grad_norm": 0.18092428147792816,
+      "learning_rate": 0.00027384004882088046,
+      "loss": 0.5632,
+      "step": 520
+    },
+    {
+      "epoch": 0.6231628453850676,
+      "grad_norm": 0.17729806900024414,
+      "learning_rate": 0.00027277101749944985,
+      "loss": 0.5686,
+      "step": 530
+    },
+    {
+      "epoch": 0.6349206349206349,
+      "grad_norm": 0.18892714381217957,
+      "learning_rate": 0.00027168275337923555,
+      "loss": 0.564,
+      "step": 540
+    },
+    {
+      "epoch": 0.6466784244562023,
+      "grad_norm": 0.16683107614517212,
+      "learning_rate": 0.0002693242905393658,
+      "loss": 0.5744,
+      "step": 550
+    },
+    {
+      "epoch": 0.6584362139917695,
+      "grad_norm": 0.1752244234085083,
+      "learning_rate": 0.0002681837303409013,
+      "loss": 0.5685,
+      "step": 560
+    },
+    {
+      "epoch": 0.6701940035273368,
+      "grad_norm": 0.1891213059425354,
+      "learning_rate": 0.0002670248607838145,
+      "loss": 0.5634,
+      "step": 570
+    },
+    {
+      "epoch": 0.6819517930629042,
+      "grad_norm": 0.16213206946849823,
+      "learning_rate": 0.0002658478614034631,
+      "loss": 0.5655,
+      "step": 580
+    },
+    {
+      "epoch": 0.6937095825984715,
+      "grad_norm": 0.18708591163158417,
+      "learning_rate": 0.0002646529145439286,
+      "loss": 0.5616,
+      "step": 590
+    },
+    {
+      "epoch": 0.7054673721340388,
+      "grad_norm": 0.16980920732021332,
+      "learning_rate": 0.000263440205329767,
+      "loss": 0.561,
+      "step": 600
+    },
+    {
+      "epoch": 0.7172251616696061,
+      "grad_norm": 0.1808006316423416,
+      "learning_rate": 0.0002622099216373283,
+      "loss": 0.5695,
+      "step": 610
+    },
+    {
+      "epoch": 0.7289829512051734,
+      "grad_norm": 0.2212889939546585,
+      "learning_rate": 0.00026096225406565073,
+      "loss": 0.5675,
+      "step": 620
+    },
+    {
+      "epoch": 0.7407407407407407,
+      "grad_norm": 0.1887008398771286,
+      "learning_rate": 0.00025969739590693243,
+      "loss": 0.5613,
+      "step": 630
+    },
+    {
+      "epoch": 0.7524985302763081,
+      "grad_norm": 0.16703972220420837,
+      "learning_rate": 0.0002584155431165858,
+      "loss": 0.5714,
+      "step": 640
+    },
+    {
+      "epoch": 0.7642563198118754,
+      "grad_norm": 0.17618794739246368,
+      "learning_rate": 0.00025711689428288,
+      "loss": 0.5606,
+      "step": 650
+    },
+    {
+      "epoch": 0.7760141093474426,
+      "grad_norm": 0.19063800573349,
+      "learning_rate": 0.0002558016505961747,
+      "loss": 0.5654,
+      "step": 660
+    },
+    {
+      "epoch": 0.78777189888301,
+      "grad_norm": 0.18943698704242706,
+      "learning_rate": 0.0002544700158177514,
+      "loss": 0.5578,
+      "step": 670
+    },
+    {
+      "epoch": 0.7995296884185773,
+      "grad_norm": 0.16459155082702637,
+      "learning_rate": 0.00025312219624824573,
+      "loss": 0.5638,
+      "step": 680
+    },
+    {
+      "epoch": 0.8112874779541446,
+      "grad_norm": 0.15839265286922455,
+      "learning_rate": 0.0002517584006956874,
+      "loss": 0.5643,
+      "step": 690
+    },
+    {
+      "epoch": 0.823045267489712,
+      "grad_norm": 0.1620016098022461,
+      "learning_rate": 0.00025037884044315045,
+      "loss": 0.5503,
+      "step": 700
+    },
+    {
+      "epoch": 0.8348030570252792,
+      "grad_norm": 0.17974963784217834,
+      "learning_rate": 0.0002489837292160211,
+      "loss": 0.5678,
+      "step": 710
+    },
+    {
+      "epoch": 0.8465608465608465,
+      "grad_norm": 0.17759168148040771,
+      "learning_rate": 0.0002475732831488866,
+      "loss": 0.565,
+      "step": 720
+    },
+    {
+      "epoch": 0.8583186360964139,
+      "grad_norm": 0.1844816654920578,
+      "learning_rate": 0.0002461477207520511,
+      "loss": 0.5478,
+      "step": 730
+    },
+    {
+      "epoch": 0.8700764256319812,
+      "grad_norm": 0.18819017708301544,
+      "learning_rate": 0.0002447072628776832,
+      "loss": 0.5564,
+      "step": 740
+    },
+    {
+      "epoch": 0.8818342151675485,
+      "grad_norm": 0.1674388349056244,
+      "learning_rate": 0.00024325213268560155,
+      "loss": 0.5535,
+      "step": 750
+    },
+    {
+      "epoch": 0.8935920047031158,
+      "grad_norm": 0.1836009919643402,
+      "learning_rate": 0.00024178255560870153,
+      "loss": 0.563,
+      "step": 760
+    },
+    {
+      "epoch": 0.9053497942386831,
+      "grad_norm": 0.15371523797512054,
+      "learning_rate": 0.000240298759318031,
+      "loss": 0.5567,
+      "step": 770
+    },
+    {
+      "epoch": 0.9171075837742504,
+      "grad_norm": 0.23757289350032806,
+      "learning_rate": 0.00023880097368751866,
+      "loss": 0.5602,
+      "step": 780
+    },
+    {
+      "epoch": 0.9288653733098178,
+      "grad_norm": 0.18779964745044708,
+      "learning_rate": 0.00023728943075836153,
+      "loss": 0.5638,
+      "step": 790
+    },
+    {
+      "epoch": 0.940623162845385,
+      "grad_norm": 0.2101014107465744,
+      "learning_rate": 0.00023576436470307627,
+      "loss": 0.5523,
+      "step": 800
+    },
+    {
+      "epoch": 0.9523809523809523,
+      "grad_norm": 0.19982023537158966,
+      "learning_rate": 0.00023422601178922054,
+      "loss": 0.559,
+      "step": 810
+    },
+    {
+      "epoch": 0.9641387419165197,
+      "grad_norm": 0.16859062016010284,
+      "learning_rate": 0.00023267461034278986,
+      "loss": 0.5606,
+      "step": 820
+    },
+    {
+      "epoch": 0.975896531452087,
+      "grad_norm": 0.1712592989206314,
+      "learning_rate": 0.00023111040071129553,
+      "loss": 0.5485,
+      "step": 830
+    },
+    {
+      "epoch": 0.9876543209876543,
+      "grad_norm": 0.16725780069828033,
+      "learning_rate": 0.00022953362522652892,
+      "loss": 0.5587,
+      "step": 840
+    },
+    {
+      "epoch": 0.9994121105232217,
+      "grad_norm": 0.15668565034866333,
+      "learning_rate": 0.00022794452816701931,
+      "loss": 0.5559,
+      "step": 850
+    },
+    {
+      "epoch": 1.0117577895355674,
+      "grad_norm": 0.18699130415916443,
+      "learning_rate": 0.00022634335572018906,
+      "loss": 0.5993,
+      "step": 860
+    },
+    {
+      "epoch": 1.0235155790711346,
+      "grad_norm": 0.18478669226169586,
+      "learning_rate": 0.0002247303559442139,
+      "loss": 0.547,
+      "step": 870
+    },
+    {
+      "epoch": 1.035273368606702,
+      "grad_norm": 0.16820459067821503,
+      "learning_rate": 0.00022310577872959293,
+      "loss": 0.5516,
+      "step": 880
+    },
+    {
+      "epoch": 1.0470311581422693,
+      "grad_norm": 0.1703968346118927,
+      "learning_rate": 0.0002214698757604348,
+      "loss": 0.5453,
+      "step": 890
+    },
+    {
+      "epoch": 1.0587889476778365,
+      "grad_norm": 0.16966000199317932,
+      "learning_rate": 0.00021982290047546622,
+      "loss": 0.5446,
+      "step": 900
+    },
+    {
+      "epoch": 1.0705467372134039,
+      "grad_norm": 0.18867208063602448,
+      "learning_rate": 0.00021816510802876842,
+      "loss": 0.5541,
+      "step": 910
+    },
+    {
+      "epoch": 1.0823045267489713,
+      "grad_norm": 0.19499064981937408,
+      "learning_rate": 0.00021649675525024802,
+      "loss": 0.5586,
+      "step": 920
+    },
+    {
+      "epoch": 1.0940623162845384,
+      "grad_norm": 0.18678592145442963,
+      "learning_rate": 0.0002148181006058483,
+      "loss": 0.5623,
+      "step": 930
+    },
+    {
+      "epoch": 1.1058201058201058,
+      "grad_norm": 0.16676273941993713,
+      "learning_rate": 0.0002131294041575066,
+      "loss": 0.5477,
+      "step": 940
+    },
+    {
+      "epoch": 1.1175778953556732,
+      "grad_norm": 0.15307499468326569,
+      "learning_rate": 0.0002114309275228651,
+      "loss": 0.5428,
+      "step": 950
+    },
+    {
+      "epoch": 1.1293356848912404,
+      "grad_norm": 0.1780371516942978,
+      "learning_rate": 0.00020972293383474022,
+      "loss": 0.5421,
+      "step": 960
+    },
+    {
+      "epoch": 1.1410934744268078,
+      "grad_norm": 0.15791286528110504,
+      "learning_rate": 0.0002080056877003573,
+      "loss": 0.5393,
+      "step": 970
+    },
+    {
+      "epoch": 1.1528512639623751,
+      "grad_norm": 0.20655624568462372,
+      "learning_rate": 0.00020627945516035677,
+      "loss": 0.5488,
+      "step": 980
+    },
+    {
+      "epoch": 1.1646090534979423,
+      "grad_norm": 0.1702195405960083,
+      "learning_rate": 0.00020454450364757864,
+      "loss": 0.5465,
+      "step": 990
+    },
+    {
+      "epoch": 1.1763668430335097,
+      "grad_norm": 0.18632200360298157,
+      "learning_rate": 0.00020280110194563077,
+      "loss": 0.5457,
+      "step": 1000
+    },
+    {
+      "epoch": 1.188124632569077,
+      "grad_norm": 0.15633152425289154,
+      "learning_rate": 0.00020104952014724843,
+      "loss": 0.5383,
+      "step": 1010
+    },
+    {
+      "epoch": 1.1998824221046442,
+      "grad_norm": 0.1846858710050583,
+      "learning_rate": 0.0001992900296124505,
+      "loss": 0.5355,
+      "step": 1020
+    },
+    {
+      "epoch": 1.2116402116402116,
+      "grad_norm": 0.1534046232700348,
+      "learning_rate": 0.0001975229029264998,
+      "loss": 0.5461,
+      "step": 1030
+    },
+    {
+      "epoch": 1.223398001175779,
+      "grad_norm": 0.1676914095878601,
+      "learning_rate": 0.00019574841385767335,
+      "loss": 0.5405,
+      "step": 1040
+    },
+    {
+      "epoch": 1.2351557907113462,
+      "grad_norm": 0.16140472888946533,
+      "learning_rate": 0.00019396683731484938,
+      "loss": 0.5397,
+      "step": 1050
+    },
+    {
+      "epoch": 1.2469135802469136,
+      "grad_norm": 0.16447623074054718,
+      "learning_rate": 0.0001921784493049176,
+      "loss": 0.5465,
+      "step": 1060
+    },
+    {
+      "epoch": 1.258671369782481,
+      "grad_norm": 0.17460733652114868,
+      "learning_rate": 0.0001903835268900197,
+      "loss": 0.5446,
+      "step": 1070
+    },
+    {
+      "epoch": 1.2704291593180481,
+      "grad_norm": 0.1717451810836792,
+      "learning_rate": 0.00018858234814462578,
+      "loss": 0.5445,
+      "step": 1080
+    },
+    {
+      "epoch": 1.2821869488536155,
+      "grad_norm": 0.17322146892547607,
+      "learning_rate": 0.00018677519211245447,
+      "loss": 0.5482,
+      "step": 1090
+    },
+    {
+      "epoch": 1.293944738389183,
+      "grad_norm": 0.1692781150341034,
+      "learning_rate": 0.00018496233876324252,
+      "loss": 0.5406,
+      "step": 1100
+    },
+    {
+      "epoch": 1.3057025279247503,
+      "grad_norm": 0.15191440284252167,
+      "learning_rate": 0.00018314406894937133,
+      "loss": 0.5457,
+      "step": 1110
+    },
+    {
+      "epoch": 1.3174603174603174,
+      "grad_norm": 0.1597563475370407,
+      "learning_rate": 0.00018132066436235626,
+      "loss": 0.5525,
+      "step": 1120
+    },
+    {
+      "epoch": 1.3292181069958848,
+      "grad_norm": 0.17783337831497192,
+      "learning_rate": 0.0001794924074892063,
+      "loss": 0.5393,
+      "step": 1130
+    },
+    {
+      "epoch": 1.340975896531452,
+      "grad_norm": 0.17720560729503632,
+      "learning_rate": 0.00017765958156866046,
+      "loss": 0.5428,
+      "step": 1140
+    },
+    {
+      "epoch": 1.3527336860670194,
+      "grad_norm": 0.18664534389972687,
+      "learning_rate": 0.00017582247054730735,
+      "loss": 0.5401,
+      "step": 1150
+    },
+    {
+      "epoch": 1.3644914756025868,
+      "grad_norm": 0.19094522297382355,
+      "learning_rate": 0.00017398135903559566,
+      "loss": 0.55,
+      "step": 1160
+    },
+    {
+      "epoch": 1.3762492651381542,
+      "grad_norm": 0.1851995885372162,
+      "learning_rate": 0.0001721365322637415,
+      "loss": 0.5497,
+      "step": 1170
+    },
+    {
+      "epoch": 1.3880070546737213,
+      "grad_norm": 0.1769183874130249,
+      "learning_rate": 0.00017028827603753934,
+      "loss": 0.5353,
+      "step": 1180
+    },
+    {
+      "epoch": 1.3997648442092887,
+      "grad_norm": 0.16335001587867737,
+      "learning_rate": 0.00016843687669408468,
+      "loss": 0.5396,
+      "step": 1190
+    },
+    {
+      "epoch": 1.4115226337448559,
+      "grad_norm": 0.23527562618255615,
+      "learning_rate": 0.00016658262105741356,
+      "loss": 0.5407,
+      "step": 1200
+    },
+    {
+      "epoch": 1.4232804232804233,
+      "grad_norm": 0.1645500510931015,
+      "learning_rate": 0.00016472579639406715,
+      "loss": 0.5387,
+      "step": 1210
+    },
+    {
+      "epoch": 1.4350382128159906,
+      "grad_norm": 0.17597968876361847,
+      "learning_rate": 0.00016286669036858734,
+      "loss": 0.5393,
+      "step": 1220
+    },
+    {
+      "epoch": 1.446796002351558,
+      "grad_norm": 0.17105944454669952,
+      "learning_rate": 0.00016100559099895126,
+      "loss": 0.5486,
+      "step": 1230
+    },
+    {
+      "epoch": 1.4585537918871252,
+      "grad_norm": 0.18194060027599335,
+      "learning_rate": 0.0001591427866119505,
+      "loss": 0.5405,
+      "step": 1240
+    },
+    {
+      "epoch": 1.4703115814226926,
+      "grad_norm": 0.155792698264122,
+      "learning_rate": 0.00015727856579852287,
+      "loss": 0.5361,
+      "step": 1250
+    },
+    {
+      "epoch": 1.4820693709582597,
+      "grad_norm": 0.16111020743846893,
+      "learning_rate": 0.00015541321736904285,
+      "loss": 0.5257,
+      "step": 1260
+    },
+    {
+      "epoch": 1.4938271604938271,
+      "grad_norm": 0.1990344375371933,
+      "learning_rate": 0.00015354703030857845,
+      "loss": 0.5394,
+      "step": 1270
+    },
+    {
+      "epoch": 1.5055849500293945,
+      "grad_norm": 0.17325428128242493,
+      "learning_rate": 0.00015168029373212083,
+      "loss": 0.5336,
+      "step": 1280
+    },
+    {
+      "epoch": 1.517342739564962,
+      "grad_norm": 0.16529801487922668,
+      "learning_rate": 0.00014981329683979363,
+      "loss": 0.5467,
+      "step": 1290
+    },
+    {
+      "epoch": 1.529100529100529,
+      "grad_norm": 0.16588719189167023,
+      "learning_rate": 0.00014794632887204948,
+      "loss": 0.5427,
+      "step": 1300
+    },
+    {
+      "epoch": 1.5408583186360965,
+      "grad_norm": 0.2161647081375122,
+      "learning_rate": 0.00014607967906485973,
+      "loss": 0.5448,
+      "step": 1310
+    },
+    {
+      "epoch": 1.5526161081716636,
+      "grad_norm": 0.16725030541419983,
+      "learning_rate": 0.00014421363660490561,
+      "loss": 0.544,
+      "step": 1320
+    },
+    {
+      "epoch": 1.564373897707231,
+      "grad_norm": 0.1518913358449936,
+      "learning_rate": 0.00014234849058477627,
+      "loss": 0.5391,
+      "step": 1330
+    },
+    {
+      "epoch": 1.5761316872427984,
+      "grad_norm": 0.14821507036685944,
+      "learning_rate": 0.00014048452995818193,
+      "loss": 0.5342,
+      "step": 1340
+    },
+    {
+      "epoch": 1.5878894767783658,
+      "grad_norm": 0.15578289330005646,
+      "learning_rate": 0.0001386220434951882,
+      "loss": 0.545,
+      "step": 1350
+    },
+    {
+      "epoch": 1.599647266313933,
+      "grad_norm": 0.15221339464187622,
+      "learning_rate": 0.00013676131973747914,
+      "loss": 0.538,
+      "step": 1360
+    },
+    {
+      "epoch": 1.6114050558495003,
+      "grad_norm": 0.15612226724624634,
+      "learning_rate": 0.00013490264695365555,
+      "loss": 0.5403,
+      "step": 1370
+    },
+    {
+      "epoch": 1.6231628453850675,
+      "grad_norm": 0.17553019523620605,
+      "learning_rate": 0.00013304631309457547,
+      "loss": 0.5405,
+      "step": 1380
+    },
+    {
+      "epoch": 1.6349206349206349,
+      "grad_norm": 0.19782039523124695,
+      "learning_rate": 0.00013119260574874408,
+      "loss": 0.5376,
+      "step": 1390
+    },
+    {
+      "epoch": 1.6466784244562023,
+      "grad_norm": 0.17477013170719147,
+      "learning_rate": 0.00012934181209775975,
+      "loss": 0.5364,
+      "step": 1400
+    },
+    {
+      "epoch": 1.6584362139917697,
+      "grad_norm": 0.16500313580036163,
+      "learning_rate": 0.0001274942188718229,
+      "loss": 0.5376,
+      "step": 1410
+    },
+    {
+      "epoch": 1.6701940035273368,
+      "grad_norm": 0.1557675004005432,
+      "learning_rate": 0.0001256501123053151,
+      "loss": 0.5388,
+      "step": 1420
+    },
+    {
+      "epoch": 1.6819517930629042,
+      "grad_norm": 0.1546131670475006,
+      "learning_rate": 0.0001238097780924547,
+      "loss": 0.5356,
+      "step": 1430
+    },
+    {
+      "epoch": 1.6937095825984714,
+      "grad_norm": 0.17645175755023956,
+      "learning_rate": 0.00012197350134303635,
+      "loss": 0.5312,
+      "step": 1440
+    },
+    {
+      "epoch": 1.7054673721340388,
+      "grad_norm": 0.19012205302715302,
+      "learning_rate": 0.00012014156653826095,
+      "loss": 0.5349,
+      "step": 1450
+    },
+    {
+      "epoch": 1.7172251616696061,
+      "grad_norm": 0.2023085504770279,
+      "learning_rate": 0.0001183142574866631,
+      "loss": 0.5359,
+      "step": 1460
+    },
+    {
+      "epoch": 1.7289829512051735,
+      "grad_norm": 0.16710489988327026,
+      "learning_rate": 0.00011649185728014243,
+      "loss": 0.5286,
+      "step": 1470
+    },
+    {
+      "epoch": 1.7407407407407407,
+      "grad_norm": 0.14735428988933563,
+      "learning_rate": 0.00011467464825010651,
+      "loss": 0.5363,
+      "step": 1480
+    },
+    {
+      "epoch": 1.752498530276308,
+      "grad_norm": 0.18269070982933044,
+      "learning_rate": 0.00011286291192373113,
+      "loss": 0.5286,
+      "step": 1490
+    },
+    {
+      "epoch": 1.7642563198118753,
+      "grad_norm": 0.14498022198677063,
+      "learning_rate": 0.00011105692898034526,
+      "loss": 0.5405,
+      "step": 1500
+    },
+    {
+      "epoch": 1.7760141093474426,
+      "grad_norm": 0.14717623591423035,
+      "learning_rate": 0.00011030134683871457,
+      "loss": 0.5366,
+      "step": 1510
+    },
+    {
+      "epoch": 1.78777189888301,
+      "grad_norm": 0.17796878516674042,
+      "learning_rate": 0.00010849400298316251,
+      "loss": 0.532,
+      "step": 1520
+    },
+    {
+      "epoch": 1.7995296884185774,
+      "grad_norm": 0.1469317078590393,
+      "learning_rate": 0.00010669316128508382,
+      "loss": 0.5236,
+      "step": 1530
+    },
+    {
+      "epoch": 1.8112874779541446,
+      "grad_norm": 0.18005433678627014,
+      "learning_rate": 0.00010489910385687536,
+      "loss": 0.5348,
+      "step": 1540
+    },
+    {
+      "epoch": 1.823045267489712,
+      "grad_norm": 0.15956807136535645,
+      "learning_rate": 0.00010311211174813848,
+      "loss": 0.5336,
+      "step": 1550
+    },
+    {
+      "epoch": 1.8348030570252791,
+      "grad_norm": 0.15382471680641174,
+      "learning_rate": 0.00010133246490165088,
+      "loss": 0.5324,
+      "step": 1560
+    },
+    {
+      "epoch": 1.8465608465608465,
+      "grad_norm": 0.1824691742658615,
+      "learning_rate": 9.956044210951182e-05,
+      "loss": 0.5361,
+      "step": 1570
+    },
+    {
+      "epoch": 1.858318636096414,
+      "grad_norm": 0.1410980224609375,
+      "learning_rate": 9.779632096946785e-05,
+      "loss": 0.5381,
+      "step": 1580
+    },
+    {
+      "epoch": 1.8700764256319813,
+      "grad_norm": 0.1595907211303711,
+      "learning_rate": 9.604037784142558e-05,
+      "loss": 0.5365,
+      "step": 1590
+    },
+    {
+      "epoch": 1.8818342151675485,
+      "grad_norm": 0.14672662317752838,
+      "learning_rate": 9.429288780415795e-05,
+      "loss": 0.527,
+      "step": 1600
+    },
+    {
+      "epoch": 1.8935920047031158,
+      "grad_norm": 0.14634069800376892,
+      "learning_rate": 9.255412461221186e-05,
+      "loss": 0.5398,
+      "step": 1610
+    },
+    {
+      "epoch": 1.905349794238683,
+      "grad_norm": 0.1447485089302063,
+      "learning_rate": 9.082436065302276e-05,
+      "loss": 0.5294,
+      "step": 1620
+    },
+    {
+      "epoch": 1.9171075837742504,
+      "grad_norm": 0.17091409862041473,
+      "learning_rate": 8.91038669042435e-05,
+      "loss": 0.5377,
+      "step": 1630
+    },
+    {
+      "epoch": 1.9288653733098178,
+      "grad_norm": 0.17171324789524078,
+      "learning_rate": 8.739291289129437e-05,
+      "loss": 0.5391,
+      "step": 1640
+    },
+    {
+      "epoch": 1.9406231628453852,
+      "grad_norm": 0.13635724782943726,
+      "learning_rate": 8.569176664514001e-05,
+      "loss": 0.5378,
+      "step": 1650
+    },
+    {
+      "epoch": 1.9523809523809523,
+      "grad_norm": 0.1508239209651947,
+      "learning_rate": 8.400069466030108e-05,
+      "loss": 0.5267,
+      "step": 1660
+    },
+    {
+      "epoch": 1.9641387419165197,
+      "grad_norm": 0.16271938383579254,
+      "learning_rate": 8.23199618531062e-05,
+      "loss": 0.5283,
+      "step": 1670
+    },
+    {
+      "epoch": 1.9758965314520869,
+      "grad_norm": 0.15180297195911407,
+      "learning_rate": 8.064983152019116e-05,
+      "loss": 0.5384,
+      "step": 1680
+    },
+    {
+      "epoch": 1.9876543209876543,
+      "grad_norm": 0.14841122925281525,
+      "learning_rate": 7.899056529725217e-05,
+      "loss": 0.5374,
+      "step": 1690
+    },
+    {
+      "epoch": 1.9994121105232217,
+      "grad_norm": 0.19703982770442963,
+      "learning_rate": 7.734242311805907e-05,
+      "loss": 0.5243,
+      "step": 1700
+    },
+    {
+      "epoch": 2.0117577895355674,
+      "grad_norm": 0.14616432785987854,
+      "learning_rate": 7.570566317373487e-05,
+      "loss": 0.5806,
+      "step": 1710
+    },
+    {
+      "epoch": 2.0235155790711348,
+      "grad_norm": 0.16693364083766937,
+      "learning_rate": 7.408054187230906e-05,
+      "loss": 0.5304,
+      "step": 1720
+    },
+    {
+      "epoch": 2.0352733686067017,
+      "grad_norm": 0.1606026440858841,
+      "learning_rate": 7.246731379854956e-05,
+      "loss": 0.5367,
+      "step": 1730
+    },
+    {
+      "epoch": 2.047031158142269,
+      "grad_norm": 0.1404394507408142,
+      "learning_rate": 7.086623167408036e-05,
+      "loss": 0.5243,
+      "step": 1740
+    },
+    {
+      "epoch": 2.0587889476778365,
+      "grad_norm": 0.14590832591056824,
+      "learning_rate": 6.857574431125467e-05,
+      "loss": 0.5278,
+      "step": 1750
+    },
+    {
+      "epoch": 2.070546737213404,
+      "grad_norm": 0.1431526392698288,
+      "learning_rate": 6.701407699318802e-05,
+      "loss": 0.5266,
+      "step": 1760
+    },
+    {
+      "epoch": 2.0823045267489713,
+      "grad_norm": 0.17230112850666046,
+      "learning_rate": 6.546526608962577e-05,
+      "loss": 0.5257,
+      "step": 1770
+    },
+    {
+      "epoch": 2.0940623162845386,
+      "grad_norm": 0.14844703674316406,
+      "learning_rate": 6.392955154674479e-05,
+      "loss": 0.5171,
+      "step": 1780
+    },
+    {
+      "epoch": 2.105820105820106,
+      "grad_norm": 0.15154774487018585,
+      "learning_rate": 6.240717128179681e-05,
+      "loss": 0.5159,
+      "step": 1790
+    },
+    {
+      "epoch": 2.117577895355673,
+      "grad_norm": 0.13736847043037415,
+      "learning_rate": 6.089836114624921e-05,
+      "loss": 0.516,
+      "step": 1800
+    },
+    {
+      "epoch": 2.1293356848912404,
+      "grad_norm": 0.15059371292591095,
+      "learning_rate": 5.940335488924649e-05,
+      "loss": 0.5253,
+      "step": 1810
+    },
+    {
+      "epoch": 2.1410934744268078,
+      "grad_norm": 0.13336555659770966,
+      "learning_rate": 5.792238412139728e-05,
+      "loss": 0.5221,
+      "step": 1820
+    },
+    {
+      "epoch": 2.152851263962375,
+      "grad_norm": 0.14053775370121002,
+      "learning_rate": 5.645567827889223e-05,
+      "loss": 0.5205,
+      "step": 1830
+    },
+    {
+      "epoch": 2.1646090534979425,
+      "grad_norm": 0.16514064371585846,
+      "learning_rate": 5.5003464587959475e-05,
+      "loss": 0.5315,
+      "step": 1840
+    },
+    {
+      "epoch": 2.1763668430335095,
+      "grad_norm": 0.14599189162254333,
+      "learning_rate": 5.3565968029661834e-05,
+      "loss": 0.5274,
+      "step": 1850
+    },
+    {
+      "epoch": 2.188124632569077,
+      "grad_norm": 0.1498604267835617,
+      "learning_rate": 5.2143411305042195e-05,
+      "loss": 0.5249,
+      "step": 1860
+    },
+    {
+      "epoch": 2.1998824221046442,
+      "grad_norm": 0.1399809867143631,
+      "learning_rate": 5.073601480062219e-05,
+      "loss": 0.5284,
+      "step": 1870
+    },
+    {
+      "epoch": 2.2116402116402116,
+      "grad_norm": 0.1515013873577118,
+      "learning_rate": 4.9343996554258994e-05,
+      "loss": 0.5242,
+      "step": 1880
+    },
+    {
+      "epoch": 2.223398001175779,
+      "grad_norm": 0.14144369959831238,
+      "learning_rate": 4.796757222136643e-05,
+      "loss": 0.5269,
+      "step": 1890
+    },
+    {
+      "epoch": 2.2351557907113464,
+      "grad_norm": 0.14651526510715485,
+      "learning_rate": 4.660695504150523e-05,
+      "loss": 0.525,
+      "step": 1900
+    },
+    {
+      "epoch": 2.246913580246914,
+      "grad_norm": 0.15439248085021973,
+      "learning_rate": 4.526235580534689e-05,
+      "loss": 0.5265,
+      "step": 1910
+    },
+    {
+      "epoch": 2.2586713697824807,
+      "grad_norm": 0.1623186618089676,
+      "learning_rate": 4.3933982822017876e-05,
+      "loss": 0.526,
+      "step": 1920
+    },
+    {
+      "epoch": 2.270429159318048,
+      "grad_norm": 0.13653329014778137,
+      "learning_rate": 4.2622041886827434e-05,
+      "loss": 0.5202,
+      "step": 1930
+    },
+    {
+      "epoch": 2.2821869488536155,
+      "grad_norm": 0.14020675420761108,
+      "learning_rate": 4.132673624938525e-05,
+      "loss": 0.5221,
+      "step": 1940
+    },
+    {
+      "epoch": 2.293944738389183,
+      "grad_norm": 0.14169755578041077,
+      "learning_rate": 4.004826658211375e-05,
+      "loss": 0.5299,
+      "step": 1950
+    },
+    {
+      "epoch": 2.3057025279247503,
+      "grad_norm": 0.16250820457935333,
+      "learning_rate": 3.878683094915886e-05,
+      "loss": 0.5246,
+      "step": 1960
+    },
+    {
+      "epoch": 2.317460317460317,
+      "grad_norm": 0.13330097496509552,
+      "learning_rate": 3.7542624775705795e-05,
+      "loss": 0.5155,
+      "step": 1970
+    },
+    {
+      "epoch": 2.3292181069958846,
+      "grad_norm": 0.13023224472999573,
+      "learning_rate": 3.631584081770296e-05,
+      "loss": 0.5221,
+      "step": 1980
+    },
+    {
+      "epoch": 2.340975896531452,
+      "grad_norm": 0.15445029735565186,
+      "learning_rate": 3.510666913199968e-05,
+      "loss": 0.5164,
+      "step": 1990
+    },
+    {
+      "epoch": 2.3527336860670194,
+      "grad_norm": 0.12522496283054352,
+      "learning_rate": 3.391529704690232e-05,
+      "loss": 0.5242,
+      "step": 2000
+    },
+    {
+      "epoch": 2.3644914756025868,
+      "grad_norm": 0.12382670491933823,
+      "learning_rate": 3.274190913315244e-05,
+      "loss": 0.5157,
+      "step": 2010
+    },
+    {
+      "epoch": 2.376249265138154,
+      "grad_norm": 0.1230044960975647,
+      "learning_rate": 3.158668717533282e-05,
+      "loss": 0.5173,
+      "step": 2020
+    },
+    {
+      "epoch": 2.3880070546737215,
+      "grad_norm": 0.13459515571594238,
+      "learning_rate": 3.0449810143705006e-05,
+      "loss": 0.5244,
+      "step": 2030
+    },
+    {
+      "epoch": 2.3997648442092885,
+      "grad_norm": 0.17454054951667786,
+      "learning_rate": 2.9331454166482292e-05,
+      "loss": 0.5275,
+      "step": 2040
+    },
+    {
+      "epoch": 2.411522633744856,
+      "grad_norm": 0.13058680295944214,
+      "learning_rate": 2.823179250254393e-05,
+      "loss": 0.5207,
+      "step": 2050
+    },
+    {
+      "epoch": 2.4232804232804233,
+      "grad_norm": 0.1417444348335266,
+      "learning_rate": 2.7150995514593066e-05,
+      "loss": 0.5251,
+      "step": 2060
+    },
+    {
+      "epoch": 2.4350382128159906,
+      "grad_norm": 0.12965553998947144,
+      "learning_rate": 2.6089230642763654e-05,
+      "loss": 0.5188,
+      "step": 2070
+    },
+    {
+      "epoch": 2.446796002351558,
+      "grad_norm": 0.12700672447681427,
+      "learning_rate": 2.504666237868045e-05,
+      "loss": 0.5262,
+      "step": 2080
+    },
+    {
+      "epoch": 2.458553791887125,
+      "grad_norm": 0.12814930081367493,
+      "learning_rate": 2.4023452239975154e-05,
+      "loss": 0.5218,
+      "step": 2090
+    },
+    {
+      "epoch": 2.4703115814226924,
+      "grad_norm": 0.12329702824354172,
+      "learning_rate": 2.3019758745263807e-05,
+      "loss": 0.5218,
+      "step": 2100
+    },
+    {
+      "epoch": 2.4820693709582597,
+      "grad_norm": 0.12658412754535675,
+      "learning_rate": 2.2035737389588692e-05,
+      "loss": 0.5238,
+      "step": 2110
+    },
+    {
+      "epoch": 2.493827160493827,
+      "grad_norm": 0.1283237338066101,
+      "learning_rate": 2.1071540620328454e-05,
+      "loss": 0.5248,
+      "step": 2120
+    },
+    {
+      "epoch": 2.5055849500293945,
+      "grad_norm": 0.12314069271087646,
+      "learning_rate": 2.0127317813580695e-05,
+      "loss": 0.5298,
+      "step": 2130
+    },
+    {
+      "epoch": 2.517342739564962,
+      "grad_norm": 0.13253411650657654,
+      "learning_rate": 1.9203215251019993e-05,
+      "loss": 0.5189,
+      "step": 2140
+    },
+    {
+      "epoch": 2.5291005291005293,
+      "grad_norm": 0.1233448013663292,
+      "learning_rate": 1.829937609723568e-05,
+      "loss": 0.5237,
+      "step": 2150
+    },
+    {
+      "epoch": 2.5408583186360962,
+      "grad_norm": 0.12831535935401917,
+      "learning_rate": 1.7415940377552407e-05,
+      "loss": 0.5247,
+      "step": 2160
+    },
+    {
+      "epoch": 2.5526161081716636,
+      "grad_norm": 0.12279964238405228,
+      "learning_rate": 1.6553044956336846e-05,
+      "loss": 0.5198,
+      "step": 2170
+    },
+    {
+      "epoch": 2.564373897707231,
+      "grad_norm": 0.12143947184085846,
+      "learning_rate": 1.5710823515794525e-05,
+      "loss": 0.5318,
+      "step": 2180
+    },
+    {
+      "epoch": 2.5761316872427984,
+      "grad_norm": 0.12285321950912476,
+      "learning_rate": 1.488940653525914e-05,
+      "loss": 0.5257,
+      "step": 2190
+    },
+    {
+      "epoch": 2.587889476778366,
+      "grad_norm": 0.12113109976053238,
+      "learning_rate": 1.4088921270978487e-05,
+      "loss": 0.5177,
+      "step": 2200
+    },
+    {
+      "epoch": 2.5996472663139327,
+      "grad_norm": 0.12191707640886307,
+      "learning_rate": 1.3309491736399591e-05,
+      "loss": 0.5154,
+      "step": 2210
+    },
+    {
+      "epoch": 2.6114050558495006,
+      "grad_norm": 0.1208585798740387,
+      "learning_rate": 1.2551238682956055e-05,
+      "loss": 0.5218,
+      "step": 2220
+    },
+    {
+      "epoch": 2.6231628453850675,
+      "grad_norm": 0.1282462328672409,
+      "learning_rate": 1.1944657452427275e-05,
+      "loss": 0.5241,
+      "step": 2230
+    },
+    {
+      "epoch": 2.634920634920635,
+      "grad_norm": 0.127785786986351,
+      "learning_rate": 1.122131288355197e-05,
+      "loss": 0.518,
+      "step": 2240
+    },
+    {
+      "epoch": 2.6466784244562023,
+      "grad_norm": 0.12176629900932312,
+      "learning_rate": 1.0519708808639633e-05,
+      "loss": 0.5127,
+      "step": 2250
+    },
+    {
+      "epoch": 2.6584362139917697,
+      "grad_norm": 0.12018881738185883,
+      "learning_rate": 9.839955138076939e-06,
+      "loss": 0.5101,
+      "step": 2260
+    },
+    {
+      "epoch": 2.670194003527337,
+      "grad_norm": 0.11915557831525803,
+      "learning_rate": 9.182158359256641e-06,
+      "loss": 0.5244,
+      "step": 2270
+    },
+    {
+      "epoch": 2.681951793062904,
+      "grad_norm": 0.1288643330335617,
+      "learning_rate": 8.546421519895952e-06,
+      "loss": 0.521,
+      "step": 2280
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 2550,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 60,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.305368829130965e+18,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
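
Note on reading the state above: `logging_steps`, `save_steps` and `max_steps`, together with the per-step records, are enough to work out how far the run has progressed and which checkpoints are still to come. The following is a minimal sketch, not part of the commit. It assumes the per-step records sit under a top-level `log_history` key (that key is outside the lines shown here), and the helper names `load_trainer_state` and `summarize_trainer_state` are hypothetical.

```python
import json
from pathlib import Path


def load_trainer_state(path):
    """Parse a trainer_state.json file into a dictionary (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_trainer_state(state, last_n=5):
    """Summarize run progress from a parsed trainer_state.json dictionary.

    Assumption: records with "step" and "loss" are stored under "log_history".
    """
    logs = [rec for rec in state.get("log_history", []) if "loss" in rec]
    current_step = logs[-1]["step"] if logs else 0
    max_steps = state["max_steps"]
    save_steps = state["save_steps"]

    # Next multiples of save_steps strictly after the current step.
    first = (current_step // save_steps + 1) * save_steps
    upcoming = list(range(first, max_steps + 1, save_steps))

    # Mean training loss over the most recent logged records.
    recent = logs[-last_n:]
    mean_loss = sum(r["loss"] for r in recent) / len(recent) if recent else None

    return {
        "current_step": current_step,
        "steps_remaining": max_steps - current_step,
        "upcoming_checkpoints": upcoming,
        "recent_mean_loss": mean_loss,
    }


if __name__ == "__main__":
    # Values copied from the last three records and the footer of the diff above.
    sample = {
        "log_history": [
            {"step": 2260, "loss": 0.5101},
            {"step": 2270, "loss": 0.5244},
            {"step": 2280, "loss": 0.521},
        ],
        "logging_steps": 10,
        "max_steps": 2550,
        "save_steps": 60,
    }
    print(summarize_trainer_state(sample, last_n=3))
```

With a local copy of the file, `summarize_trainer_state(load_trainer_state("checkpoint-2280/trainer_state.json"))` would give the same kind of summary. Whether a final checkpoint is also written at `max_steps` depends on the training setup and is not inferred here.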

checkpoint-2280/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5d2017df37014d07bfaf5754a6b1f4edf954b091471fe61d129c8c337a276e67
+size 5752

checkpoint-2280/vocab.json ADDED

The diff for this file is too large to render.