Training in progress, step 230, checkpoint

Files changed (11)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +31 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +223 -0
last-checkpoint/trainer_state.json +1659 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: EleutherAI/pythia-160m
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "EleutherAI/pythia-160m",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query_key_value",
+    "dense",
+    "dense_h_to_4h",
+    "dense_4h_to_h"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3ad79fb14cc6ec28feeefe3839f7e24e2ca25c105baf0e4416a2e08e629d6dad
+size 4731640
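
The reported adapter size can be sanity-checked against the LoRA settings in `adapter_config.json` (`r` of 8 on the four `target_modules`). The sketch below is only an estimate: the layer count (12), hidden size (768), MLP width (3072), and float32 storage are assumptions about `EleutherAI/pythia-160m` and the save format, none of which are stated in this diff.

```python
# Rough size check for the LoRA adapter described by adapter_config.json.
# Assumptions (not stated in this diff): 12 transformer layers, hidden size 768,
# MLP width 4 * 768, and adapter weights stored as float32.
HIDDEN = 768
MLP = 4 * HIDDEN
NUM_LAYERS = 12
RANK = 8                 # "r" in adapter_config.json
BYTES_PER_PARAM = 4      # float32
REPORTED_SIZE = 4731640  # "size" in the git-lfs pointer above

# (in_features, out_features) for each entry in "target_modules"
TARGETS = {
    "query_key_value": (HIDDEN, 3 * HIDDEN),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, MLP),
    "dense_4h_to_h": (MLP, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total_params = per_layer * NUM_LAYERS
tensor_bytes = total_params * BYTES_PER_PARAM

print("LoRA parameters:", total_params)
print("Tensor bytes:", tensor_bytes)
print("Bytes left for the safetensors header:", REPORTED_SIZE - tensor_bytes)
```

Under these assumptions the tensors account for nearly all of the file, and the small remainder is plausibly the safetensors JSON header.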

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:32e7b338f62df389646648716d41ee0394ee519917bb025c5e0e30326f43e77c
+size 2505722

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0f2e5a4027c66158aab99bcade837fd8699252a9c436c9bb3c0e97693d74f5dd
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3e1d7102aa089f621f52127a4a3b2b3b4d8038e0ab9354f63876b34958b5a207
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,223 @@

+{
+  "add_bos_token": false,
+  "add_eos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<|padding|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50254": {
+      "content": "                        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50255": {
+      "content": "                       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50256": {
+      "content": "                      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50257": {
+      "content": "                     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50258": {
+      "content": "                    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50259": {
+      "content": "                   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50260": {
+      "content": "                  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50261": {
+      "content": "                 ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50262": {
+      "content": "                ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50263": {
+      "content": "               ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50264": {
+      "content": "              ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50265": {
+      "content": "             ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50266": {
+      "content": "            ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50267": {
+      "content": "           ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50268": {
+      "content": "          ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50269": {
+      "content": "         ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50270": {
+      "content": "        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50271": {
+      "content": "       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50272": {
+      "content": "      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50273": {
+      "content": "     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50274": {
+      "content": "    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50275": {
+      "content": "   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50276": {
+      "content": "  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50277": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPTNeoXTokenizer",
+  "unk_token": "<|endoftext|>"
+}

last-checkpoint/trainer_state.json ADDED
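
The `learning_rate` values in this file's `log_history` look like a linear warmup followed by cosine decay. The sketch below reproduces them under inferred settings that are not stated anywhere in the diff: a peak of 2e-4, 10 warmup steps, and 918 total steps (each logged step advances `epoch` by about 1/918). `training_args.bin` would be the authoritative source for the real values.

```python
import math

# Inferred from log_history, not read from training_args.bin:
PEAK_LR = 2e-4      # learning_rate logged at step 10
WARMUP_STEPS = 10   # learning_rate rises by 2e-05 per step over steps 1-10
TOTAL_STEPS = 918   # one step advances "epoch" by roughly 1/918


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at TOTAL_STEPS."""
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


# Compare against a few values copied from log_history.
logged = {
    1: 2e-05,
    10: 0.0002,
    11: 0.00019999940145388063,
    100: 0.0001951908207253421,
}
for step, expected in logged.items():
    print(step, lr_at(step), expected)
```

If the logged and computed columns agree, the inferred schedule is consistent with the log; that does not rule out other settings that happen to match over the same steps.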

	@@ -0,0 +1,1659 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.25054466230936817,
+  "eval_steps": 230,
+  "global_step": 230,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0010893246187363835,
+      "grad_norm": 497.369384765625,
+      "learning_rate": 2e-05,
+      "loss": 12.1306,
+      "step": 1
+    },
+    {
+      "epoch": 0.0010893246187363835,
+      "eval_loss": 3.1589934825897217,
+      "eval_runtime": 4.1717,
+      "eval_samples_per_second": 92.767,
+      "eval_steps_per_second": 46.503,
+      "step": 1
+    },
+    {
+      "epoch": 0.002178649237472767,
+      "grad_norm": 384.06939697265625,
+      "learning_rate": 4e-05,
+      "loss": 11.9955,
+      "step": 2
+    },
+    {
+      "epoch": 0.0032679738562091504,
+      "grad_norm": 418.12237548828125,
+      "learning_rate": 6e-05,
+      "loss": 11.2344,
+      "step": 3
+    },
+    {
+      "epoch": 0.004357298474945534,
+      "grad_norm": 319.05267333984375,
+      "learning_rate": 8e-05,
+      "loss": 13.0403,
+      "step": 4
+    },
+    {
+      "epoch": 0.0054466230936819175,
+      "grad_norm": 275.94830322265625,
+      "learning_rate": 0.0001,
+      "loss": 12.7821,
+      "step": 5
+    },
+    {
+      "epoch": 0.006535947712418301,
+      "grad_norm": 446.70166015625,
+      "learning_rate": 0.00012,
+      "loss": 12.3682,
+      "step": 6
+    },
+    {
+      "epoch": 0.007625272331154684,
+      "grad_norm": 403.9425964355469,
+      "learning_rate": 0.00014,
+      "loss": 12.164,
+      "step": 7
+    },
+    {
+      "epoch": 0.008714596949891068,
+      "grad_norm": 370.84893798828125,
+      "learning_rate": 0.00016,
+      "loss": 12.4891,
+      "step": 8
+    },
+    {
+      "epoch": 0.00980392156862745,
+      "grad_norm": 357.8050537109375,
+      "learning_rate": 0.00018,
+      "loss": 12.3877,
+      "step": 9
+    },
+    {
+      "epoch": 0.010893246187363835,
+      "grad_norm": 409.2673645019531,
+      "learning_rate": 0.0002,
+      "loss": 12.547,
+      "step": 10
+    },
+    {
+      "epoch": 0.011982570806100218,
+      "grad_norm": 395.81939697265625,
+      "learning_rate": 0.00019999940145388063,
+      "loss": 13.2017,
+      "step": 11
+    },
+    {
+      "epoch": 0.013071895424836602,
+      "grad_norm": 440.0821228027344,
+      "learning_rate": 0.00019999760582268763,
+      "loss": 12.3833,
+      "step": 12
+    },
+    {
+      "epoch": 0.014161220043572984,
+      "grad_norm": 394.557373046875,
+      "learning_rate": 0.00019999461312791638,
+      "loss": 11.9402,
+      "step": 13
+    },
+    {
+      "epoch": 0.015250544662309368,
+      "grad_norm": 332.89373779296875,
+      "learning_rate": 0.0001999904234053922,
+      "loss": 11.9442,
+      "step": 14
+    },
+    {
+      "epoch": 0.016339869281045753,
+      "grad_norm": 312.7531433105469,
+      "learning_rate": 0.00019998503670526994,
+      "loss": 11.8069,
+      "step": 15
+    },
+    {
+      "epoch": 0.017429193899782137,
+      "grad_norm": 472.9444580078125,
+      "learning_rate": 0.00019997845309203334,
+      "loss": 12.1533,
+      "step": 16
+    },
+    {
+      "epoch": 0.018518518518518517,
+      "grad_norm": 397.2251892089844,
+      "learning_rate": 0.00019997067264449433,
+      "loss": 11.0714,
+      "step": 17
+    },
+    {
+      "epoch": 0.0196078431372549,
+      "grad_norm": 369.2944641113281,
+      "learning_rate": 0.00019996169545579207,
+      "loss": 12.2346,
+      "step": 18
+    },
+    {
+      "epoch": 0.020697167755991286,
+      "grad_norm": 574.8084106445312,
+      "learning_rate": 0.00019995152163339178,
+      "loss": 11.162,
+      "step": 19
+    },
+    {
+      "epoch": 0.02178649237472767,
+      "grad_norm": 611.6285400390625,
+      "learning_rate": 0.00019994015129908346,
+      "loss": 12.0132,
+      "step": 20
+    },
+    {
+      "epoch": 0.02287581699346405,
+      "grad_norm": 425.55072021484375,
+      "learning_rate": 0.00019992758458898055,
+      "loss": 12.0108,
+      "step": 21
+    },
+    {
+      "epoch": 0.023965141612200435,
+      "grad_norm": 498.50439453125,
+      "learning_rate": 0.00019991382165351814,
+      "loss": 11.8223,
+      "step": 22
+    },
+    {
+      "epoch": 0.02505446623093682,
+      "grad_norm": 454.2897644042969,
+      "learning_rate": 0.00019989886265745128,
+      "loss": 12.0506,
+      "step": 23
+    },
+    {
+      "epoch": 0.026143790849673203,
+      "grad_norm": 557.8632202148438,
+      "learning_rate": 0.00019988270777985292,
+      "loss": 11.8529,
+      "step": 24
+    },
+    {
+      "epoch": 0.027233115468409588,
+      "grad_norm": 412.888916015625,
+      "learning_rate": 0.00019986535721411186,
+      "loss": 11.4545,
+      "step": 25
+    },
+    {
+      "epoch": 0.02832244008714597,
+      "grad_norm": 375.8723449707031,
+      "learning_rate": 0.00019984681116793038,
+      "loss": 10.8877,
+      "step": 26
+    },
+    {
+      "epoch": 0.029411764705882353,
+      "grad_norm": 445.2213134765625,
+      "learning_rate": 0.00019982706986332175,
+      "loss": 11.8088,
+      "step": 27
+    },
+    {
+      "epoch": 0.030501089324618737,
+      "grad_norm": 390.9215087890625,
+      "learning_rate": 0.00019980613353660763,
+      "loss": 12.8907,
+      "step": 28
+    },
+    {
+      "epoch": 0.03159041394335512,
+      "grad_norm": 876.1612548828125,
+      "learning_rate": 0.00019978400243841508,
+      "loss": 12.3549,
+      "step": 29
+    },
+    {
+      "epoch": 0.032679738562091505,
+      "grad_norm": 573.1425170898438,
+      "learning_rate": 0.00019976067683367385,
+      "loss": 12.5321,
+      "step": 30
+    },
+    {
+      "epoch": 0.03376906318082789,
+      "grad_norm": 470.3711853027344,
+      "learning_rate": 0.0001997361570016129,
+      "loss": 12.0172,
+      "step": 31
+    },
+    {
+      "epoch": 0.034858387799564274,
+      "grad_norm": 679.3441772460938,
+      "learning_rate": 0.00019971044323575728,
+      "loss": 10.8628,
+      "step": 32
+    },
+    {
+      "epoch": 0.03594771241830065,
+      "grad_norm": 626.5648193359375,
+      "learning_rate": 0.0001996835358439244,
+      "loss": 11.3385,
+      "step": 33
+    },
+    {
+      "epoch": 0.037037037037037035,
+      "grad_norm": 733.734375,
+      "learning_rate": 0.00019965543514822062,
+      "loss": 10.4336,
+      "step": 34
+    },
+    {
+      "epoch": 0.03812636165577342,
+      "grad_norm": 587.365966796875,
+      "learning_rate": 0.00019962614148503718,
+      "loss": 12.0594,
+      "step": 35
+    },
+    {
+      "epoch": 0.0392156862745098,
+      "grad_norm": 405.00531005859375,
+      "learning_rate": 0.00019959565520504623,
+      "loss": 11.8103,
+      "step": 36
+    },
+    {
+      "epoch": 0.04030501089324619,
+      "grad_norm": 398.32415771484375,
+      "learning_rate": 0.00019956397667319668,
+      "loss": 11.0375,
+      "step": 37
+    },
+    {
+      "epoch": 0.04139433551198257,
+      "grad_norm": 422.72808837890625,
+      "learning_rate": 0.00019953110626870979,
+      "loss": 12.0295,
+      "step": 38
+    },
+    {
+      "epoch": 0.042483660130718956,
+      "grad_norm": 482.6229553222656,
+      "learning_rate": 0.00019949704438507459,
+      "loss": 10.7632,
+      "step": 39
+    },
+    {
+      "epoch": 0.04357298474945534,
+      "grad_norm": 526.7586059570312,
+      "learning_rate": 0.00019946179143004325,
+      "loss": 11.2995,
+      "step": 40
+    },
+    {
+      "epoch": 0.044662309368191724,
+      "grad_norm": 503.21221923828125,
+      "learning_rate": 0.0001994253478256262,
+      "loss": 12.1067,
+      "step": 41
+    },
+    {
+      "epoch": 0.0457516339869281,
+      "grad_norm": 655.2205810546875,
+      "learning_rate": 0.0001993877140080869,
+      "loss": 11.5295,
+      "step": 42
+    },
+    {
+      "epoch": 0.046840958605664486,
+      "grad_norm": 440.0908203125,
+      "learning_rate": 0.000199348890427937,
+      "loss": 11.6821,
+      "step": 43
+    },
+    {
+      "epoch": 0.04793028322440087,
+      "grad_norm": 456.47406005859375,
+      "learning_rate": 0.00019930887754993044,
+      "loss": 11.7873,
+      "step": 44
+    },
+    {
+      "epoch": 0.049019607843137254,
+      "grad_norm": 841.0929565429688,
+      "learning_rate": 0.00019926767585305835,
+      "loss": 11.6674,
+      "step": 45
+    },
+    {
+      "epoch": 0.05010893246187364,
+      "grad_norm": 581.7565307617188,
+      "learning_rate": 0.000199225285830543,
+      "loss": 11.7123,
+      "step": 46
+    },
+    {
+      "epoch": 0.05119825708061002,
+      "grad_norm": 547.7841796875,
+      "learning_rate": 0.00019918170798983211,
+      "loss": 10.9854,
+      "step": 47
+    },
+    {
+      "epoch": 0.05228758169934641,
+      "grad_norm": 685.59814453125,
+      "learning_rate": 0.00019913694285259256,
+      "loss": 11.8064,
+      "step": 48
+    },
+    {
+      "epoch": 0.05337690631808279,
+      "grad_norm": 636.0125122070312,
+      "learning_rate": 0.00019909099095470444,
+      "loss": 11.1934,
+      "step": 49
+    },
+    {
+      "epoch": 0.054466230936819175,
+      "grad_norm": 528.2201538085938,
+      "learning_rate": 0.00019904385284625424,
+      "loss": 12.0213,
+      "step": 50
+    },
+    {
+      "epoch": 0.05555555555555555,
+      "grad_norm": 490.21875,
+      "learning_rate": 0.00019899552909152866,
+      "loss": 10.6632,
+      "step": 51
+    },
+    {
+      "epoch": 0.05664488017429194,
+      "grad_norm": 500.16021728515625,
+      "learning_rate": 0.00019894602026900758,
+      "loss": 11.65,
+      "step": 52
+    },
+    {
+      "epoch": 0.05773420479302832,
+      "grad_norm": 477.1259765625,
+      "learning_rate": 0.00019889532697135734,
+      "loss": 11.6934,
+      "step": 53
+    },
+    {
+      "epoch": 0.058823529411764705,
+      "grad_norm": 1109.7750244140625,
+      "learning_rate": 0.00019884344980542338,
+      "loss": 10.7624,
+      "step": 54
+    },
+    {
+      "epoch": 0.05991285403050109,
+      "grad_norm": 555.9796142578125,
+      "learning_rate": 0.00019879038939222329,
+      "loss": 12.7329,
+      "step": 55
+    },
+    {
+      "epoch": 0.06100217864923747,
+      "grad_norm": 557.53271484375,
+      "learning_rate": 0.0001987361463669392,
+      "loss": 11.0998,
+      "step": 56
+    },
+    {
+      "epoch": 0.06209150326797386,
+      "grad_norm": 483.9259948730469,
+      "learning_rate": 0.00019868072137891002,
+      "loss": 11.8222,
+      "step": 57
+    },
+    {
+      "epoch": 0.06318082788671024,
+      "grad_norm": 740.6929321289062,
+      "learning_rate": 0.00019862411509162406,
+      "loss": 10.6816,
+      "step": 58
+    },
+    {
+      "epoch": 0.06427015250544663,
+      "grad_norm": 873.5588989257812,
+      "learning_rate": 0.0001985663281827108,
+      "loss": 10.5937,
+      "step": 59
+    },
+    {
+      "epoch": 0.06535947712418301,
+      "grad_norm": 617.0968627929688,
+      "learning_rate": 0.00019850736134393286,
+      "loss": 11.6752,
+      "step": 60
+    },
+    {
+      "epoch": 0.0664488017429194,
+      "grad_norm": 652.836669921875,
+      "learning_rate": 0.00019844721528117766,
+      "loss": 10.4189,
+      "step": 61
+    },
+    {
+      "epoch": 0.06753812636165578,
+      "grad_norm": 681.41064453125,
+      "learning_rate": 0.00019838589071444903,
+      "loss": 11.4517,
+      "step": 62
+    },
+    {
+      "epoch": 0.06862745098039216,
+      "grad_norm": 610.8583984375,
+      "learning_rate": 0.00019832338837785863,
+      "loss": 11.4514,
+      "step": 63
+    },
+    {
+      "epoch": 0.06971677559912855,
+      "grad_norm": 703.3029174804688,
+      "learning_rate": 0.00019825970901961705,
+      "loss": 11.0479,
+      "step": 64
+    },
+    {
+      "epoch": 0.07080610021786492,
+      "grad_norm": 558.1321411132812,
+      "learning_rate": 0.000198194853402025,
+      "loss": 10.8034,
+      "step": 65
+    },
+    {
+      "epoch": 0.0718954248366013,
+      "grad_norm": 729.005126953125,
+      "learning_rate": 0.00019812882230146398,
+      "loss": 11.2189,
+      "step": 66
+    },
+    {
+      "epoch": 0.07298474945533769,
+      "grad_norm": 563.0946044921875,
+      "learning_rate": 0.00019806161650838723,
+      "loss": 11.8089,
+      "step": 67
+    },
+    {
+      "epoch": 0.07407407407407407,
+      "grad_norm": 774.91796875,
+      "learning_rate": 0.00019799323682731,
+      "loss": 10.8409,
+      "step": 68
+    },
+    {
+      "epoch": 0.07516339869281045,
+      "grad_norm": 636.3812866210938,
+      "learning_rate": 0.00019792368407680025,
+      "loss": 11.4898,
+      "step": 69
+    },
+    {
+      "epoch": 0.07625272331154684,
+      "grad_norm": 741.6439208984375,
+      "learning_rate": 0.00019785295908946848,
+      "loss": 11.1101,
+      "step": 70
+    },
+    {
+      "epoch": 0.07734204793028322,
+      "grad_norm": 902.0681762695312,
+      "learning_rate": 0.00019778106271195806,
+      "loss": 10.9148,
+      "step": 71
+    },
+    {
+      "epoch": 0.0784313725490196,
+      "grad_norm": 608.00634765625,
+      "learning_rate": 0.00019770799580493494,
+      "loss": 12.5593,
+      "step": 72
+    },
+    {
+      "epoch": 0.07952069716775599,
+      "grad_norm": 645.3396606445312,
+      "learning_rate": 0.00019763375924307735,
+      "loss": 11.0296,
+      "step": 73
+    },
+    {
+      "epoch": 0.08061002178649238,
+      "grad_norm": 532.682861328125,
+      "learning_rate": 0.0001975583539150655,
+      "loss": 10.9512,
+      "step": 74
+    },
+    {
+      "epoch": 0.08169934640522876,
+      "grad_norm": 606.407470703125,
+      "learning_rate": 0.00019748178072357065,
+      "loss": 10.0461,
+      "step": 75
+    },
+    {
+      "epoch": 0.08278867102396514,
+      "grad_norm": 538.4356079101562,
+      "learning_rate": 0.00019740404058524457,
+      "loss": 10.923,
+      "step": 76
+    },
+    {
+      "epoch": 0.08387799564270153,
+      "grad_norm": 593.4065551757812,
+      "learning_rate": 0.00019732513443070836,
+      "loss": 11.2116,
+      "step": 77
+    },
+    {
+      "epoch": 0.08496732026143791,
+      "grad_norm": 667.9852905273438,
+      "learning_rate": 0.00019724506320454153,
+      "loss": 10.7769,
+      "step": 78
+    },
+    {
+      "epoch": 0.0860566448801743,
+      "grad_norm": 667.6882934570312,
+      "learning_rate": 0.0001971638278652705,
+      "loss": 11.6724,
+      "step": 79
+    },
+    {
+      "epoch": 0.08714596949891068,
+      "grad_norm": 664.25244140625,
+      "learning_rate": 0.0001970814293853572,
+      "loss": 11.0462,
+      "step": 80
+    },
+    {
+      "epoch": 0.08823529411764706,
+      "grad_norm": 713.4217529296875,
+      "learning_rate": 0.00019699786875118747,
+      "loss": 11.3038,
+      "step": 81
+    },
+    {
+      "epoch": 0.08932461873638345,
+      "grad_norm": 636.1452026367188,
+      "learning_rate": 0.00019691314696305913,
+      "loss": 11.0193,
+      "step": 82
+    },
+    {
+      "epoch": 0.09041394335511982,
+      "grad_norm": 536.7321166992188,
+      "learning_rate": 0.00019682726503517017,
+      "loss": 11.3826,
+      "step": 83
+    },
+    {
+      "epoch": 0.0915032679738562,
+      "grad_norm": 599.102783203125,
+      "learning_rate": 0.00019674022399560648,
+      "loss": 10.964,
+      "step": 84
+    },
+    {
+      "epoch": 0.09259259259259259,
+      "grad_norm": 1010.7560424804688,
+      "learning_rate": 0.00019665202488632956,
+      "loss": 11.0961,
+      "step": 85
+    },
+    {
+      "epoch": 0.09368191721132897,
+      "grad_norm": 651.3975219726562,
+      "learning_rate": 0.0001965626687631641,
+      "loss": 10.7751,
+      "step": 86
+    },
+    {
+      "epoch": 0.09477124183006536,
+      "grad_norm": 691.282958984375,
+      "learning_rate": 0.00019647215669578536,
+      "loss": 11.463,
+      "step": 87
+    },
+    {
+      "epoch": 0.09586056644880174,
+      "grad_norm": 780.7763061523438,
+      "learning_rate": 0.00019638048976770628,
+      "loss": 11.4905,
+      "step": 88
+    },
+    {
+      "epoch": 0.09694989106753812,
+      "grad_norm": 731.2135620117188,
+      "learning_rate": 0.00019628766907626446,
+      "loss": 11.0423,
+      "step": 89
+    },
+    {
+      "epoch": 0.09803921568627451,
+      "grad_norm": 606.6620483398438,
+      "learning_rate": 0.00019619369573260924,
+      "loss": 10.0837,
+      "step": 90
+    },
+    {
+      "epoch": 0.09912854030501089,
+      "grad_norm": 954.3007202148438,
+      "learning_rate": 0.00019609857086168823,
+      "loss": 10.4564,
+      "step": 91
+    },
+    {
+      "epoch": 0.10021786492374728,
+      "grad_norm": 489.27496337890625,
+      "learning_rate": 0.00019600229560223388,
+      "loss": 11.548,
+      "step": 92
+    },
+    {
+      "epoch": 0.10130718954248366,
+      "grad_norm": 563.7119140625,
+      "learning_rate": 0.00019590487110674983,
+      "loss": 10.8069,
+      "step": 93
+    },
+    {
+      "epoch": 0.10239651416122005,
+      "grad_norm": 460.32440185546875,
+      "learning_rate": 0.0001958062985414972,
+      "loss": 10.7325,
+      "step": 94
+    },
+    {
+      "epoch": 0.10348583877995643,
+      "grad_norm": 642.6327514648438,
+      "learning_rate": 0.00019570657908648048,
+      "loss": 12.1817,
+      "step": 95
+    },
+    {
+      "epoch": 0.10457516339869281,
+      "grad_norm": 686.3006591796875,
+      "learning_rate": 0.0001956057139354335,
+      "loss": 10.7484,
+      "step": 96
+    },
+    {
+      "epoch": 0.1056644880174292,
+      "grad_norm": 1625.68359375,
+      "learning_rate": 0.0001955037042958052,
+      "loss": 9.9405,
+      "step": 97
+    },
+    {
+      "epoch": 0.10675381263616558,
+      "grad_norm": 661.0103759765625,
+      "learning_rate": 0.00019540055138874505,
+      "loss": 10.6417,
+      "step": 98
+    },
+    {
+      "epoch": 0.10784313725490197,
+      "grad_norm": 754.760009765625,
+      "learning_rate": 0.00019529625644908847,
+      "loss": 10.5583,
+      "step": 99
+    },
+    {
+      "epoch": 0.10893246187363835,
+      "grad_norm": 638.9000854492188,
+      "learning_rate": 0.0001951908207253421,
+      "loss": 11.0231,
+      "step": 100
+    },
+    {
+      "epoch": 0.11002178649237472,
+      "grad_norm": 553.3854370117188,
+      "learning_rate": 0.00019508424547966884,
+      "loss": 10.9961,
+      "step": 101
+    },
+    {
+      "epoch": 0.1111111111111111,
+      "grad_norm": 587.3681030273438,
+      "learning_rate": 0.00019497653198787264,
+      "loss": 10.701,
+      "step": 102
+    },
+    {
+      "epoch": 0.11220043572984749,
+      "grad_norm": 751.8494262695312,
+      "learning_rate": 0.00019486768153938338,
+      "loss": 11.4205,
+      "step": 103
+    },
+    {
+      "epoch": 0.11328976034858387,
+      "grad_norm": 620.5661010742188,
+      "learning_rate": 0.0001947576954372413,
+      "loss": 11.6781,
+      "step": 104
+    },
+    {
+      "epoch": 0.11437908496732026,
+      "grad_norm": 592.9255981445312,
+      "learning_rate": 0.00019464657499808152,
+      "loss": 10.2305,
+      "step": 105
+    },
+    {
+      "epoch": 0.11546840958605664,
+      "grad_norm": 504.8605651855469,
+      "learning_rate": 0.0001945343215521182,
+      "loss": 12.4109,
+      "step": 106
+    },
+    {
+      "epoch": 0.11655773420479303,
+      "grad_norm": 567.1307983398438,
+      "learning_rate": 0.0001944209364431286,
+      "loss": 10.948,
+      "step": 107
+    },
+    {
+      "epoch": 0.11764705882352941,
+      "grad_norm": 802.73046875,
+      "learning_rate": 0.00019430642102843707,
+      "loss": 11.1612,
+      "step": 108
+    },
+    {
+      "epoch": 0.1187363834422658,
+      "grad_norm": 692.0497436523438,
+      "learning_rate": 0.00019419077667889872,
+      "loss": 10.6958,
+      "step": 109
+    },
+    {
+      "epoch": 0.11982570806100218,
+      "grad_norm": 678.1286010742188,
+      "learning_rate": 0.00019407400477888315,
+      "loss": 11.3934,
+      "step": 110
+    },
+    {
+      "epoch": 0.12091503267973856,
+      "grad_norm": 545.8530883789062,
+      "learning_rate": 0.00019395610672625767,
+      "loss": 11.7126,
+      "step": 111
+    },
+    {
+      "epoch": 0.12200435729847495,
+      "grad_norm": 551.471435546875,
+      "learning_rate": 0.00019383708393237075,
+      "loss": 11.3177,
+      "step": 112
+    },
+    {
+      "epoch": 0.12309368191721133,
+      "grad_norm": 747.8993530273438,
+      "learning_rate": 0.00019371693782203498,
+      "loss": 10.9254,
+      "step": 113
+    },
+    {
+      "epoch": 0.12418300653594772,
+      "grad_norm": 814.4208374023438,
+      "learning_rate": 0.00019359566983351013,
+      "loss": 9.821,
+      "step": 114
+    },
+    {
+      "epoch": 0.12527233115468409,
+      "grad_norm": 862.31494140625,
+      "learning_rate": 0.0001934732814184859,
+      "loss": 11.2622,
+      "step": 115
+    },
+    {
+      "epoch": 0.12636165577342048,
+      "grad_norm": 636.8887939453125,
+      "learning_rate": 0.00019334977404206443,
+      "loss": 10.6286,
+      "step": 116
+    },
+    {
+      "epoch": 0.12745098039215685,
+      "grad_norm": 670.830322265625,
+      "learning_rate": 0.00019322514918274308,
+      "loss": 11.1403,
+      "step": 117
+    },
+    {
+      "epoch": 0.12854030501089325,
+      "grad_norm": 545.7217407226562,
+      "learning_rate": 0.00019309940833239626,
+      "loss": 11.752,
+      "step": 118
+    },
+    {
+      "epoch": 0.12962962962962962,
+      "grad_norm": 587.8623046875,
+      "learning_rate": 0.00019297255299625797,
+      "loss": 11.7527,
+      "step": 119
+    },
+    {
+      "epoch": 0.13071895424836602,
+      "grad_norm": 1006.7312622070312,
+      "learning_rate": 0.00019284458469290354,
+      "loss": 10.5548,
+      "step": 120
+    },
+    {
+      "epoch": 0.1318082788671024,
+      "grad_norm": 590.1889038085938,
+      "learning_rate": 0.00019271550495423168,
+      "loss": 10.2359,
+      "step": 121
+    },
+    {
+      "epoch": 0.1328976034858388,
+      "grad_norm": 605.3558959960938,
+      "learning_rate": 0.00019258531532544585,
+      "loss": 10.7942,
+      "step": 122
+    },
+    {
+      "epoch": 0.13398692810457516,
+      "grad_norm": 605.3388061523438,
+      "learning_rate": 0.00019245401736503608,
+      "loss": 10.9458,
+      "step": 123
+    },
+    {
+      "epoch": 0.13507625272331156,
+      "grad_norm": 672.321533203125,
+      "learning_rate": 0.00019232161264475997,
+      "loss": 12.0993,
+      "step": 124
+    },
+    {
+      "epoch": 0.13616557734204793,
+      "grad_norm": 554.8330688476562,
+      "learning_rate": 0.00019218810274962417,
+      "loss": 11.2272,
+      "step": 125
+    },
+    {
+      "epoch": 0.13725490196078433,
+      "grad_norm": 865.9181518554688,
+      "learning_rate": 0.00019205348927786532,
+      "loss": 10.1409,
+      "step": 126
+    },
+    {
+      "epoch": 0.1383442265795207,
+      "grad_norm": 788.566162109375,
+      "learning_rate": 0.00019191777384093081,
+      "loss": 12.0527,
+      "step": 127
+    },
+    {
+      "epoch": 0.1394335511982571,
+      "grad_norm": 699.4249267578125,
+      "learning_rate": 0.0001917809580634596,
+      "loss": 10.8955,
+      "step": 128
+    },
+    {
+      "epoch": 0.14052287581699346,
+      "grad_norm": 726.8323974609375,
+      "learning_rate": 0.00019164304358326275,
+      "loss": 10.264,
+      "step": 129
+    },
+    {
+      "epoch": 0.14161220043572983,
+      "grad_norm": 878.0653686523438,
+      "learning_rate": 0.00019150403205130383,
+      "loss": 11.0059,
+      "step": 130
+    },
+    {
+      "epoch": 0.14270152505446623,
+      "grad_norm": 523.306884765625,
+      "learning_rate": 0.00019136392513167903,
+      "loss": 10.8748,
+      "step": 131
+    },
+    {
+      "epoch": 0.1437908496732026,
+      "grad_norm": 962.2786254882812,
+      "learning_rate": 0.00019122272450159745,
+      "loss": 11.4164,
+      "step": 132
+    },
+    {
+      "epoch": 0.144880174291939,
+      "grad_norm": 911.98193359375,
+      "learning_rate": 0.0001910804318513609,
+      "loss": 11.5345,
+      "step": 133
+    },
+    {
+      "epoch": 0.14596949891067537,
+      "grad_norm": 1103.3953857421875,
+      "learning_rate": 0.0001909370488843436,
+      "loss": 10.3984,
+      "step": 134
+    },
+    {
+      "epoch": 0.14705882352941177,
+      "grad_norm": 617.2359008789062,
+      "learning_rate": 0.00019079257731697196,
+      "loss": 11.3573,
+      "step": 135
+    },
+    {
+      "epoch": 0.14814814814814814,
+      "grad_norm": 704.6926879882812,
+      "learning_rate": 0.0001906470188787039,
+      "loss": 11.3981,
+      "step": 136
+    },
+    {
+      "epoch": 0.14923747276688454,
+      "grad_norm": 702.8963012695312,
+      "learning_rate": 0.00019050037531200814,
+      "loss": 11.738,
+      "step": 137
+    },
+    {
+      "epoch": 0.1503267973856209,
+      "grad_norm": 848.1795043945312,
+      "learning_rate": 0.00019035264837234347,
+      "loss": 11.4592,
+      "step": 138
+    },
+    {
+      "epoch": 0.1514161220043573,
+      "grad_norm": 863.1195068359375,
+      "learning_rate": 0.00019020383982813765,
+      "loss": 11.1281,
+      "step": 139
+    },
+    {
+      "epoch": 0.15250544662309368,
+      "grad_norm": 755.1558837890625,
+      "learning_rate": 0.00019005395146076616,
+      "loss": 11.8929,
+      "step": 140
+    },
+    {
+      "epoch": 0.15359477124183007,
+      "grad_norm": 1292.68359375,
+      "learning_rate": 0.00018990298506453104,
+      "loss": 11.1247,
+      "step": 141
+    },
+    {
+      "epoch": 0.15468409586056645,
+      "grad_norm": 623.200927734375,
+      "learning_rate": 0.0001897509424466393,
+      "loss": 10.7395,
+      "step": 142
+    },
+    {
+      "epoch": 0.15577342047930284,
+      "grad_norm": 628.1660766601562,
+      "learning_rate": 0.00018959782542718128,
+      "loss": 11.6995,
+      "step": 143
+    },
+    {
+      "epoch": 0.1568627450980392,
+      "grad_norm": 818.7709350585938,
+      "learning_rate": 0.000189443635839109,
+      "loss": 11.4782,
+      "step": 144
+    },
+    {
+      "epoch": 0.1579520697167756,
+      "grad_norm": 777.9385986328125,
+      "learning_rate": 0.00018928837552821404,
+      "loss": 9.1531,
+      "step": 145
+    },
+    {
+      "epoch": 0.15904139433551198,
+      "grad_norm": 632.1226196289062,
+      "learning_rate": 0.0001891320463531055,
+      "loss": 10.9672,
+      "step": 146
+    },
+    {
+      "epoch": 0.16013071895424835,
+      "grad_norm": 1045.4794921875,
+      "learning_rate": 0.00018897465018518782,
+      "loss": 12.5507,
+      "step": 147
+    },
+    {
+      "epoch": 0.16122004357298475,
+      "grad_norm": 1832.7476806640625,
+      "learning_rate": 0.0001888161889086383,
+      "loss": 10.2606,
+      "step": 148
+    },
+    {
+      "epoch": 0.16230936819172112,
+      "grad_norm": 1147.483154296875,
+      "learning_rate": 0.00018865666442038456,
+      "loss": 11.287,
+      "step": 149
+    },
+    {
+      "epoch": 0.16339869281045752,
+      "grad_norm": 1081.8319091796875,
+      "learning_rate": 0.00018849607863008193,
+      "loss": 10.9169,
+      "step": 150
+    },
+    {
+      "epoch": 0.1644880174291939,
+      "grad_norm": 896.3159790039062,
+      "learning_rate": 0.0001883344334600904,
+      "loss": 11.1309,
+      "step": 151
+    },
+    {
+      "epoch": 0.1655773420479303,
+      "grad_norm": 1027.65869140625,
+      "learning_rate": 0.00018817173084545176,
+      "loss": 10.6067,
+      "step": 152
+    },
+    {
+      "epoch": 0.16666666666666666,
+      "grad_norm": 792.0274047851562,
+      "learning_rate": 0.0001880079727338664,
+      "loss": 11.5252,
+      "step": 153
+    },
+    {
+      "epoch": 0.16775599128540306,
+      "grad_norm": 1679.2010498046875,
+      "learning_rate": 0.00018784316108566996,
+      "loss": 12.0362,
+      "step": 154
+    },
+    {
+      "epoch": 0.16884531590413943,
+      "grad_norm": 897.4620971679688,
+      "learning_rate": 0.00018767729787380985,
+      "loss": 11.3795,
+      "step": 155
+    },
+    {
+      "epoch": 0.16993464052287582,
+      "grad_norm": 1291.0208740234375,
+      "learning_rate": 0.00018751038508382176,
+      "loss": 11.0511,
+      "step": 156
+    },
+    {
+      "epoch": 0.1710239651416122,
+      "grad_norm": 1339.29345703125,
+      "learning_rate": 0.00018734242471380572,
+      "loss": 10.7022,
+      "step": 157
+    },
+    {
+      "epoch": 0.1721132897603486,
+      "grad_norm": 900.9859619140625,
+      "learning_rate": 0.00018717341877440226,
+      "loss": 11.0906,
+      "step": 158
+    },
+    {
+      "epoch": 0.17320261437908496,
+      "grad_norm": 811.28662109375,
+      "learning_rate": 0.0001870033692887684,
+      "loss": 10.1273,
+      "step": 159
+    },
+    {
+      "epoch": 0.17429193899782136,
+      "grad_norm": 1265.2435302734375,
+      "learning_rate": 0.00018683227829255334,
+      "loss": 12.1877,
+      "step": 160
+    },
+    {
+      "epoch": 0.17538126361655773,
+      "grad_norm": 1323.2113037109375,
+      "learning_rate": 0.00018666014783387408,
+      "loss": 11.6536,
+      "step": 161
+    },
+    {
+      "epoch": 0.17647058823529413,
+      "grad_norm": 1310.9921875,
+      "learning_rate": 0.000186486979973291,
+      "loss": 12.1153,
+      "step": 162
+    },
+    {
+      "epoch": 0.1775599128540305,
+      "grad_norm": 806.4589233398438,
+      "learning_rate": 0.0001863127767837831,
+      "loss": 11.7038,
+      "step": 163
+    },
+    {
+      "epoch": 0.1786492374727669,
+      "grad_norm": 845.1607666015625,
+      "learning_rate": 0.0001861375403507233,
+      "loss": 11.3316,
+      "step": 164
+    },
+    {
+      "epoch": 0.17973856209150327,
+      "grad_norm": 1319.9588623046875,
+      "learning_rate": 0.00018596127277185329,
+      "loss": 12.2223,
+      "step": 165
+    },
+    {
+      "epoch": 0.18082788671023964,
+      "grad_norm": 1126.65234375,
+      "learning_rate": 0.0001857839761572586,
+      "loss": 11.2331,
+      "step": 166
+    },
+    {
+      "epoch": 0.18191721132897604,
+      "grad_norm": 827.2431640625,
+      "learning_rate": 0.00018560565262934318,
+      "loss": 11.2991,
+      "step": 167
+    },
+    {
+      "epoch": 0.1830065359477124,
+      "grad_norm": 804.45166015625,
+      "learning_rate": 0.00018542630432280422,
+      "loss": 11.1516,
+      "step": 168
+    },
+    {
+      "epoch": 0.1840958605664488,
+      "grad_norm": 868.1654052734375,
+      "learning_rate": 0.00018524593338460635,
+      "loss": 12.0214,
+      "step": 169
+    },
+    {
+      "epoch": 0.18518518518518517,
+      "grad_norm": 1219.4190673828125,
+      "learning_rate": 0.00018506454197395606,
+      "loss": 12.0915,
+      "step": 170
+    },
+    {
+      "epoch": 0.18627450980392157,
+      "grad_norm": 1242.7969970703125,
+      "learning_rate": 0.00018488213226227588,
+      "loss": 11.2175,
+      "step": 171
+    },
+    {
+      "epoch": 0.18736383442265794,
+      "grad_norm": 984.5662231445312,
+      "learning_rate": 0.0001846987064331783,
+      "loss": 11.5393,
+      "step": 172
+    },
+    {
+      "epoch": 0.18845315904139434,
+      "grad_norm": 1010.8961181640625,
+      "learning_rate": 0.00018451426668243963,
+      "loss": 11.7417,
+      "step": 173
+    },
+    {
+      "epoch": 0.1895424836601307,
+      "grad_norm": 784.3214721679688,
+      "learning_rate": 0.0001843288152179739,
+      "loss": 11.8699,
+      "step": 174
+    },
+    {
+      "epoch": 0.1906318082788671,
+      "grad_norm": 750.0485229492188,
+      "learning_rate": 0.00018414235425980616,
+      "loss": 10.6044,
+      "step": 175
+    },
+    {
+      "epoch": 0.19172113289760348,
+      "grad_norm": 1271.091064453125,
+      "learning_rate": 0.00018395488604004603,
+      "loss": 10.5923,
+      "step": 176
+    },
+    {
+      "epoch": 0.19281045751633988,
+      "grad_norm": 778.1675415039062,
+      "learning_rate": 0.00018376641280286107,
+      "loss": 11.4075,
+      "step": 177
+    },
+    {
+      "epoch": 0.19389978213507625,
+      "grad_norm": 999.4036254882812,
+      "learning_rate": 0.00018357693680444976,
+      "loss": 11.6131,
+      "step": 178
+    },
+    {
+      "epoch": 0.19498910675381265,
+      "grad_norm": 905.7958374023438,
+      "learning_rate": 0.00018338646031301458,
+      "loss": 10.323,
+      "step": 179
+    },
+    {
+      "epoch": 0.19607843137254902,
+      "grad_norm": 872.347412109375,
+      "learning_rate": 0.00018319498560873476,
+      "loss": 11.0557,
+      "step": 180
+    },
+    {
+      "epoch": 0.19716775599128541,
+      "grad_norm": 756.1710815429688,
+      "learning_rate": 0.00018300251498373923,
+      "loss": 10.2497,
+      "step": 181
+    },
+    {
+      "epoch": 0.19825708061002179,
+      "grad_norm": 845.9336547851562,
+      "learning_rate": 0.00018280905074207884,
+      "loss": 10.9945,
+      "step": 182
+    },
+    {
+      "epoch": 0.19934640522875818,
+      "grad_norm": 694.05224609375,
+      "learning_rate": 0.000182614595199699,
+      "loss": 10.545,
+      "step": 183
+    },
+    {
+      "epoch": 0.20043572984749455,
+      "grad_norm": 714.1306762695312,
+      "learning_rate": 0.00018241915068441196,
+      "loss": 11.1389,
+      "step": 184
+    },
+    {
+      "epoch": 0.20152505446623092,
+      "grad_norm": 722.7552490234375,
+      "learning_rate": 0.00018222271953586883,
+      "loss": 11.0365,
+      "step": 185
+    },
+    {
+      "epoch": 0.20261437908496732,
+      "grad_norm": 708.6851196289062,
+      "learning_rate": 0.00018202530410553163,
+      "loss": 11.2533,
+      "step": 186
+    },
+    {
+      "epoch": 0.2037037037037037,
+      "grad_norm": 1047.9068603515625,
+      "learning_rate": 0.00018182690675664514,
+      "loss": 10.9609,
+      "step": 187
+    },
+    {
+      "epoch": 0.2047930283224401,
+      "grad_norm": 562.74072265625,
+      "learning_rate": 0.00018162752986420868,
+      "loss": 11.5543,
+      "step": 188
+    },
+    {
+      "epoch": 0.20588235294117646,
+      "grad_norm": 690.9712524414062,
+      "learning_rate": 0.0001814271758149475,
+      "loss": 10.2702,
+      "step": 189
+    },
+    {
+      "epoch": 0.20697167755991286,
+      "grad_norm": 1054.808349609375,
+      "learning_rate": 0.00018122584700728443,
+      "loss": 11.0848,
+      "step": 190
+    },
+    {
+      "epoch": 0.20806100217864923,
+      "grad_norm": 633.4119262695312,
+      "learning_rate": 0.00018102354585131092,
+      "loss": 11.1002,
+      "step": 191
+    },
+    {
+      "epoch": 0.20915032679738563,
+      "grad_norm": 508.4339294433594,
+      "learning_rate": 0.00018082027476875847,
+      "loss": 10.9004,
+      "step": 192
+    },
+    {
+      "epoch": 0.210239651416122,
+      "grad_norm": 616.0364379882812,
+      "learning_rate": 0.00018061603619296942,
+      "loss": 10.9211,
+      "step": 193
+    },
+    {
+      "epoch": 0.2113289760348584,
+      "grad_norm": 553.75,
+      "learning_rate": 0.0001804108325688679,
+      "loss": 10.5755,
+      "step": 194
+    },
+    {
+      "epoch": 0.21241830065359477,
+      "grad_norm": 554.069580078125,
+      "learning_rate": 0.00018020466635293057,
+      "loss": 11.7447,
+      "step": 195
+    },
+    {
+      "epoch": 0.21350762527233116,
+      "grad_norm": 681.5348510742188,
+      "learning_rate": 0.0001799975400131572,
+      "loss": 11.0779,
+      "step": 196
+    },
+    {
+      "epoch": 0.21459694989106753,
+      "grad_norm": 632.8909301757812,
+      "learning_rate": 0.00017978945602904116,
+      "loss": 11.4297,
+      "step": 197
+    },
+    {
+      "epoch": 0.21568627450980393,
+      "grad_norm": 485.4712829589844,
+      "learning_rate": 0.0001795804168915396,
+      "loss": 9.903,
+      "step": 198
+    },
+    {
+      "epoch": 0.2167755991285403,
+      "grad_norm": 668.1004638671875,
+      "learning_rate": 0.00017937042510304392,
+      "loss": 11.2272,
+      "step": 199
+    },
+    {
+      "epoch": 0.2178649237472767,
+      "grad_norm": 798.0782470703125,
+      "learning_rate": 0.00017915948317734942,
+      "loss": 10.4403,
+      "step": 200
+    },
+    {
+      "epoch": 0.21895424836601307,
+      "grad_norm": 484.9952697753906,
+      "learning_rate": 0.00017894759363962554,
+      "loss": 11.3431,
+      "step": 201
+    },
+    {
+      "epoch": 0.22004357298474944,
+      "grad_norm": 557.5073852539062,
+      "learning_rate": 0.00017873475902638553,
+      "loss": 10.0637,
+      "step": 202
+    },
+    {
+      "epoch": 0.22113289760348584,
+      "grad_norm": 457.8398742675781,
+      "learning_rate": 0.00017852098188545602,
+      "loss": 10.7078,
+      "step": 203
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 1345.8021240234375,
+      "learning_rate": 0.00017830626477594654,
+      "loss": 10.8073,
+      "step": 204
+    },
+    {
+      "epoch": 0.2233115468409586,
+      "grad_norm": 508.4226379394531,
+      "learning_rate": 0.00017809061026821896,
+      "loss": 11.756,
+      "step": 205
+    },
+    {
+      "epoch": 0.22440087145969498,
+      "grad_norm": 543.2045288085938,
+      "learning_rate": 0.00017787402094385666,
+      "loss": 11.3398,
+      "step": 206
+    },
+    {
+      "epoch": 0.22549019607843138,
+      "grad_norm": 600.871337890625,
+      "learning_rate": 0.00017765649939563365,
+      "loss": 10.9323,
+      "step": 207
+    },
+    {
+      "epoch": 0.22657952069716775,
+      "grad_norm": 491.7249450683594,
+      "learning_rate": 0.00017743804822748345,
+      "loss": 11.4056,
+      "step": 208
+    },
+    {
+      "epoch": 0.22766884531590414,
+      "grad_norm": 584.7307739257812,
+      "learning_rate": 0.00017721867005446806,
+      "loss": 10.6625,
+      "step": 209
+    },
+    {
+      "epoch": 0.22875816993464052,
+      "grad_norm": 781.3718872070312,
+      "learning_rate": 0.00017699836750274662,
+      "loss": 11.4948,
+      "step": 210
+    },
+    {
+      "epoch": 0.2298474945533769,
+      "grad_norm": 561.0650024414062,
+      "learning_rate": 0.00017677714320954378,
+      "loss": 11.5926,
+      "step": 211
+    },
+    {
+      "epoch": 0.23093681917211328,
+      "grad_norm": 662.3781127929688,
+      "learning_rate": 0.00017655499982311847,
+      "loss": 11.2635,
+      "step": 212
+    },
+    {
+      "epoch": 0.23202614379084968,
+      "grad_norm": 485.6428527832031,
+      "learning_rate": 0.00017633194000273188,
+      "loss": 11.3912,
+      "step": 213
+    },
+    {
+      "epoch": 0.23311546840958605,
+      "grad_norm": 1000.7589111328125,
+      "learning_rate": 0.00017610796641861581,
+      "loss": 11.0748,
+      "step": 214
+    },
+    {
+      "epoch": 0.23420479302832245,
+      "grad_norm": 546.7896728515625,
+      "learning_rate": 0.0001758830817519407,
+      "loss": 11.1538,
+      "step": 215
+    },
+    {
+      "epoch": 0.23529411764705882,
+      "grad_norm": 498.8612976074219,
+      "learning_rate": 0.00017565728869478337,
+      "loss": 10.7306,
+      "step": 216
+    },
+    {
+      "epoch": 0.23638344226579522,
+      "grad_norm": 954.06787109375,
+      "learning_rate": 0.00017543058995009503,
+      "loss": 9.8866,
+      "step": 217
+    },
+    {
+      "epoch": 0.2374727668845316,
+      "grad_norm": 474.430419921875,
+      "learning_rate": 0.00017520298823166873,
+      "loss": 11.1717,
+      "step": 218
+    },
+    {
+      "epoch": 0.238562091503268,
+      "grad_norm": 446.2909851074219,
+      "learning_rate": 0.000174974486264107,
+      "loss": 11.0507,
+      "step": 219
+    },
+    {
+      "epoch": 0.23965141612200436,
+      "grad_norm": 574.3330688476562,
+      "learning_rate": 0.00017474508678278915,
+      "loss": 10.7534,
+      "step": 220
+    },
+    {
+      "epoch": 0.24074074074074073,
+      "grad_norm": 751.9092407226562,
+      "learning_rate": 0.00017451479253383857,
+      "loss": 11.041,
+      "step": 221
+    },
+    {
+      "epoch": 0.24183006535947713,
+      "grad_norm": 969.604248046875,
+      "learning_rate": 0.00017428360627408978,
+      "loss": 10.7288,
+      "step": 222
+    },
+    {
+      "epoch": 0.2429193899782135,
+      "grad_norm": 716.7567138671875,
+      "learning_rate": 0.0001740515307710557,
+      "loss": 11.062,
+      "step": 223
+    },
+    {
+      "epoch": 0.2440087145969499,
+      "grad_norm": 580.2841796875,
+      "learning_rate": 0.000173818568802894,
+      "loss": 10.2888,
+      "step": 224
+    },
+    {
+      "epoch": 0.24509803921568626,
+      "grad_norm": 488.0480041503906,
+      "learning_rate": 0.00017358472315837447,
+      "loss": 10.0662,
+      "step": 225
+    },
+    {
+      "epoch": 0.24618736383442266,
+      "grad_norm": 805.732666015625,
+      "learning_rate": 0.00017334999663684504,
+      "loss": 10.4939,
+      "step": 226
+    },
+    {
+      "epoch": 0.24727668845315903,
+      "grad_norm": 610.5254516601562,
+      "learning_rate": 0.00017311439204819874,
+      "loss": 10.3649,
+      "step": 227
+    },
+    {
+      "epoch": 0.24836601307189543,
+      "grad_norm": 464.3783264160156,
+      "learning_rate": 0.00017287791221283984,
+      "loss": 10.8539,
+      "step": 228
+    },
+    {
+      "epoch": 0.2494553376906318,
+      "grad_norm": 645.302001953125,
+      "learning_rate": 0.00017264055996165007,
+      "loss": 10.8646,
+      "step": 229
+    },
+    {
+      "epoch": 0.25054466230936817,
+      "grad_norm": 466.3519287109375,
+      "learning_rate": 0.00017240233813595478,
+      "loss": 10.1154,
+      "step": 230
+    },
+    {
+      "epoch": 0.25054466230936817,
+      "eval_loss": 2.678065061569214,
+      "eval_runtime": 2.7886,
+      "eval_samples_per_second": 138.78,
+      "eval_steps_per_second": 69.569,
+      "step": 230
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 918,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 230,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 781772594872320.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8f9bca55aa40a02342c255df5ba8ec8ce569e9421e57dce7457e28a6bc79c8b8
+size 6776