Training in progress, epoch 0, checkpoint

Files changed (11)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +31 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +223 -0
last-checkpoint/trainer_state.json +633 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: EleutherAI/pythia-14m
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "EleutherAI/pythia-14m",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query_key_value",
+    "dense",
+    "dense_h_to_4h",
+    "dense_4h_to_h"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
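
To make the adapter settings above concrete, here is a minimal sketch (not part of the commit) that computes the scaling factor applied to the low-rank update. It assumes the usual LoRA convention of `lora_alpha / r`, which applies here because `use_rslora` is false; with rank-stabilized LoRA the divisor would be `sqrt(r)` instead. The `ADAPTER_CONFIG` literal copies only the relevant fields, and the helper name `lora_scaling` is hypothetical.

```python
import json
import math

# Subset of the fields from adapter_config.json shown above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "use_rslora": false,
  "target_modules": ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]
}
""")


def lora_scaling(config: dict) -> float:
    """Return the multiplier applied to the low-rank update B @ A."""
    if config.get("use_rslora"):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return config["lora_alpha"] / math.sqrt(config["r"])
    return config["lora_alpha"] / config["r"]


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"rank={ADAPTER_CONFIG['r']} scaling={scaling}")
print("adapted modules:", ", ".join(ADAPTER_CONFIG["target_modules"]))
```

With `lora_alpha` 16 and `r` 8, each adapted module's update is scaled by 2.0 before being added to the frozen base weight.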

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c029802c46cbe06fc545a5ac268f1fe127020bba0a0c92e6530778d907b58eb9
+size 399632
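
The three added lines above are a Git LFS pointer, not the weights themselves: the binary is stored separately and identified by its SHA-256 digest and byte size. As an illustrative sketch, the pointer can be parsed as follows; the function name `parse_lfs_pointer` and the returned field names are hypothetical.

```python
# Pointer text copied from the adapter_model.safetensors diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c029802c46cbe06fc545a5ac268f1fe127020bba0a0c92e6530778d907b58eb9
size 399632
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    # Each non-empty line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size_bytes"])
```

The same layout applies to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `training_args.bin`).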

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7304122136ffcdbaab3aceeb88430711a25377f3a2f812923cd0b900799e0469
+size 531194

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dbaca5f84cc13a5d359244e121c39e7034b77cdb24d0bddf76885f4ad0de58ae
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:705cabf5cbc3a6ab0feb67c77b9b453d59efcc939ce90d310af96e621810f990
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,223 @@

+{
+  "add_bos_token": false,
+  "add_eos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<|padding|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50254": {
+      "content": "                        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50255": {
+      "content": "                       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50256": {
+      "content": "                      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50257": {
+      "content": "                     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50258": {
+      "content": "                    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50259": {
+      "content": "                   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50260": {
+      "content": "                  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50261": {
+      "content": "                 ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50262": {
+      "content": "                ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50263": {
+      "content": "               ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50264": {
+      "content": "              ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50265": {
+      "content": "             ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50266": {
+      "content": "            ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50267": {
+      "content": "           ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50268": {
+      "content": "          ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50269": {
+      "content": "         ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50270": {
+      "content": "        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50271": {
+      "content": "       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50272": {
+      "content": "      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50273": {
+      "content": "     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50274": {
+      "content": "    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50275": {
+      "content": "   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50276": {
+      "content": "  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50277": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "GPTNeoXTokenizer",
+  "unk_token": "<|endoftext|>"
+}

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,633 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.1037344398340249,
+  "eval_steps": 100,
+  "global_step": 400,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00025933609958506224,
+      "eval_loss": 8.698531150817871,
+      "eval_runtime": 8.1428,
+      "eval_samples_per_second": 199.44,
+      "eval_steps_per_second": 99.72,
+      "step": 1
+    },
+    {
+      "epoch": 0.0012966804979253112,
+      "grad_norm": 7155.05224609375,
+      "learning_rate": 1.6666666666666667e-05,
+      "loss": 33.9303,
+      "step": 5
+    },
+    {
+      "epoch": 0.0025933609958506223,
+      "grad_norm": 7518.1064453125,
+      "learning_rate": 3.3333333333333335e-05,
+      "loss": 29.7556,
+      "step": 10
+    },
+    {
+      "epoch": 0.0038900414937759337,
+      "grad_norm": 4561.16015625,
+      "learning_rate": 5e-05,
+      "loss": 32.4819,
+      "step": 15
+    },
+    {
+      "epoch": 0.005186721991701245,
+      "grad_norm": 13537.517578125,
+      "learning_rate": 6.666666666666667e-05,
+      "loss": 32.4006,
+      "step": 20
+    },
+    {
+      "epoch": 0.006483402489626556,
+      "grad_norm": 4978.63720703125,
+      "learning_rate": 8.333333333333334e-05,
+      "loss": 31.8345,
+      "step": 25
+    },
+    {
+      "epoch": 0.007780082987551867,
+      "grad_norm": 6110.34619140625,
+      "learning_rate": 0.0001,
+      "loss": 33.2257,
+      "step": 30
+    },
+    {
+      "epoch": 0.00907676348547718,
+      "grad_norm": 5287.54248046875,
+      "learning_rate": 9.995494831023409e-05,
+      "loss": 33.1108,
+      "step": 35
+    },
+    {
+      "epoch": 0.01037344398340249,
+      "grad_norm": 7983.4345703125,
+      "learning_rate": 9.981987442712633e-05,
+      "loss": 36.1986,
+      "step": 40
+    },
+    {
+      "epoch": 0.011670124481327801,
+      "grad_norm": 4800.20849609375,
+      "learning_rate": 9.959502176294383e-05,
+      "loss": 36.5164,
+      "step": 45
+    },
+    {
+      "epoch": 0.012966804979253113,
+      "grad_norm": 4833.9091796875,
+      "learning_rate": 9.928079551738543e-05,
+      "loss": 37.5867,
+      "step": 50
+    },
+    {
+      "epoch": 0.014263485477178423,
+      "grad_norm": 8798.251953125,
+      "learning_rate": 9.887776194738432e-05,
+      "loss": 32.1512,
+      "step": 55
+    },
+    {
+      "epoch": 0.015560165975103735,
+      "grad_norm": 7883.82080078125,
+      "learning_rate": 9.838664734667495e-05,
+      "loss": 31.345,
+      "step": 60
+    },
+    {
+      "epoch": 0.016856846473029045,
+      "grad_norm": 13352.6337890625,
+      "learning_rate": 9.780833673696254e-05,
+      "loss": 30.661,
+      "step": 65
+    },
+    {
+      "epoch": 0.01815352697095436,
+      "grad_norm": 9907.5986328125,
+      "learning_rate": 9.714387227305422e-05,
+      "loss": 31.419,
+      "step": 70
+    },
+    {
+      "epoch": 0.01945020746887967,
+      "grad_norm": 9755.0712890625,
+      "learning_rate": 9.639445136482548e-05,
+      "loss": 29.7253,
+      "step": 75
+    },
+    {
+      "epoch": 0.02074688796680498,
+      "grad_norm": 5575.86669921875,
+      "learning_rate": 9.55614245194068e-05,
+      "loss": 30.6193,
+      "step": 80
+    },
+    {
+      "epoch": 0.022043568464730292,
+      "grad_norm": 10032.1767578125,
+      "learning_rate": 9.464629290747842e-05,
+      "loss": 31.2725,
+      "step": 85
+    },
+    {
+      "epoch": 0.023340248962655602,
+      "grad_norm": 7126.8974609375,
+      "learning_rate": 9.365070565805941e-05,
+      "loss": 32.5974,
+      "step": 90
+    },
+    {
+      "epoch": 0.024636929460580912,
+      "grad_norm": 13018.7900390625,
+      "learning_rate": 9.257645688666556e-05,
+      "loss": 36.0568,
+      "step": 95
+    },
+    {
+      "epoch": 0.025933609958506226,
+      "grad_norm": 8943.1474609375,
+      "learning_rate": 9.142548246219212e-05,
+      "loss": 35.993,
+      "step": 100
+    },
+    {
+      "epoch": 0.025933609958506226,
+      "eval_loss": 8.28997802734375,
+      "eval_runtime": 8.1451,
+      "eval_samples_per_second": 199.385,
+      "eval_steps_per_second": 99.692,
+      "step": 100
+    },
+    {
+      "epoch": 0.027230290456431536,
+      "grad_norm": 10046.58203125,
+      "learning_rate": 9.019985651834703e-05,
+      "loss": 33.3247,
+      "step": 105
+    },
+    {
+      "epoch": 0.028526970954356846,
+      "grad_norm": 10348.04296875,
+      "learning_rate": 8.890178771592199e-05,
+      "loss": 32.6935,
+      "step": 110
+    },
+    {
+      "epoch": 0.02982365145228216,
+      "grad_norm": 9010.048828125,
+      "learning_rate": 8.753361526263621e-05,
+      "loss": 32.3948,
+      "step": 115
+    },
+    {
+      "epoch": 0.03112033195020747,
+      "grad_norm": 14601.9599609375,
+      "learning_rate": 8.609780469772623e-05,
+      "loss": 33.8764,
+      "step": 120
+    },
+    {
+      "epoch": 0.03241701244813278,
+      "grad_norm": 10465.0078125,
+      "learning_rate": 8.459694344887732e-05,
+      "loss": 34.3072,
+      "step": 125
+    },
+    {
+      "epoch": 0.03371369294605809,
+      "grad_norm": 8753.7177734375,
+      "learning_rate": 8.303373616950408e-05,
+      "loss": 33.7072,
+      "step": 130
+    },
+    {
+      "epoch": 0.0350103734439834,
+      "grad_norm": 12666.2978515625,
+      "learning_rate": 8.141099986478212e-05,
+      "loss": 34.4488,
+      "step": 135
+    },
+    {
+      "epoch": 0.03630705394190872,
+      "grad_norm": 5840.59716796875,
+      "learning_rate": 7.973165881521434e-05,
+      "loss": 35.1657,
+      "step": 140
+    },
+    {
+      "epoch": 0.03760373443983402,
+      "grad_norm": 6317.65771484375,
+      "learning_rate": 7.799873930687978e-05,
+      "loss": 37.1777,
+      "step": 145
+    },
+    {
+      "epoch": 0.03890041493775934,
+      "grad_norm": 4859.33447265625,
+      "learning_rate": 7.621536417786159e-05,
+      "loss": 37.6113,
+      "step": 150
+    },
+    {
+      "epoch": 0.04019709543568465,
+      "grad_norm": 16081.6484375,
+      "learning_rate": 7.438474719068173e-05,
+      "loss": 37.0722,
+      "step": 155
+    },
+    {
+      "epoch": 0.04149377593360996,
+      "grad_norm": 26484.927734375,
+      "learning_rate": 7.251018724088367e-05,
+      "loss": 34.6445,
+      "step": 160
+    },
+    {
+      "epoch": 0.04279045643153527,
+      "grad_norm": 33124.15625,
+      "learning_rate": 7.059506241219965e-05,
+      "loss": 34.0614,
+      "step": 165
+    },
+    {
+      "epoch": 0.044087136929460584,
+      "grad_norm": 10712.5439453125,
+      "learning_rate": 6.864282388901544e-05,
+      "loss": 35.3515,
+      "step": 170
+    },
+    {
+      "epoch": 0.04538381742738589,
+      "grad_norm": 9869.5673828125,
+      "learning_rate": 6.665698973710288e-05,
+      "loss": 35.5691,
+      "step": 175
+    },
+    {
+      "epoch": 0.046680497925311204,
+      "grad_norm": 13349.5517578125,
+      "learning_rate": 6.464113856382752e-05,
+      "loss": 34.8757,
+      "step": 180
+    },
+    {
+      "epoch": 0.04797717842323652,
+      "grad_norm": 15789.9609375,
+      "learning_rate": 6.259890306925627e-05,
+      "loss": 34.0789,
+      "step": 185
+    },
+    {
+      "epoch": 0.049273858921161824,
+      "grad_norm": 6027.7412109375,
+      "learning_rate": 6.0533963499786314e-05,
+      "loss": 34.8986,
+      "step": 190
+    },
+    {
+      "epoch": 0.05057053941908714,
+      "grad_norm": 5119.00439453125,
+      "learning_rate": 5.8450041016092464e-05,
+      "loss": 35.5115,
+      "step": 195
+    },
+    {
+      "epoch": 0.05186721991701245,
+      "grad_norm": 6608.8369140625,
+      "learning_rate": 5.6350890987343944e-05,
+      "loss": 36.5823,
+      "step": 200
+    },
+    {
+      "epoch": 0.05186721991701245,
+      "eval_loss": 8.631919860839844,
+      "eval_runtime": 8.1423,
+      "eval_samples_per_second": 199.452,
+      "eval_steps_per_second": 99.726,
+      "step": 200
+    },
+    {
+      "epoch": 0.05316390041493776,
+      "grad_norm": 13360.7353515625,
+      "learning_rate": 5.4240296223775465e-05,
+      "loss": 34.7144,
+      "step": 205
+    },
+    {
+      "epoch": 0.05446058091286307,
+      "grad_norm": 13035.0859375,
+      "learning_rate": 5.212206015980742e-05,
+      "loss": 34.5991,
+      "step": 210
+    },
+    {
+      "epoch": 0.055757261410788385,
+      "grad_norm": 7893.5126953125,
+      "learning_rate": 5e-05,
+      "loss": 32.2267,
+      "step": 215
+    },
+    {
+      "epoch": 0.05705394190871369,
+      "grad_norm": 9001.4267578125,
+      "learning_rate": 4.78779398401926e-05,
+      "loss": 34.6116,
+      "step": 220
+    },
+    {
+      "epoch": 0.058350622406639005,
+      "grad_norm": 10490.74609375,
+      "learning_rate": 4.575970377622456e-05,
+      "loss": 32.5425,
+      "step": 225
+    },
+    {
+      "epoch": 0.05964730290456432,
+      "grad_norm": 10166.5146484375,
+      "learning_rate": 4.364910901265606e-05,
+      "loss": 33.2318,
+      "step": 230
+    },
+    {
+      "epoch": 0.060943983402489625,
+      "grad_norm": 11017.513671875,
+      "learning_rate": 4.1549958983907555e-05,
+      "loss": 33.5466,
+      "step": 235
+    },
+    {
+      "epoch": 0.06224066390041494,
+      "grad_norm": 7977.8095703125,
+      "learning_rate": 3.94660365002137e-05,
+      "loss": 35.1499,
+      "step": 240
+    },
+    {
+      "epoch": 0.06353734439834025,
+      "grad_norm": 10266.302734375,
+      "learning_rate": 3.740109693074375e-05,
+      "loss": 35.661,
+      "step": 245
+    },
+    {
+      "epoch": 0.06483402489626557,
+      "grad_norm": 14391.1142578125,
+      "learning_rate": 3.5358861436172485e-05,
+      "loss": 34.3488,
+      "step": 250
+    },
+    {
+      "epoch": 0.06613070539419087,
+      "grad_norm": 20267.6484375,
+      "learning_rate": 3.334301026289712e-05,
+      "loss": 34.2542,
+      "step": 255
+    },
+    {
+      "epoch": 0.06742738589211618,
+      "grad_norm": 8814.4384765625,
+      "learning_rate": 3.135717611098458e-05,
+      "loss": 33.9238,
+      "step": 260
+    },
+    {
+      "epoch": 0.0687240663900415,
+      "grad_norm": 8733.9638671875,
+      "learning_rate": 2.9404937587800375e-05,
+      "loss": 32.618,
+      "step": 265
+    },
+    {
+      "epoch": 0.0700207468879668,
+      "grad_norm": 8552.453125,
+      "learning_rate": 2.748981275911633e-05,
+      "loss": 34.2621,
+      "step": 270
+    },
+    {
+      "epoch": 0.07131742738589211,
+      "grad_norm": 10575.3955078125,
+      "learning_rate": 2.5615252809318284e-05,
+      "loss": 33.1339,
+      "step": 275
+    },
+    {
+      "epoch": 0.07261410788381743,
+      "grad_norm": 8443.8935546875,
+      "learning_rate": 2.3784635822138424e-05,
+      "loss": 35.1507,
+      "step": 280
+    },
+    {
+      "epoch": 0.07391078838174274,
+      "grad_norm": 7032.55810546875,
+      "learning_rate": 2.2001260693120233e-05,
+      "loss": 33.4609,
+      "step": 285
+    },
+    {
+      "epoch": 0.07520746887966805,
+      "grad_norm": 7061.39599609375,
+      "learning_rate": 2.026834118478567e-05,
+      "loss": 35.2882,
+      "step": 290
+    },
+    {
+      "epoch": 0.07650414937759337,
+      "grad_norm": 5346.2392578125,
+      "learning_rate": 1.858900013521788e-05,
+      "loss": 35.1677,
+      "step": 295
+    },
+    {
+      "epoch": 0.07780082987551867,
+      "grad_norm": 11053.076171875,
+      "learning_rate": 1.6966263830495936e-05,
+      "loss": 36.5602,
+      "step": 300
+    },
+    {
+      "epoch": 0.07780082987551867,
+      "eval_loss": 8.695049285888672,
+      "eval_runtime": 8.1704,
+      "eval_samples_per_second": 198.766,
+      "eval_steps_per_second": 99.383,
+      "step": 300
+    },
+    {
+      "epoch": 0.07909751037344398,
+      "grad_norm": 30041.283203125,
+      "learning_rate": 1.5403056551122697e-05,
+      "loss": 35.9917,
+      "step": 305
+    },
+    {
+      "epoch": 0.0803941908713693,
+      "grad_norm": 9423.1142578125,
+      "learning_rate": 1.3902195302273779e-05,
+      "loss": 35.2563,
+      "step": 310
+    },
+    {
+      "epoch": 0.08169087136929461,
+      "grad_norm": 20269.171875,
+      "learning_rate": 1.246638473736378e-05,
+      "loss": 35.0311,
+      "step": 315
+    },
+    {
+      "epoch": 0.08298755186721991,
+      "grad_norm": 22867.439453125,
+      "learning_rate": 1.1098212284078036e-05,
+      "loss": 34.2222,
+      "step": 320
+    },
+    {
+      "epoch": 0.08428423236514523,
+      "grad_norm": 10219.056640625,
+      "learning_rate": 9.800143481652979e-06,
+      "loss": 34.0283,
+      "step": 325
+    },
+    {
+      "epoch": 0.08558091286307054,
+      "grad_norm": 8089.09130859375,
+      "learning_rate": 8.574517537807897e-06,
+      "loss": 34.1538,
+      "step": 330
+    },
+    {
+      "epoch": 0.08687759336099585,
+      "grad_norm": 9407.201171875,
+      "learning_rate": 7.423543113334436e-06,
+      "loss": 33.1668,
+      "step": 335
+    },
+    {
+      "epoch": 0.08817427385892117,
+      "grad_norm": 10601.11328125,
+      "learning_rate": 6.349294341940593e-06,
+      "loss": 33.7767,
+      "step": 340
+    },
+    {
+      "epoch": 0.08947095435684647,
+      "grad_norm": 4801.73486328125,
+      "learning_rate": 5.353707092521582e-06,
+      "loss": 35.8119,
+      "step": 345
+    },
+    {
+      "epoch": 0.09076763485477178,
+      "grad_norm": 12896.2568359375,
+      "learning_rate": 4.43857548059321e-06,
+      "loss": 34.4992,
+      "step": 350
+    },
+    {
+      "epoch": 0.0920643153526971,
+      "grad_norm": 10644.0185546875,
+      "learning_rate": 3.605548635174533e-06,
+      "loss": 35.3011,
+      "step": 355
+    },
+    {
+      "epoch": 0.09336099585062241,
+      "grad_norm": 14706.4345703125,
+      "learning_rate": 2.85612772694579e-06,
+      "loss": 34.469,
+      "step": 360
+    },
+    {
+      "epoch": 0.09465767634854771,
+      "grad_norm": 9028.2314453125,
+      "learning_rate": 2.191663263037458e-06,
+      "loss": 33.8728,
+      "step": 365
+    },
+    {
+      "epoch": 0.09595435684647304,
+      "grad_norm": 8670.654296875,
+      "learning_rate": 1.6133526533250565e-06,
+      "loss": 34.1806,
+      "step": 370
+    },
+    {
+      "epoch": 0.09725103734439834,
+      "grad_norm": 16775.578125,
+      "learning_rate": 1.1222380526156928e-06,
+      "loss": 34.9482,
+      "step": 375
+    },
+    {
+      "epoch": 0.09854771784232365,
+      "grad_norm": 8365.63671875,
+      "learning_rate": 7.192044826145771e-07,
+      "loss": 33.2813,
+      "step": 380
+    },
+    {
+      "epoch": 0.09984439834024897,
+      "grad_norm": 12337.71875,
+      "learning_rate": 4.049782370561583e-07,
+      "loss": 33.6363,
+      "step": 385
+    },
+    {
+      "epoch": 0.10114107883817428,
+      "grad_norm": 8647.94140625,
+      "learning_rate": 1.8012557287367392e-07,
+      "loss": 34.9097,
+      "step": 390
+    },
+    {
+      "epoch": 0.10243775933609958,
+      "grad_norm": 7914.73193359375,
+      "learning_rate": 4.5051689765929214e-08,
+      "loss": 36.425,
+      "step": 395
+    },
+    {
+      "epoch": 0.1037344398340249,
+      "grad_norm": 24286.349609375,
+      "learning_rate": 0.0,
+      "loss": 36.3768,
+      "step": 400
+    },
+    {
+      "epoch": 0.1037344398340249,
+      "eval_loss": 8.702563285827637,
+      "eval_runtime": 8.0771,
+      "eval_samples_per_second": 201.063,
+      "eval_steps_per_second": 100.531,
+      "step": 400
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 400,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 155054417903616.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
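
The `learning_rate` values in `log_history` above are consistent with a linear warmup over the first 30 steps to a peak of 1e-4, followed by a cosine decay to zero at `max_steps` (400). The commit does not state the schedule directly (`training_args.bin` is binary), so the constants and the function name `lr_at` below are inferred from the log. The sketch checks a few logged values against that assumed schedule.

```python
import math

# Values inferred from log_history; not stated explicitly in the commit.
PEAK_LR = 1e-4
WARMUP_STEPS = 30
MAX_STEPS = 400

# A few (step, learning_rate) pairs copied from log_history above.
LOGGED = {
    5: 1.6666666666666667e-05,
    35: 9.995494831023409e-05,
    100: 9.142548246219212e-05,
    215: 5e-05,
    400: 0.0,
}


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step, logged in LOGGED.items():
    print(f"step {step:3d}: logged={logged:.6e} predicted={lr_at(step):.6e}")
```

Note that under this schedule the learning rate has already decayed to zero by step 400, while the logged `eval_loss` ends at roughly the value it started with (8.6985 at step 1, 8.7026 at step 400).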

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:503107bd476d33bc18092aecbb28a9aa0708e12f5748b3f453d043b13c7074da
+size 6776