rudranighosh committed
Commit f8f915f · verified · 1 Parent(s): c5b561d

Upload 26 files

Files changed (26)
  1. tinyllama-lora-finetuned/checkpoint-4706/README.md +202 -0
  2. tinyllama-lora-finetuned/checkpoint-4706/adapter_config.json +34 -0
  3. tinyllama-lora-finetuned/checkpoint-4706/adapter_model.safetensors +3 -0
  4. tinyllama-lora-finetuned/checkpoint-4706/optimizer.pt +3 -0
  5. tinyllama-lora-finetuned/checkpoint-4706/rng_state.pth +3 -0
  6. tinyllama-lora-finetuned/checkpoint-4706/scaler.pt +3 -0
  7. tinyllama-lora-finetuned/checkpoint-4706/scheduler.pt +3 -0
  8. tinyllama-lora-finetuned/checkpoint-4706/special_tokens_map.json +24 -0
  9. tinyllama-lora-finetuned/checkpoint-4706/tokenizer.json +0 -0
  10. tinyllama-lora-finetuned/checkpoint-4706/tokenizer.model +3 -0
  11. tinyllama-lora-finetuned/checkpoint-4706/tokenizer_config.json +44 -0
  12. tinyllama-lora-finetuned/checkpoint-4706/trainer_state.json +1679 -0
  13. tinyllama-lora-finetuned/checkpoint-4706/training_args.bin +3 -0
  14. tinyllama-lora-finetuned/checkpoint-7059/README.md +202 -0
  15. tinyllama-lora-finetuned/checkpoint-7059/adapter_config.json +34 -0
  16. tinyllama-lora-finetuned/checkpoint-7059/adapter_model.safetensors +3 -0
  17. tinyllama-lora-finetuned/checkpoint-7059/optimizer.pt +3 -0
  18. tinyllama-lora-finetuned/checkpoint-7059/rng_state.pth +3 -0
  19. tinyllama-lora-finetuned/checkpoint-7059/scaler.pt +3 -0
  20. tinyllama-lora-finetuned/checkpoint-7059/scheduler.pt +3 -0
  21. tinyllama-lora-finetuned/checkpoint-7059/special_tokens_map.json +24 -0
  22. tinyllama-lora-finetuned/checkpoint-7059/tokenizer.json +0 -0
  23. tinyllama-lora-finetuned/checkpoint-7059/tokenizer.model +3 -0
  24. tinyllama-lora-finetuned/checkpoint-7059/tokenizer_config.json +44 -0
  25. tinyllama-lora-finetuned/checkpoint-7059/trainer_state.json +2498 -0
  26. tinyllama-lora-finetuned/checkpoint-7059/training_args.bin +3 -0
tinyllama-lora-finetuned/checkpoint-4706/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.2
tinyllama-lora-finetuned/checkpoint-4706/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "q_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
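This config applies LoRA (r=16, lora_alpha=32, dropout 0.1) to only the q_proj and v_proj attention projections of the base model. A minimal sketch of loading such a checkpoint with PEFT, assuming the directory layout of this commit (the paths are taken from the file list above, not from any usage docs in the repo):

```python
# Sketch: attach the LoRA adapter from this checkpoint to the base model.
# Checkpoint path is an assumption based on this repository's layout.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("tinyllama-lora-finetuned/checkpoint-4706")
model = PeftModel.from_pretrained(base, "tinyllama-lora-finetuned/checkpoint-4706")
model.eval()  # inference_mode=true in adapter_config.json implies eval use
```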
tinyllama-lora-finetuned/checkpoint-4706/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a03f9fe5a795b8ddff1fb2226e37eb7df60c2dac17ea99e0f1a9985584d57c5
+ size 9022864
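The three lines above are a Git LFS pointer, not the weights themselves: the actual ~9 MB adapter_model.safetensors is stored by its sha256 OID. As a sketch, a downloaded copy can be checked against the pointer (local filename assumed):

```python
# Sketch: verify a downloaded LFS object against the sha256 OID in its pointer.
import hashlib

expected = "5a03f9fe5a795b8ddff1fb2226e37eb7df60c2dac17ea99e0f1a9985584d57c5"
with open("adapter_model.safetensors", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
assert digest == expected, f"checksum mismatch: {digest}"
```

The same pointer format applies to the optimizer.pt, rng_state.pth, scaler.pt, scheduler.pt, tokenizer.model, and training_args.bin entries below.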
tinyllama-lora-finetuned/checkpoint-4706/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cefc93a39af6182f78f286b71170117c179e59605b663ab45c48f65d2c22e9a5
+ size 18096570
tinyllama-lora-finetuned/checkpoint-4706/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c584307684d968aca52148948545374b3367dba87f3fcf85395b87b488dd9bcf
+ size 14244
tinyllama-lora-finetuned/checkpoint-4706/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0e57fc822cae603542453ce834ab7d91378cdff04ad8bd2c3e5ec87bd5f66dc
+ size 988
tinyllama-lora-finetuned/checkpoint-4706/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31c891b9346c8dcc7999087526c82dc11ff9de250c4c17471f20e568a36c5028
+ size 1064
tinyllama-lora-finetuned/checkpoint-4706/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "</s>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
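Note that pad_token here reuses the eos_token string (</s>), a common choice when the base tokenizer ships no dedicated padding token. A hedged sketch of the usual collator-side consequence (assumed, not stated anywhere in this commit): because padding is then indistinguishable from a real end-of-sequence token by id, padding is best masked out of the loss via the attention mask rather than by token id.

```python
# Sketch: with pad_token == eos_token, a real </s> looks identical to padding,
# so identify padded positions from the attention mask, not the token id.
import torch

def build_labels(input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # -100 is the ignore_index of torch.nn.CrossEntropyLoss
    return input_ids.masked_fill(attention_mask == 0, -100)
```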
tinyllama-lora-finetuned/checkpoint-4706/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tinyllama-lora-finetuned/checkpoint-4706/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tinyllama-lora-finetuned/checkpoint-4706/tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "add_prefix_space": null,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "legacy": false,
+ "model_max_length": 2048,
+ "pad_token": "</s>",
+ "padding_side": "right",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }
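The chat_template above is the Zephyr-style <|user|>/<|system|>/<|assistant|> format used by TinyLlama-1.1B-Chat; tokenizer.apply_chat_template renders it. A minimal sketch (the message contents are illustrative only):

```python
# Sketch: render the Jinja chat_template defined in tokenizer_config.json.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tinyllama-lora-finetuned/checkpoint-4706")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # ends with "<|assistant|>" so generation continues the reply
```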
tinyllama-lora-finetuned/checkpoint-4706/trainer_state.json ADDED
@@ -0,0 +1,1679 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 2.0,
6
+ "eval_steps": 500,
7
+ "global_step": 4706,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.008499787505312367,
14
+ "grad_norm": 1.8467686176300049,
15
+ "learning_rate": 0.00019946168012466354,
16
+ "loss": 2.1643,
17
+ "step": 20
18
+ },
19
+ {
20
+ "epoch": 0.016999575010624733,
21
+ "grad_norm": 2.2261719703674316,
22
+ "learning_rate": 0.0001988950276243094,
23
+ "loss": 1.7448,
24
+ "step": 40
25
+ },
26
+ {
27
+ "epoch": 0.025499362515937103,
28
+ "grad_norm": 1.9097844362258911,
29
+ "learning_rate": 0.00019832837512395523,
30
+ "loss": 1.702,
31
+ "step": 60
32
+ },
33
+ {
34
+ "epoch": 0.033999150021249466,
35
+ "grad_norm": 1.676216721534729,
36
+ "learning_rate": 0.00019776172262360109,
37
+ "loss": 1.7414,
38
+ "step": 80
39
+ },
40
+ {
41
+ "epoch": 0.042498937526561836,
42
+ "grad_norm": 1.7480692863464355,
43
+ "learning_rate": 0.00019719507012324694,
44
+ "loss": 1.6,
45
+ "step": 100
46
+ },
47
+ {
48
+ "epoch": 0.050998725031874206,
49
+ "grad_norm": 1.603786826133728,
50
+ "learning_rate": 0.00019662841762289277,
51
+ "loss": 1.72,
52
+ "step": 120
53
+ },
54
+ {
55
+ "epoch": 0.05949851253718657,
56
+ "grad_norm": 1.577128529548645,
57
+ "learning_rate": 0.00019606176512253863,
58
+ "loss": 1.6804,
59
+ "step": 140
60
+ },
61
+ {
62
+ "epoch": 0.06799830004249893,
63
+ "grad_norm": 1.5658762454986572,
64
+ "learning_rate": 0.00019549511262218446,
65
+ "loss": 1.6582,
66
+ "step": 160
67
+ },
68
+ {
69
+ "epoch": 0.0764980875478113,
70
+ "grad_norm": 1.7549939155578613,
71
+ "learning_rate": 0.00019492846012183031,
72
+ "loss": 1.6238,
73
+ "step": 180
74
+ },
75
+ {
76
+ "epoch": 0.08499787505312367,
77
+ "grad_norm": 1.490046501159668,
78
+ "learning_rate": 0.00019436180762147614,
79
+ "loss": 1.6661,
80
+ "step": 200
81
+ },
82
+ {
83
+ "epoch": 0.09349766255843604,
84
+ "grad_norm": 1.7966225147247314,
85
+ "learning_rate": 0.00019379515512112197,
86
+ "loss": 1.6258,
87
+ "step": 220
88
+ },
89
+ {
90
+ "epoch": 0.10199745006374841,
91
+ "grad_norm": 1.6181796789169312,
92
+ "learning_rate": 0.00019322850262076783,
93
+ "loss": 1.6042,
94
+ "step": 240
95
+ },
96
+ {
97
+ "epoch": 0.11049723756906077,
98
+ "grad_norm": 1.4492865800857544,
99
+ "learning_rate": 0.00019266185012041366,
100
+ "loss": 1.5649,
101
+ "step": 260
102
+ },
103
+ {
104
+ "epoch": 0.11899702507437314,
105
+ "grad_norm": 2.01737117767334,
106
+ "learning_rate": 0.0001920951976200595,
107
+ "loss": 1.6877,
108
+ "step": 280
109
+ },
110
+ {
111
+ "epoch": 0.1274968125796855,
112
+ "grad_norm": 1.4284173250198364,
113
+ "learning_rate": 0.00019152854511970534,
114
+ "loss": 1.5727,
115
+ "step": 300
116
+ },
117
+ {
118
+ "epoch": 0.13599660008499787,
119
+ "grad_norm": 1.4602673053741455,
120
+ "learning_rate": 0.0001909618926193512,
121
+ "loss": 1.5837,
122
+ "step": 320
123
+ },
124
+ {
125
+ "epoch": 0.14449638759031025,
126
+ "grad_norm": 1.6680593490600586,
127
+ "learning_rate": 0.00019039524011899703,
128
+ "loss": 1.6122,
129
+ "step": 340
130
+ },
131
+ {
132
+ "epoch": 0.1529961750956226,
133
+ "grad_norm": 1.4604177474975586,
134
+ "learning_rate": 0.00018982858761864288,
135
+ "loss": 1.6064,
136
+ "step": 360
137
+ },
138
+ {
139
+ "epoch": 0.161495962600935,
140
+ "grad_norm": 1.4658399820327759,
141
+ "learning_rate": 0.00018926193511828871,
142
+ "loss": 1.5842,
143
+ "step": 380
144
+ },
145
+ {
146
+ "epoch": 0.16999575010624735,
147
+ "grad_norm": 1.2307661771774292,
148
+ "learning_rate": 0.00018869528261793457,
149
+ "loss": 1.6023,
150
+ "step": 400
151
+ },
152
+ {
153
+ "epoch": 0.1784955376115597,
154
+ "grad_norm": 1.4648102521896362,
155
+ "learning_rate": 0.0001881286301175804,
156
+ "loss": 1.5484,
157
+ "step": 420
158
+ },
159
+ {
160
+ "epoch": 0.18699532511687209,
161
+ "grad_norm": 1.4780811071395874,
162
+ "learning_rate": 0.00018756197761722626,
163
+ "loss": 1.5009,
164
+ "step": 440
165
+ },
166
+ {
167
+ "epoch": 0.19549511262218444,
168
+ "grad_norm": 1.4852784872055054,
169
+ "learning_rate": 0.00018699532511687208,
170
+ "loss": 1.6116,
171
+ "step": 460
172
+ },
173
+ {
174
+ "epoch": 0.20399490012749683,
175
+ "grad_norm": 1.4338926076889038,
176
+ "learning_rate": 0.00018642867261651794,
177
+ "loss": 1.6025,
178
+ "step": 480
179
+ },
180
+ {
181
+ "epoch": 0.21249468763280918,
182
+ "grad_norm": 1.5596784353256226,
183
+ "learning_rate": 0.00018586202011616377,
184
+ "loss": 1.6212,
185
+ "step": 500
186
+ },
187
+ {
188
+ "epoch": 0.22099447513812154,
189
+ "grad_norm": 1.2840818166732788,
190
+ "learning_rate": 0.0001852953676158096,
191
+ "loss": 1.6135,
192
+ "step": 520
193
+ },
194
+ {
195
+ "epoch": 0.22949426264343392,
196
+ "grad_norm": 1.5443825721740723,
197
+ "learning_rate": 0.00018472871511545546,
198
+ "loss": 1.5785,
199
+ "step": 540
200
+ },
201
+ {
202
+ "epoch": 0.23799405014874628,
203
+ "grad_norm": 1.5090241432189941,
204
+ "learning_rate": 0.00018416206261510129,
205
+ "loss": 1.5871,
206
+ "step": 560
207
+ },
208
+ {
209
+ "epoch": 0.24649383765405866,
210
+ "grad_norm": 1.613731861114502,
211
+ "learning_rate": 0.00018359541011474714,
212
+ "loss": 1.604,
213
+ "step": 580
214
+ },
215
+ {
216
+ "epoch": 0.254993625159371,
217
+ "grad_norm": 1.354567289352417,
218
+ "learning_rate": 0.00018302875761439297,
219
+ "loss": 1.4996,
220
+ "step": 600
221
+ },
222
+ {
223
+ "epoch": 0.2634934126646834,
224
+ "grad_norm": 1.5025413036346436,
225
+ "learning_rate": 0.00018246210511403883,
226
+ "loss": 1.5923,
227
+ "step": 620
228
+ },
229
+ {
230
+ "epoch": 0.27199320016999573,
231
+ "grad_norm": 1.3449057340621948,
232
+ "learning_rate": 0.00018189545261368466,
233
+ "loss": 1.5435,
234
+ "step": 640
235
+ },
236
+ {
237
+ "epoch": 0.2804929876753081,
238
+ "grad_norm": 1.2846812009811401,
239
+ "learning_rate": 0.0001813288001133305,
240
+ "loss": 1.536,
241
+ "step": 660
242
+ },
243
+ {
244
+ "epoch": 0.2889927751806205,
245
+ "grad_norm": 1.3012146949768066,
246
+ "learning_rate": 0.00018076214761297634,
247
+ "loss": 1.539,
248
+ "step": 680
249
+ },
250
+ {
251
+ "epoch": 0.2974925626859328,
252
+ "grad_norm": 1.5017261505126953,
253
+ "learning_rate": 0.0001801954951126222,
254
+ "loss": 1.5727,
255
+ "step": 700
256
+ },
257
+ {
258
+ "epoch": 0.3059923501912452,
259
+ "grad_norm": 1.3044496774673462,
260
+ "learning_rate": 0.00017962884261226805,
261
+ "loss": 1.6619,
262
+ "step": 720
263
+ },
264
+ {
265
+ "epoch": 0.3144921376965576,
266
+ "grad_norm": 1.6723718643188477,
267
+ "learning_rate": 0.00017906219011191388,
268
+ "loss": 1.5572,
269
+ "step": 740
270
+ },
271
+ {
272
+ "epoch": 0.32299192520187,
273
+ "grad_norm": 1.5928444862365723,
274
+ "learning_rate": 0.00017849553761155974,
275
+ "loss": 1.4755,
276
+ "step": 760
277
+ },
278
+ {
279
+ "epoch": 0.3314917127071823,
280
+ "grad_norm": 1.2782012224197388,
281
+ "learning_rate": 0.00017792888511120557,
282
+ "loss": 1.592,
283
+ "step": 780
284
+ },
285
+ {
286
+ "epoch": 0.3399915002124947,
287
+ "grad_norm": 1.349767804145813,
288
+ "learning_rate": 0.0001773622326108514,
289
+ "loss": 1.5436,
290
+ "step": 800
291
+ },
292
+ {
293
+ "epoch": 0.3484912877178071,
294
+ "grad_norm": 1.4979254007339478,
295
+ "learning_rate": 0.00017679558011049723,
296
+ "loss": 1.6191,
297
+ "step": 820
298
+ },
299
+ {
300
+ "epoch": 0.3569910752231194,
301
+ "grad_norm": 1.5227205753326416,
302
+ "learning_rate": 0.00017622892761014308,
303
+ "loss": 1.6074,
304
+ "step": 840
305
+ },
306
+ {
307
+ "epoch": 0.3654908627284318,
308
+ "grad_norm": 1.300671100616455,
309
+ "learning_rate": 0.0001756622751097889,
310
+ "loss": 1.5877,
311
+ "step": 860
312
+ },
313
+ {
314
+ "epoch": 0.37399065023374417,
315
+ "grad_norm": 1.1968013048171997,
316
+ "learning_rate": 0.00017509562260943477,
317
+ "loss": 1.5343,
318
+ "step": 880
319
+ },
320
+ {
321
+ "epoch": 0.3824904377390565,
322
+ "grad_norm": 1.4233676195144653,
323
+ "learning_rate": 0.00017452897010908063,
324
+ "loss": 1.5093,
325
+ "step": 900
326
+ },
327
+ {
328
+ "epoch": 0.3909902252443689,
329
+ "grad_norm": 1.5608493089675903,
330
+ "learning_rate": 0.00017396231760872645,
331
+ "loss": 1.552,
332
+ "step": 920
333
+ },
334
+ {
335
+ "epoch": 0.39949001274968127,
336
+ "grad_norm": 1.6073567867279053,
337
+ "learning_rate": 0.0001733956651083723,
338
+ "loss": 1.551,
339
+ "step": 940
340
+ },
341
+ {
342
+ "epoch": 0.40798980025499365,
343
+ "grad_norm": 1.1513320207595825,
344
+ "learning_rate": 0.00017282901260801814,
345
+ "loss": 1.51,
346
+ "step": 960
347
+ },
348
+ {
349
+ "epoch": 0.416489587760306,
350
+ "grad_norm": 1.3810391426086426,
351
+ "learning_rate": 0.000172262360107664,
352
+ "loss": 1.5447,
353
+ "step": 980
354
+ },
355
+ {
356
+ "epoch": 0.42498937526561836,
357
+ "grad_norm": 1.4732245206832886,
358
+ "learning_rate": 0.00017169570760730983,
359
+ "loss": 1.5781,
360
+ "step": 1000
361
+ },
362
+ {
363
+ "epoch": 0.43348916277093075,
364
+ "grad_norm": 1.1698389053344727,
365
+ "learning_rate": 0.00017112905510695568,
366
+ "loss": 1.5205,
367
+ "step": 1020
368
+ },
369
+ {
370
+ "epoch": 0.4419889502762431,
371
+ "grad_norm": 1.49894118309021,
372
+ "learning_rate": 0.0001705624026066015,
373
+ "loss": 1.6071,
374
+ "step": 1040
375
+ },
376
+ {
377
+ "epoch": 0.45048873778155546,
378
+ "grad_norm": 1.3100991249084473,
379
+ "learning_rate": 0.00016999575010624734,
380
+ "loss": 1.6148,
381
+ "step": 1060
382
+ },
383
+ {
384
+ "epoch": 0.45898852528686784,
385
+ "grad_norm": 1.5139139890670776,
386
+ "learning_rate": 0.0001694290976058932,
387
+ "loss": 1.5657,
388
+ "step": 1080
389
+ },
390
+ {
391
+ "epoch": 0.46748831279218017,
392
+ "grad_norm": 1.3952980041503906,
393
+ "learning_rate": 0.00016886244510553903,
394
+ "loss": 1.5596,
395
+ "step": 1100
396
+ },
397
+ {
398
+ "epoch": 0.47598810029749256,
399
+ "grad_norm": 1.2251731157302856,
400
+ "learning_rate": 0.00016829579260518488,
401
+ "loss": 1.5207,
402
+ "step": 1120
403
+ },
404
+ {
405
+ "epoch": 0.48448788780280494,
406
+ "grad_norm": 1.4944117069244385,
407
+ "learning_rate": 0.0001677291401048307,
408
+ "loss": 1.562,
409
+ "step": 1140
410
+ },
411
+ {
412
+ "epoch": 0.4929876753081173,
413
+ "grad_norm": 1.2576520442962646,
414
+ "learning_rate": 0.00016716248760447657,
415
+ "loss": 1.5362,
416
+ "step": 1160
417
+ },
418
+ {
419
+ "epoch": 0.5014874628134297,
420
+ "grad_norm": 1.3012641668319702,
421
+ "learning_rate": 0.0001665958351041224,
422
+ "loss": 1.5384,
423
+ "step": 1180
424
+ },
425
+ {
426
+ "epoch": 0.509987250318742,
427
+ "grad_norm": 1.3370224237442017,
428
+ "learning_rate": 0.00016602918260376825,
429
+ "loss": 1.5406,
430
+ "step": 1200
431
+ },
432
+ {
433
+ "epoch": 0.5184870378240544,
434
+ "grad_norm": 1.4711674451828003,
435
+ "learning_rate": 0.00016546253010341408,
436
+ "loss": 1.5489,
437
+ "step": 1220
438
+ },
439
+ {
440
+ "epoch": 0.5269868253293668,
441
+ "grad_norm": 1.3889433145523071,
442
+ "learning_rate": 0.00016489587760305994,
443
+ "loss": 1.5683,
444
+ "step": 1240
445
+ },
446
+ {
447
+ "epoch": 0.5354866128346791,
448
+ "grad_norm": 1.181380271911621,
449
+ "learning_rate": 0.00016432922510270577,
450
+ "loss": 1.5282,
451
+ "step": 1260
452
+ },
453
+ {
454
+ "epoch": 0.5439864003399915,
455
+ "grad_norm": 1.2874456644058228,
456
+ "learning_rate": 0.00016376257260235162,
457
+ "loss": 1.5642,
458
+ "step": 1280
459
+ },
460
+ {
461
+ "epoch": 0.5524861878453039,
462
+ "grad_norm": 1.2181987762451172,
463
+ "learning_rate": 0.00016319592010199748,
464
+ "loss": 1.5324,
465
+ "step": 1300
466
+ },
467
+ {
468
+ "epoch": 0.5609859753506162,
469
+ "grad_norm": 1.4149408340454102,
470
+ "learning_rate": 0.0001626292676016433,
471
+ "loss": 1.5366,
472
+ "step": 1320
473
+ },
474
+ {
475
+ "epoch": 0.5694857628559286,
476
+ "grad_norm": 1.1893357038497925,
477
+ "learning_rate": 0.00016206261510128914,
478
+ "loss": 1.5386,
479
+ "step": 1340
480
+ },
481
+ {
482
+ "epoch": 0.577985550361241,
483
+ "grad_norm": 1.4138270616531372,
484
+ "learning_rate": 0.00016149596260093497,
485
+ "loss": 1.4789,
486
+ "step": 1360
487
+ },
488
+ {
489
+ "epoch": 0.5864853378665533,
490
+ "grad_norm": 1.5200152397155762,
491
+ "learning_rate": 0.00016092931010058082,
492
+ "loss": 1.5012,
493
+ "step": 1380
494
+ },
495
+ {
496
+ "epoch": 0.5949851253718657,
497
+ "grad_norm": 1.2447080612182617,
498
+ "learning_rate": 0.00016036265760022665,
499
+ "loss": 1.498,
500
+ "step": 1400
501
+ },
502
+ {
503
+ "epoch": 0.6034849128771781,
504
+ "grad_norm": 1.4135057926177979,
505
+ "learning_rate": 0.0001597960050998725,
506
+ "loss": 1.4519,
507
+ "step": 1420
508
+ },
509
+ {
510
+ "epoch": 0.6119847003824904,
511
+ "grad_norm": 1.2652864456176758,
512
+ "learning_rate": 0.00015922935259951834,
513
+ "loss": 1.5595,
514
+ "step": 1440
515
+ },
516
+ {
517
+ "epoch": 0.6204844878878029,
518
+ "grad_norm": 1.356728434562683,
519
+ "learning_rate": 0.0001586627000991642,
520
+ "loss": 1.4859,
521
+ "step": 1460
522
+ },
523
+ {
524
+ "epoch": 0.6289842753931152,
525
+ "grad_norm": 1.3269623517990112,
526
+ "learning_rate": 0.00015809604759881002,
527
+ "loss": 1.5002,
528
+ "step": 1480
529
+ },
530
+ {
531
+ "epoch": 0.6374840628984275,
532
+ "grad_norm": 1.5896166563034058,
533
+ "learning_rate": 0.00015752939509845588,
534
+ "loss": 1.4852,
535
+ "step": 1500
536
+ },
537
+ {
538
+ "epoch": 0.64598385040374,
539
+ "grad_norm": 1.5948361158370972,
540
+ "learning_rate": 0.00015696274259810174,
541
+ "loss": 1.5728,
542
+ "step": 1520
543
+ },
544
+ {
545
+ "epoch": 0.6544836379090523,
546
+ "grad_norm": 1.3188483715057373,
547
+ "learning_rate": 0.00015639609009774757,
548
+ "loss": 1.5287,
549
+ "step": 1540
550
+ },
551
+ {
552
+ "epoch": 0.6629834254143646,
553
+ "grad_norm": 1.2547403573989868,
554
+ "learning_rate": 0.00015582943759739342,
555
+ "loss": 1.4523,
556
+ "step": 1560
557
+ },
558
+ {
559
+ "epoch": 0.671483212919677,
560
+ "grad_norm": 1.3939685821533203,
561
+ "learning_rate": 0.00015526278509703925,
562
+ "loss": 1.4659,
563
+ "step": 1580
564
+ },
565
+ {
566
+ "epoch": 0.6799830004249894,
567
+ "grad_norm": 1.1834455728530884,
568
+ "learning_rate": 0.0001546961325966851,
569
+ "loss": 1.4674,
570
+ "step": 1600
571
+ },
572
+ {
573
+ "epoch": 0.6884827879303017,
574
+ "grad_norm": 1.5032856464385986,
575
+ "learning_rate": 0.00015412948009633094,
576
+ "loss": 1.6103,
577
+ "step": 1620
578
+ },
579
+ {
580
+ "epoch": 0.6969825754356141,
581
+ "grad_norm": 1.4383857250213623,
582
+ "learning_rate": 0.00015356282759597677,
583
+ "loss": 1.5483,
584
+ "step": 1640
585
+ },
586
+ {
587
+ "epoch": 0.7054823629409265,
588
+ "grad_norm": 1.2931615114212036,
589
+ "learning_rate": 0.0001529961750956226,
590
+ "loss": 1.5219,
591
+ "step": 1660
592
+ },
593
+ {
594
+ "epoch": 0.7139821504462388,
595
+ "grad_norm": 1.3393011093139648,
596
+ "learning_rate": 0.00015242952259526845,
597
+ "loss": 1.4984,
598
+ "step": 1680
599
+ },
600
+ {
601
+ "epoch": 0.7224819379515512,
602
+ "grad_norm": 1.372475266456604,
603
+ "learning_rate": 0.0001518628700949143,
604
+ "loss": 1.4966,
605
+ "step": 1700
606
+ },
607
+ {
608
+ "epoch": 0.7309817254568636,
609
+ "grad_norm": 1.4294521808624268,
610
+ "learning_rate": 0.00015129621759456014,
611
+ "loss": 1.4844,
612
+ "step": 1720
613
+ },
614
+ {
615
+ "epoch": 0.7394815129621759,
616
+ "grad_norm": 1.5531823635101318,
617
+ "learning_rate": 0.000150729565094206,
618
+ "loss": 1.5581,
619
+ "step": 1740
620
+ },
621
+ {
622
+ "epoch": 0.7479813004674883,
623
+ "grad_norm": 1.5481995344161987,
624
+ "learning_rate": 0.00015016291259385182,
625
+ "loss": 1.5487,
626
+ "step": 1760
627
+ },
628
+ {
629
+ "epoch": 0.7564810879728007,
630
+ "grad_norm": 1.4445319175720215,
631
+ "learning_rate": 0.00014959626009349768,
632
+ "loss": 1.5389,
633
+ "step": 1780
634
+ },
635
+ {
636
+ "epoch": 0.764980875478113,
637
+ "grad_norm": 1.538258671760559,
638
+ "learning_rate": 0.0001490296075931435,
639
+ "loss": 1.5034,
640
+ "step": 1800
641
+ },
642
+ {
643
+ "epoch": 0.7734806629834254,
644
+ "grad_norm": 1.4794244766235352,
645
+ "learning_rate": 0.00014846295509278936,
646
+ "loss": 1.4403,
647
+ "step": 1820
648
+ },
649
+ {
650
+ "epoch": 0.7819804504887378,
651
+ "grad_norm": 1.3663828372955322,
652
+ "learning_rate": 0.0001478963025924352,
653
+ "loss": 1.5626,
654
+ "step": 1840
655
+ },
656
+ {
657
+ "epoch": 0.7904802379940501,
658
+ "grad_norm": 1.6889076232910156,
659
+ "learning_rate": 0.00014732965009208105,
660
+ "loss": 1.4974,
661
+ "step": 1860
662
+ },
663
+ {
664
+ "epoch": 0.7989800254993625,
665
+ "grad_norm": 1.4024620056152344,
666
+ "learning_rate": 0.00014676299759172688,
667
+ "loss": 1.5809,
668
+ "step": 1880
669
+ },
670
+ {
671
+ "epoch": 0.8074798130046749,
672
+ "grad_norm": 1.3137383460998535,
673
+ "learning_rate": 0.00014619634509137274,
674
+ "loss": 1.5202,
675
+ "step": 1900
676
+ },
677
+ {
678
+ "epoch": 0.8159796005099873,
679
+ "grad_norm": 1.5561782121658325,
680
+ "learning_rate": 0.00014562969259101856,
681
+ "loss": 1.4611,
682
+ "step": 1920
683
+ },
684
+ {
685
+ "epoch": 0.8244793880152996,
686
+ "grad_norm": 1.4386446475982666,
687
+ "learning_rate": 0.0001450630400906644,
688
+ "loss": 1.5364,
689
+ "step": 1940
690
+ },
691
+ {
692
+ "epoch": 0.832979175520612,
693
+ "grad_norm": 1.4223254919052124,
694
+ "learning_rate": 0.00014452472021532796,
695
+ "loss": 1.4866,
696
+ "step": 1960
697
+ },
698
+ {
699
+ "epoch": 0.8414789630259244,
700
+ "grad_norm": 1.290819525718689,
701
+ "learning_rate": 0.0001439580677149738,
702
+ "loss": 1.5182,
703
+ "step": 1980
704
+ },
705
+ {
706
+ "epoch": 0.8499787505312367,
707
+ "grad_norm": 1.3473018407821655,
708
+ "learning_rate": 0.00014339141521461964,
709
+ "loss": 1.4962,
710
+ "step": 2000
711
+ },
712
+ {
713
+ "epoch": 0.858478538036549,
714
+ "grad_norm": 1.480664849281311,
715
+ "learning_rate": 0.00014282476271426547,
716
+ "loss": 1.5277,
717
+ "step": 2020
718
+ },
719
+ {
720
+ "epoch": 0.8669783255418615,
721
+ "grad_norm": 1.4417237043380737,
722
+ "learning_rate": 0.00014225811021391133,
723
+ "loss": 1.4646,
724
+ "step": 2040
725
+ },
726
+ {
727
+ "epoch": 0.8754781130471738,
728
+ "grad_norm": 1.3464792966842651,
729
+ "learning_rate": 0.00014169145771355716,
730
+ "loss": 1.5028,
731
+ "step": 2060
732
+ },
733
+ {
734
+ "epoch": 0.8839779005524862,
735
+ "grad_norm": 1.4068652391433716,
736
+ "learning_rate": 0.000141124805213203,
737
+ "loss": 1.5142,
738
+ "step": 2080
739
+ },
740
+ {
741
+ "epoch": 0.8924776880577986,
742
+ "grad_norm": 1.321798324584961,
743
+ "learning_rate": 0.00014055815271284884,
744
+ "loss": 1.5505,
745
+ "step": 2100
746
+ },
747
+ {
748
+ "epoch": 0.9009774755631109,
749
+ "grad_norm": 1.38119375705719,
750
+ "learning_rate": 0.0001399915002124947,
751
+ "loss": 1.5606,
752
+ "step": 2120
753
+ },
754
+ {
755
+ "epoch": 0.9094772630684232,
756
+ "grad_norm": 1.4736709594726562,
757
+ "learning_rate": 0.00013942484771214053,
758
+ "loss": 1.5104,
759
+ "step": 2140
760
+ },
761
+ {
762
+ "epoch": 0.9179770505737357,
763
+ "grad_norm": 1.5708409547805786,
764
+ "learning_rate": 0.00013885819521178638,
765
+ "loss": 1.5129,
766
+ "step": 2160
767
+ },
768
+ {
769
+ "epoch": 0.926476838079048,
770
+ "grad_norm": 1.2119626998901367,
771
+ "learning_rate": 0.00013829154271143224,
772
+ "loss": 1.5139,
773
+ "step": 2180
774
+ },
775
+ {
776
+ "epoch": 0.9349766255843603,
777
+ "grad_norm": 1.4460374116897583,
778
+ "learning_rate": 0.00013772489021107807,
779
+ "loss": 1.5176,
780
+ "step": 2200
781
+ },
782
+ {
783
+ "epoch": 0.9434764130896728,
784
+ "grad_norm": 1.377457618713379,
785
+ "learning_rate": 0.00013715823771072392,
786
+ "loss": 1.4846,
787
+ "step": 2220
788
+ },
789
+ {
790
+ "epoch": 0.9519762005949851,
791
+ "grad_norm": 1.5439783334732056,
792
+ "learning_rate": 0.00013659158521036975,
793
+ "loss": 1.4432,
794
+ "step": 2240
795
+ },
796
+ {
797
+ "epoch": 0.9604759881002974,
798
+ "grad_norm": 1.3697561025619507,
799
+ "learning_rate": 0.00013602493271001558,
800
+ "loss": 1.5054,
801
+ "step": 2260
802
+ },
803
+ {
804
+ "epoch": 0.9689757756056099,
805
+ "grad_norm": 1.6576790809631348,
806
+ "learning_rate": 0.0001354582802096614,
807
+ "loss": 1.5202,
808
+ "step": 2280
809
+ },
810
+ {
811
+ "epoch": 0.9774755631109222,
812
+ "grad_norm": 1.2522470951080322,
813
+ "learning_rate": 0.00013489162770930727,
814
+ "loss": 1.5237,
815
+ "step": 2300
816
+ },
817
+ {
818
+ "epoch": 0.9859753506162346,
819
+ "grad_norm": 1.316643476486206,
820
+ "learning_rate": 0.0001343249752089531,
821
+ "loss": 1.4693,
822
+ "step": 2320
823
+ },
824
+ {
825
+ "epoch": 0.994475138121547,
826
+ "grad_norm": 1.415403127670288,
827
+ "learning_rate": 0.00013375832270859895,
828
+ "loss": 1.5064,
829
+ "step": 2340
830
+ },
831
+ {
832
+ "epoch": 1.0029749256268594,
833
+ "grad_norm": 1.2021781206130981,
834
+ "learning_rate": 0.00013319167020824478,
835
+ "loss": 1.4576,
836
+ "step": 2360
837
+ },
838
+ {
839
+ "epoch": 1.0114747131321717,
840
+ "grad_norm": 1.391981840133667,
841
+ "learning_rate": 0.00013262501770789064,
842
+ "loss": 1.4672,
843
+ "step": 2380
844
+ },
845
+ {
846
+ "epoch": 1.019974500637484,
847
+ "grad_norm": 1.6223762035369873,
848
+ "learning_rate": 0.0001320583652075365,
849
+ "loss": 1.4746,
850
+ "step": 2400
851
+ },
852
+ {
853
+ "epoch": 1.0284742881427964,
854
+ "grad_norm": 1.584978699684143,
855
+ "learning_rate": 0.00013149171270718233,
856
+ "loss": 1.446,
857
+ "step": 2420
858
+ },
859
+ {
860
+ "epoch": 1.0369740756481087,
861
+ "grad_norm": 1.3905613422393799,
862
+ "learning_rate": 0.00013092506020682818,
863
+ "loss": 1.4673,
864
+ "step": 2440
865
+ },
866
+ {
867
+ "epoch": 1.045473863153421,
868
+ "grad_norm": 1.3543156385421753,
869
+ "learning_rate": 0.000130358407706474,
870
+ "loss": 1.4774,
871
+ "step": 2460
872
+ },
873
+ {
874
+ "epoch": 1.0539736506587336,
875
+ "grad_norm": 1.4284417629241943,
876
+ "learning_rate": 0.00012979175520611987,
877
+ "loss": 1.4828,
878
+ "step": 2480
879
+ },
880
+ {
881
+ "epoch": 1.062473438164046,
882
+ "grad_norm": 1.2880805730819702,
883
+ "learning_rate": 0.0001292251027057657,
884
+ "loss": 1.4688,
885
+ "step": 2500
886
+ },
887
+ {
888
+ "epoch": 1.0709732256693583,
889
+ "grad_norm": 1.4029446840286255,
890
+ "learning_rate": 0.00012865845020541155,
891
+ "loss": 1.4667,
892
+ "step": 2520
893
+ },
894
+ {
895
+ "epoch": 1.0794730131746706,
896
+ "grad_norm": 1.4345691204071045,
897
+ "learning_rate": 0.00012809179770505738,
898
+ "loss": 1.4268,
899
+ "step": 2540
900
+ },
901
+ {
902
+ "epoch": 1.087972800679983,
903
+ "grad_norm": 1.5828334093093872,
904
+ "learning_rate": 0.0001275251452047032,
905
+ "loss": 1.3897,
906
+ "step": 2560
907
+ },
908
+ {
909
+ "epoch": 1.0964725881852955,
910
+ "grad_norm": 1.8496978282928467,
911
+ "learning_rate": 0.00012695849270434907,
912
+ "loss": 1.4783,
913
+ "step": 2580
914
+ },
915
+ {
916
+ "epoch": 1.1049723756906078,
917
+ "grad_norm": 1.756039023399353,
918
+ "learning_rate": 0.0001263918402039949,
919
+ "loss": 1.4611,
920
+ "step": 2600
921
+ },
922
+ {
923
+ "epoch": 1.1134721631959201,
924
+ "grad_norm": 1.3964674472808838,
925
+ "learning_rate": 0.00012582518770364075,
926
+ "loss": 1.4139,
927
+ "step": 2620
928
+ },
929
+ {
930
+ "epoch": 1.1219719507012325,
931
+ "grad_norm": 1.7207622528076172,
932
+ "learning_rate": 0.00012525853520328658,
933
+ "loss": 1.4206,
934
+ "step": 2640
935
+ },
936
+ {
937
+ "epoch": 1.1304717382065448,
938
+ "grad_norm": 1.4870052337646484,
939
+ "learning_rate": 0.00012469188270293244,
940
+ "loss": 1.3631,
941
+ "step": 2660
942
+ },
943
+ {
944
+ "epoch": 1.1389715257118571,
945
+ "grad_norm": 1.4430071115493774,
946
+ "learning_rate": 0.00012412523020257827,
947
+ "loss": 1.4766,
948
+ "step": 2680
949
+ },
950
+ {
951
+ "epoch": 1.1474713132171697,
952
+ "grad_norm": 1.5112409591674805,
953
+ "learning_rate": 0.00012355857770222412,
954
+ "loss": 1.461,
955
+ "step": 2700
956
+ },
957
+ {
958
+ "epoch": 1.155971100722482,
959
+ "grad_norm": 1.4189656972885132,
960
+ "learning_rate": 0.00012299192520186995,
961
+ "loss": 1.389,
962
+ "step": 2720
963
+ },
964
+ {
965
+ "epoch": 1.1644708882277943,
966
+ "grad_norm": 1.543331265449524,
967
+ "learning_rate": 0.0001224252727015158,
968
+ "loss": 1.4474,
969
+ "step": 2740
970
+ },
971
+ {
972
+ "epoch": 1.1729706757331066,
973
+ "grad_norm": 1.398582100868225,
974
+ "learning_rate": 0.00012185862020116164,
975
+ "loss": 1.4047,
976
+ "step": 2760
977
+ },
978
+ {
979
+ "epoch": 1.181470463238419,
980
+ "grad_norm": 1.3756153583526611,
981
+ "learning_rate": 0.00012129196770080748,
982
+ "loss": 1.4086,
983
+ "step": 2780
984
+ },
985
+ {
986
+ "epoch": 1.1899702507437313,
987
+ "grad_norm": 1.5835983753204346,
988
+ "learning_rate": 0.00012072531520045334,
989
+ "loss": 1.4855,
990
+ "step": 2800
991
+ },
992
+ {
993
+ "epoch": 1.1984700382490439,
994
+ "grad_norm": 1.516381025314331,
995
+ "learning_rate": 0.00012015866270009917,
996
+ "loss": 1.455,
997
+ "step": 2820
998
+ },
999
+ {
1000
+ "epoch": 1.2069698257543562,
1001
+ "grad_norm": 1.2615922689437866,
1002
+ "learning_rate": 0.00011959201019974502,
1003
+ "loss": 1.4531,
1004
+ "step": 2840
1005
+ },
1006
+ {
1007
+ "epoch": 1.2154696132596685,
1008
+ "grad_norm": 1.5319349765777588,
1009
+ "learning_rate": 0.00011902535769939085,
1010
+ "loss": 1.4098,
1011
+ "step": 2860
1012
+ },
1013
+ {
1014
+ "epoch": 1.2239694007649808,
1015
+ "grad_norm": 1.384494423866272,
1016
+ "learning_rate": 0.0001184587051990367,
1017
+ "loss": 1.424,
1018
+ "step": 2880
1019
+ },
1020
+ {
1021
+ "epoch": 1.2324691882702932,
1022
+ "grad_norm": 1.4936223030090332,
1023
+ "learning_rate": 0.00011789205269868254,
1024
+ "loss": 1.4409,
1025
+ "step": 2900
1026
+ },
1027
+ {
1028
+ "epoch": 1.2409689757756057,
1029
+ "grad_norm": 1.8717914819717407,
1030
+ "learning_rate": 0.00011732540019832838,
1031
+ "loss": 1.4478,
1032
+ "step": 2920
1033
+ },
1034
+ {
1035
+ "epoch": 1.249468763280918,
1036
+ "grad_norm": 1.5673747062683105,
1037
+ "learning_rate": 0.00011675874769797421,
1038
+ "loss": 1.4076,
1039
+ "step": 2940
1040
+ },
1041
+ {
1042
+ "epoch": 1.2579685507862304,
1043
+ "grad_norm": 1.5353143215179443,
1044
+ "learning_rate": 0.00011619209519762007,
1045
+ "loss": 1.4527,
1046
+ "step": 2960
1047
+ },
1048
+ {
1049
+ "epoch": 1.2664683382915427,
1050
+ "grad_norm": 1.5840922594070435,
1051
+ "learning_rate": 0.00011562544269726592,
1052
+ "loss": 1.4547,
1053
+ "step": 2980
1054
+ },
1055
+ {
1056
+ "epoch": 1.274968125796855,
1057
+ "grad_norm": 1.432485580444336,
1058
+ "learning_rate": 0.00011505879019691175,
1059
+ "loss": 1.3915,
1060
+ "step": 3000
1061
+ },
1062
+ {
1063
+ "epoch": 1.2834679133021676,
1064
+ "grad_norm": 1.6025019884109497,
1065
+ "learning_rate": 0.0001144921376965576,
1066
+ "loss": 1.4778,
1067
+ "step": 3020
1068
+ },
1069
+ {
1070
+ "epoch": 1.2919677008074797,
1071
+ "grad_norm": 1.446283221244812,
1072
+ "learning_rate": 0.00011392548519620342,
1073
+ "loss": 1.4232,
1074
+ "step": 3040
1075
+ },
1076
+ {
1077
+ "epoch": 1.3004674883127922,
1078
+ "grad_norm": 1.770347237586975,
1079
+ "learning_rate": 0.00011335883269584928,
1080
+ "loss": 1.4471,
1081
+ "step": 3060
1082
+ },
1083
+ {
1084
+ "epoch": 1.3089672758181046,
1085
+ "grad_norm": 1.6366223096847534,
1086
+ "learning_rate": 0.00011279218019549511,
1087
+ "loss": 1.3898,
1088
+ "step": 3080
1089
+ },
1090
+ {
1091
+ "epoch": 1.317467063323417,
1092
+ "grad_norm": 1.6164251565933228,
1093
+ "learning_rate": 0.00011222552769514096,
1094
+ "loss": 1.4911,
1095
+ "step": 3100
1096
+ },
1097
+ {
1098
+ "epoch": 1.3259668508287292,
1099
+ "grad_norm": 1.72907292842865,
1100
+ "learning_rate": 0.0001116588751947868,
1101
+ "loss": 1.4324,
1102
+ "step": 3120
1103
+ },
1104
+ {
1105
+ "epoch": 1.3344666383340416,
1106
+ "grad_norm": 1.5645689964294434,
1107
+ "learning_rate": 0.00011109222269443265,
1108
+ "loss": 1.4571,
1109
+ "step": 3140
1110
+ },
1111
+ {
1112
+ "epoch": 1.342966425839354,
1113
+ "grad_norm": 1.6170058250427246,
1114
+ "learning_rate": 0.00011052557019407848,
1115
+ "loss": 1.4016,
1116
+ "step": 3160
1117
+ },
1118
+ {
1119
+ "epoch": 1.3514662133446664,
1120
+ "grad_norm": 1.806199312210083,
1121
+ "learning_rate": 0.00010995891769372432,
1122
+ "loss": 1.3691,
1123
+ "step": 3180
1124
+ },
1125
+ {
1126
+ "epoch": 1.3599660008499788,
1127
+ "grad_norm": 1.3674649000167847,
1128
+ "learning_rate": 0.00010939226519337018,
1129
+ "loss": 1.3924,
1130
+ "step": 3200
1131
+ },
1132
+ {
1133
+ "epoch": 1.368465788355291,
1134
+ "grad_norm": 1.7301322221755981,
1135
+ "learning_rate": 0.00010882561269301601,
1136
+ "loss": 1.4296,
1137
+ "step": 3220
1138
+ },
1139
+ {
1140
+ "epoch": 1.3769655758606034,
1141
+ "grad_norm": 1.4442013502120972,
1142
+ "learning_rate": 0.00010825896019266186,
1143
+ "loss": 1.4403,
1144
+ "step": 3240
1145
+ },
1146
+ {
1147
+ "epoch": 1.385465363365916,
1148
+ "grad_norm": 1.838722586631775,
1149
+ "learning_rate": 0.0001076923076923077,
1150
+ "loss": 1.4345,
1151
+ "step": 3260
1152
+ },
1153
+ {
1154
+ "epoch": 1.3939651508712283,
1155
+ "grad_norm": 1.4899051189422607,
1156
+ "learning_rate": 0.00010712565519195355,
1157
+ "loss": 1.4075,
1158
+ "step": 3280
1159
+ },
1160
+ {
1161
+ "epoch": 1.4024649383765406,
1162
+ "grad_norm": 1.5684807300567627,
1163
+ "learning_rate": 0.00010655900269159938,
1164
+ "loss": 1.3676,
1165
+ "step": 3300
1166
+ },
1167
+ {
1168
+ "epoch": 1.410964725881853,
1169
+ "grad_norm": 1.5851366519927979,
1170
+ "learning_rate": 0.00010599235019124522,
1171
+ "loss": 1.4093,
1172
+ "step": 3320
1173
+ },
1174
+ {
1175
+ "epoch": 1.4194645133871653,
1176
+ "grad_norm": 1.5306929349899292,
1177
+ "learning_rate": 0.00010542569769089105,
1178
+ "loss": 1.4281,
1179
+ "step": 3340
1180
+ },
1181
+ {
1182
+ "epoch": 1.4279643008924776,
1183
+ "grad_norm": 1.798202633857727,
1184
+ "learning_rate": 0.00010485904519053691,
1185
+ "loss": 1.405,
1186
+ "step": 3360
1187
+ },
1188
+ {
1189
+ "epoch": 1.43646408839779,
1190
+ "grad_norm": 1.725794792175293,
1191
+ "learning_rate": 0.00010429239269018276,
1192
+ "loss": 1.4055,
1193
+ "step": 3380
1194
+ },
1195
+ {
1196
+ "epoch": 1.4449638759031025,
1197
+ "grad_norm": 1.4544621706008911,
1198
+ "learning_rate": 0.00010372574018982859,
1199
+ "loss": 1.4608,
1200
+ "step": 3400
1201
+ },
1202
+ {
1203
+ "epoch": 1.4534636634084148,
1204
+ "grad_norm": 1.6607258319854736,
1205
+ "learning_rate": 0.00010315908768947445,
1206
+ "loss": 1.3565,
1207
+ "step": 3420
1208
+ },
1209
+ {
1210
+ "epoch": 1.4619634509137271,
1211
+ "grad_norm": 1.3560140132904053,
1212
+ "learning_rate": 0.00010259243518912028,
1213
+ "loss": 1.4086,
1214
+ "step": 3440
1215
+ },
1216
+ {
1217
+ "epoch": 1.4704632384190395,
1218
+ "grad_norm": 1.5847728252410889,
1219
+ "learning_rate": 0.00010202578268876612,
1220
+ "loss": 1.4006,
1221
+ "step": 3460
1222
+ },
1223
+ {
1224
+ "epoch": 1.4789630259243518,
1225
+ "grad_norm": 1.555255651473999,
1226
+ "learning_rate": 0.00010145913018841195,
1227
+ "loss": 1.4639,
1228
+ "step": 3480
1229
+ },
1230
+ {
1231
+ "epoch": 1.4874628134296644,
1232
+ "grad_norm": 1.389131784439087,
1233
+ "learning_rate": 0.0001008924776880578,
1234
+ "loss": 1.4167,
1235
+ "step": 3500
1236
+ },
1237
+ {
1238
+ "epoch": 1.4959626009349767,
1239
+ "grad_norm": 1.435861349105835,
1240
+ "learning_rate": 0.00010032582518770364,
1241
+ "loss": 1.4018,
1242
+ "step": 3520
1243
+ },
1244
+ {
1245
+ "epoch": 1.504462388440289,
1246
+ "grad_norm": 2.1325809955596924,
1247
+ "learning_rate": 9.975917268734949e-05,
1248
+ "loss": 1.4261,
1249
+ "step": 3540
1250
+ },
1251
+ {
1252
+ "epoch": 1.5129621759456013,
1253
+ "grad_norm": 1.6307079792022705,
1254
+ "learning_rate": 9.919252018699533e-05,
1255
+ "loss": 1.4596,
1256
+ "step": 3560
1257
+ },
1258
+ {
1259
+ "epoch": 1.5214619634509137,
1260
+ "grad_norm": 1.5455667972564697,
1261
+ "learning_rate": 9.862586768664118e-05,
1262
+ "loss": 1.4111,
1263
+ "step": 3580
1264
+ },
1265
+ {
1266
+ "epoch": 1.5299617509562262,
1267
+ "grad_norm": 1.3528661727905273,
1268
+ "learning_rate": 9.8059215186287e-05,
1269
+ "loss": 1.4634,
1270
+ "step": 3600
1271
+ },
1272
+ {
1273
+ "epoch": 1.5384615384615383,
1274
+ "grad_norm": 1.477866768836975,
1275
+ "learning_rate": 9.749256268593285e-05,
1276
+ "loss": 1.503,
1277
+ "step": 3620
1278
+ },
1279
+ {
1280
+ "epoch": 1.5469613259668509,
1281
+ "grad_norm": 1.5204942226409912,
1282
+ "learning_rate": 9.692591018557869e-05,
1283
+ "loss": 1.4811,
1284
+ "step": 3640
1285
+ },
1286
+ {
1287
+ "epoch": 1.5554611134721632,
1288
+ "grad_norm": 1.5966800451278687,
1289
+ "learning_rate": 9.635925768522453e-05,
1290
+ "loss": 1.4768,
1291
+ "step": 3660
1292
+ },
1293
+ {
1294
+ "epoch": 1.5639609009774755,
1295
+ "grad_norm": 1.538609504699707,
1296
+ "learning_rate": 9.579260518487039e-05,
1297
+ "loss": 1.4379,
1298
+ "step": 3680
1299
+ },
1300
+ {
1301
+ "epoch": 1.572460688482788,
1302
+ "grad_norm": 1.6952332258224487,
1303
+ "learning_rate": 9.522595268451623e-05,
1304
+ "loss": 1.4503,
1305
+ "step": 3700
1306
+ },
1307
+ {
1308
+ "epoch": 1.5809604759881002,
1309
+ "grad_norm": 1.4433083534240723,
1310
+ "learning_rate": 9.465930018416208e-05,
1311
+ "loss": 1.4461,
1312
+ "step": 3720
1313
+ },
1314
+ {
1315
+ "epoch": 1.5894602634934127,
1316
+ "grad_norm": 1.6331605911254883,
1317
+ "learning_rate": 9.40926476838079e-05,
1318
+ "loss": 1.4212,
1319
+ "step": 3740
1320
+ },
1321
+ {
1322
+ "epoch": 1.597960050998725,
1323
+ "grad_norm": 2.1244919300079346,
1324
+ "learning_rate": 9.352599518345375e-05,
1325
+ "loss": 1.3775,
1326
+ "step": 3760
1327
+ },
1328
+ {
1329
+ "epoch": 1.6064598385040374,
1330
+ "grad_norm": 1.6804299354553223,
1331
+ "learning_rate": 9.295934268309959e-05,
1332
+ "loss": 1.4365,
1333
+ "step": 3780
1334
+ },
1335
+ {
1336
+ "epoch": 1.6149596260093497,
1337
+ "grad_norm": 1.5793712139129639,
1338
+ "learning_rate": 9.239269018274543e-05,
1339
+ "loss": 1.3996,
1340
+ "step": 3800
1341
+ },
1342
+ {
1343
+ "epoch": 1.623459413514662,
1344
+ "grad_norm": 1.646560788154602,
1345
+ "learning_rate": 9.182603768239128e-05,
1346
+ "loss": 1.4735,
1347
+ "step": 3820
1348
+ },
1349
+ {
1350
+ "epoch": 1.6319592010199746,
1351
+ "grad_norm": 1.6244877576828003,
1352
+ "learning_rate": 9.125938518203712e-05,
1353
+ "loss": 1.4646,
1354
+ "step": 3840
1355
+ },
1356
+ {
1357
+ "epoch": 1.6404589885252867,
1358
+ "grad_norm": 1.4168405532836914,
1359
+ "learning_rate": 9.069273268168296e-05,
1360
+ "loss": 1.3962,
1361
+ "step": 3860
1362
+ },
1363
+ {
1364
+ "epoch": 1.6489587760305993,
1365
+ "grad_norm": 1.4018287658691406,
1366
+ "learning_rate": 9.01260801813288e-05,
1367
+ "loss": 1.3747,
1368
+ "step": 3880
1369
+ },
1370
+ {
1371
+ "epoch": 1.6574585635359116,
1372
+ "grad_norm": 1.3393577337265015,
1373
+ "learning_rate": 8.955942768097465e-05,
1374
+ "loss": 1.3644,
1375
+ "step": 3900
1376
+ },
1377
+ {
1378
+ "epoch": 1.665958351041224,
1379
+ "grad_norm": 1.5588535070419312,
1380
+ "learning_rate": 8.899277518062049e-05,
1381
+ "loss": 1.417,
1382
+ "step": 3920
1383
+ },
1384
+ {
1385
+ "epoch": 1.6744581385465365,
1386
+ "grad_norm": 1.4518215656280518,
1387
+ "learning_rate": 8.842612268026633e-05,
1388
+ "loss": 1.4177,
1389
+ "step": 3940
1390
+ },
1391
+ {
1392
+ "epoch": 1.6829579260518486,
1393
+ "grad_norm": 1.593959093093872,
1394
+ "learning_rate": 8.785947017991218e-05,
1395
+ "loss": 1.4692,
1396
+ "step": 3960
1397
+ },
1398
+ {
1399
+ "epoch": 1.6914577135571611,
1400
+ "grad_norm": 1.61430025100708,
1401
+ "learning_rate": 8.729281767955802e-05,
1402
+ "loss": 1.4231,
1403
+ "step": 3980
1404
+ },
1405
+ {
1406
+ "epoch": 1.6999575010624735,
1407
+ "grad_norm": 1.5006210803985596,
1408
+ "learning_rate": 8.672616517920386e-05,
1409
+ "loss": 1.4632,
1410
+ "step": 4000
1411
+ },
1412
+ {
1413
+ "epoch": 1.7084572885677858,
1414
+ "grad_norm": 1.5484602451324463,
1415
+ "learning_rate": 8.615951267884969e-05,
1416
+ "loss": 1.4282,
1417
+ "step": 4020
1418
+ },
1419
+ {
1420
+ "epoch": 1.7169570760730983,
1421
+ "grad_norm": 1.667822003364563,
1422
+ "learning_rate": 8.559286017849553e-05,
1423
+ "loss": 1.3641,
1424
+ "step": 4040
+    },
+    {
+      "epoch": 1.7254568635784104,
+      "grad_norm": 1.6226531267166138,
+      "learning_rate": 8.502620767814138e-05,
+      "loss": 1.4197,
+      "step": 4060
+    },
+    {
+      "epoch": 1.733956651083723,
+      "grad_norm": 2.0089526176452637,
+      "learning_rate": 8.445955517778723e-05,
+      "loss": 1.4129,
+      "step": 4080
+    },
+    {
+      "epoch": 1.7424564385890353,
+      "grad_norm": 1.3930327892303467,
+      "learning_rate": 8.389290267743307e-05,
+      "loss": 1.3614,
+      "step": 4100
+    },
+    {
+      "epoch": 1.7509562260943476,
+      "grad_norm": 1.890655755996704,
+      "learning_rate": 8.332625017707892e-05,
+      "loss": 1.4578,
+      "step": 4120
+    },
+    {
+      "epoch": 1.75945601359966,
+      "grad_norm": 1.7144534587860107,
+      "learning_rate": 8.275959767672476e-05,
+      "loss": 1.4148,
+      "step": 4140
+    },
+    {
+      "epoch": 1.7679558011049723,
+      "grad_norm": 1.5091826915740967,
+      "learning_rate": 8.219294517637059e-05,
+      "loss": 1.4203,
+      "step": 4160
+    },
+    {
+      "epoch": 1.7764555886102849,
+      "grad_norm": 1.7839044332504272,
+      "learning_rate": 8.162629267601643e-05,
+      "loss": 1.3824,
+      "step": 4180
+    },
+    {
+      "epoch": 1.784955376115597,
+      "grad_norm": 1.661871075630188,
+      "learning_rate": 8.105964017566228e-05,
+      "loss": 1.4356,
+      "step": 4200
+    },
+    {
+      "epoch": 1.7934551636209095,
+      "grad_norm": 1.4366284608840942,
+      "learning_rate": 8.049298767530812e-05,
+      "loss": 1.3839,
+      "step": 4220
+    },
+    {
+      "epoch": 1.8019549511262218,
+      "grad_norm": 1.5518572330474854,
+      "learning_rate": 7.992633517495396e-05,
+      "loss": 1.4134,
+      "step": 4240
+    },
+    {
+      "epoch": 1.8104547386315342,
+      "grad_norm": 1.652972936630249,
+      "learning_rate": 7.93596826745998e-05,
+      "loss": 1.4234,
+      "step": 4260
+    },
+    {
+      "epoch": 1.8189545261368467,
+      "grad_norm": 1.4948612451553345,
+      "learning_rate": 7.879303017424565e-05,
+      "loss": 1.3798,
+      "step": 4280
+    },
+    {
+      "epoch": 1.8274543136421588,
+      "grad_norm": 1.6756293773651123,
+      "learning_rate": 7.822637767389149e-05,
+      "loss": 1.4239,
+      "step": 4300
+    },
+    {
+      "epoch": 1.8359541011474714,
+      "grad_norm": 1.7066694498062134,
+      "learning_rate": 7.765972517353733e-05,
+      "loss": 1.4485,
+      "step": 4320
+    },
+    {
+      "epoch": 1.8444538886527837,
+      "grad_norm": 1.478449821472168,
+      "learning_rate": 7.709307267318317e-05,
+      "loss": 1.3998,
+      "step": 4340
+    },
+    {
+      "epoch": 1.852953676158096,
+      "grad_norm": 1.7076212167739868,
+      "learning_rate": 7.652642017282902e-05,
+      "loss": 1.4249,
+      "step": 4360
+    },
+    {
+      "epoch": 1.8614534636634086,
+      "grad_norm": 1.5418049097061157,
+      "learning_rate": 7.595976767247486e-05,
+      "loss": 1.3806,
+      "step": 4380
+    },
+    {
+      "epoch": 1.8699532511687207,
+      "grad_norm": 1.668404459953308,
+      "learning_rate": 7.53931151721207e-05,
+      "loss": 1.4088,
+      "step": 4400
+    },
+    {
+      "epoch": 1.8784530386740332,
+      "grad_norm": 2.0103743076324463,
+      "learning_rate": 7.482646267176655e-05,
+      "loss": 1.4432,
+      "step": 4420
+    },
+    {
+      "epoch": 1.8869528261793456,
+      "grad_norm": 1.8902521133422852,
+      "learning_rate": 7.425981017141237e-05,
+      "loss": 1.4262,
+      "step": 4440
+    },
+    {
+      "epoch": 1.895452613684658,
+      "grad_norm": 1.493699550628662,
+      "learning_rate": 7.369315767105822e-05,
+      "loss": 1.402,
+      "step": 4460
+    },
+    {
+      "epoch": 1.9039524011899702,
+      "grad_norm": 1.617872953414917,
+      "learning_rate": 7.312650517070407e-05,
+      "loss": 1.4409,
+      "step": 4480
+    },
+    {
+      "epoch": 1.9124521886952826,
+      "grad_norm": 1.5638338327407837,
+      "learning_rate": 7.255985267034992e-05,
+      "loss": 1.3933,
+      "step": 4500
+    },
+    {
+      "epoch": 1.920951976200595,
+      "grad_norm": 1.6773780584335327,
+      "learning_rate": 7.199320016999576e-05,
+      "loss": 1.4084,
+      "step": 4520
+    },
+    {
+      "epoch": 1.9294517637059072,
+      "grad_norm": 1.5437266826629639,
+      "learning_rate": 7.14265476696416e-05,
+      "loss": 1.4419,
+      "step": 4540
+    },
+    {
+      "epoch": 1.9379515512112198,
+      "grad_norm": 1.5624651908874512,
+      "learning_rate": 7.085989516928744e-05,
+      "loss": 1.3479,
+      "step": 4560
+    },
+    {
+      "epoch": 1.946451338716532,
+      "grad_norm": 1.594768762588501,
+      "learning_rate": 7.029324266893327e-05,
+      "loss": 1.4021,
+      "step": 4580
+    },
+    {
+      "epoch": 1.9549511262218444,
+      "grad_norm": 1.7385071516036987,
+      "learning_rate": 6.972659016857912e-05,
+      "loss": 1.3459,
+      "step": 4600
+    },
+    {
+      "epoch": 1.963450913727157,
+      "grad_norm": 1.835210919380188,
+      "learning_rate": 6.915993766822496e-05,
+      "loss": 1.4334,
+      "step": 4620
+    },
+    {
+      "epoch": 1.971950701232469,
+      "grad_norm": 1.3983690738677979,
+      "learning_rate": 6.85932851678708e-05,
+      "loss": 1.396,
+      "step": 4640
+    },
+    {
+      "epoch": 1.9804504887377816,
+      "grad_norm": 1.4692546129226685,
+      "learning_rate": 6.802663266751664e-05,
+      "loss": 1.4184,
+      "step": 4660
+    },
+    {
+      "epoch": 1.988950276243094,
+      "grad_norm": 2.199734926223755,
+      "learning_rate": 6.74599801671625e-05,
+      "loss": 1.3902,
+      "step": 4680
+    },
+    {
+      "epoch": 1.9974500637484063,
+      "grad_norm": 1.8094977140426636,
+      "learning_rate": 6.689332766680833e-05,
+      "loss": 1.3973,
+      "step": 4700
+    }
+  ],
+  "logging_steps": 20,
+  "max_steps": 7059,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 5.994699046885786e+16,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
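The learning-rate column in this log follows a linear decay to zero over the full `max_steps` of 7059, with no apparent warmup. A minimal sketch of that schedule, assuming an initial rate of 2e-4 (inferred from the logged values; the configured rate itself lives in training_args.bin, not in this file):

```python
# Sketch of the linear schedule implied by the log above. INITIAL_LR is an
# assumption inferred from the step-20 entry; logged values can sit a step
# or two off because of when Trainer samples the rate for logging.
MAX_STEPS = 7059   # "max_steps" in trainer_state.json
INITIAL_LR = 2e-4  # assumed starting rate

def linear_lr(step: int) -> float:
    """Linear decay to zero with no warmup."""
    return INITIAL_LR * max(0, MAX_STEPS - step) / MAX_STEPS

print(linear_lr(20))    # ~1.9943e-04, within a step of the logged 1.9946e-04
print(linear_lr(4700))  # ~6.6837e-05, within a step of the logged 6.6893e-05
```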
tinyllama-lora-finetuned/checkpoint-4706/training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:15e87998291b092367622e9a01c7bf9c9073fbba3c3325c704524d294a28c0e8
+size 5304
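training_args.bin is a pickled `TrainingArguments` object, which is why only a Git LFS pointer shows in the diff. A quick sketch for inspecting the exact hyperparameters behind the schedule above, assuming `torch` and `transformers` are installed and the checkpoint directory is available locally:

```python
import torch

# training_args.bin is a pickled transformers.TrainingArguments object;
# weights_only=False is needed on recent torch versions to unpickle it.
args = torch.load(
    "tinyllama-lora-finetuned/checkpoint-4706/training_args.bin",
    weights_only=False,
)
print(args.learning_rate, args.lr_scheduler_type, args.num_train_epochs)
```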
tinyllama-lora-finetuned/checkpoint-7059/README.md ADDED
@@ -0,0 +1,202 @@
tinyllama-lora-finetuned/checkpoint-7059/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
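This config applies rank-16 LoRA (alpha 32, dropout 0.1) to only the attention `q_proj` and `v_proj` matrices. A minimal loading sketch, assuming `transformers` and `peft` are installed and the checkpoint directory is local (the path follows this repo's layout):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"           # base_model_name_or_path above
ADAPTER = "tinyllama-lora-finetuned/checkpoint-7059"  # this checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)    # the checkpoint ships its tokenizer
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, ADAPTER)     # attaches the LoRA weights
model.eval()
```

For deployment, `model = model.merge_and_unload()` folds the low-rank updates back into the base weights so no `peft` dependency is needed at inference time.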
tinyllama-lora-finetuned/checkpoint-7059/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:30714917adf91c927617da51652825f423856d999278ab43a89ff8801a725527
+size 9022864
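The ~9 MB adapter size is consistent with the config above. Taking TinyLlama-1.1B's dimensions as assumptions (22 layers, hidden size 2048, and a 256-dim `v_proj` output under grouped-query attention; none of this is stored in the checkpoint itself), rank-16 A/B pairs on `q_proj` and `v_proj` come to about 2.25M parameters, or roughly this file size in fp32:

```python
# Back-of-the-envelope check of the adapter_model.safetensors size above.
# The model dimensions are assumptions taken from TinyLlama-1.1B's architecture.
layers, hidden, kv_dim, r = 22, 2048, 256, 16

q_proj = r * (hidden + hidden)   # LoRA A is (r, in) and B is (out, r) per q_proj
v_proj = r * (hidden + kv_dim)   # v_proj output is 256-dim with 4 KV heads
params = layers * (q_proj + v_proj)
print(params)       # 2252800 trainable parameters
print(params * 4)   # 9011200 bytes in fp32, close to "size 9022864" plus header
```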
tinyllama-lora-finetuned/checkpoint-7059/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:66f13a3291aa3847a59743082fee3c2915e37b8fe0f282ce10a3a9799a89a819
+size 18096570
tinyllama-lora-finetuned/checkpoint-7059/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8b996b3ef9171dbe240918633947a37fa68cb6622b01fabf832fb70526c7cad1
+size 14244
tinyllama-lora-finetuned/checkpoint-7059/scaler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:476fa65f582625715ac485c74528d19b46570b69068eac5629dbb7e98bd6c520
+size 988
tinyllama-lora-finetuned/checkpoint-7059/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3a0ac6e7c83ab9a47bf0cb56003aa5c935cc7550b1e53b681393c66779198082
+size 1064
tinyllama-lora-finetuned/checkpoint-7059/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "</s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
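Note that `pad_token` is mapped onto the `</s>` EOS token: the Llama tokenizer ships without a dedicated padding token, so reusing EOS (with right-padding, per tokenizer_config.json below) is the usual workaround for batched training. A sketch of the equivalent setup in code:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # "</s>", matching this special_tokens_map.json
tok.padding_side = "right"
```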
tinyllama-lora-finetuned/checkpoint-7059/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tinyllama-lora-finetuned/checkpoint-7059/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+size 499723
tinyllama-lora-finetuned/checkpoint-7059/tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 2048,
+  "pad_token": "</s>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
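The `chat_template` above wraps each turn in `<|user|>` / `<|system|>` / `<|assistant|>` headers, each terminated by the EOS token. A sketch of rendering a prompt with it (the message contents are illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tinyllama-lora-finetuned/checkpoint-7059")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize LoRA in one sentence."},
]
# add_generation_prompt=True appends the trailing '<|assistant|>' header
# from the template's loop.last branch.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```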
tinyllama-lora-finetuned/checkpoint-7059/trainer_state.json ADDED
@@ -0,0 +1,2498 @@
+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 7059,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
1582
+ "grad_norm": 1.5638338327407837,
1583
+ "learning_rate": 7.255985267034992e-05,
1584
+ "loss": 1.3933,
1585
+ "step": 4500
1586
+ },
1587
+ {
1588
+ "epoch": 1.920951976200595,
1589
+ "grad_norm": 1.6773780584335327,
1590
+ "learning_rate": 7.199320016999576e-05,
1591
+ "loss": 1.4084,
1592
+ "step": 4520
1593
+ },
1594
+ {
1595
+ "epoch": 1.9294517637059072,
1596
+ "grad_norm": 1.5437266826629639,
1597
+ "learning_rate": 7.14265476696416e-05,
1598
+ "loss": 1.4419,
1599
+ "step": 4540
1600
+ },
1601
+ {
1602
+ "epoch": 1.9379515512112198,
1603
+ "grad_norm": 1.5624651908874512,
1604
+ "learning_rate": 7.085989516928744e-05,
1605
+ "loss": 1.3479,
1606
+ "step": 4560
1607
+ },
1608
+ {
1609
+ "epoch": 1.946451338716532,
1610
+ "grad_norm": 1.594768762588501,
1611
+ "learning_rate": 7.029324266893327e-05,
1612
+ "loss": 1.4021,
1613
+ "step": 4580
1614
+ },
1615
+ {
1616
+ "epoch": 1.9549511262218444,
1617
+ "grad_norm": 1.7385071516036987,
1618
+ "learning_rate": 6.972659016857912e-05,
1619
+ "loss": 1.3459,
1620
+ "step": 4600
1621
+ },
1622
+ {
1623
+ "epoch": 1.963450913727157,
1624
+ "grad_norm": 1.835210919380188,
1625
+ "learning_rate": 6.915993766822496e-05,
1626
+ "loss": 1.4334,
1627
+ "step": 4620
1628
+ },
1629
+ {
1630
+ "epoch": 1.971950701232469,
1631
+ "grad_norm": 1.3983690738677979,
1632
+ "learning_rate": 6.85932851678708e-05,
1633
+ "loss": 1.396,
1634
+ "step": 4640
1635
+ },
1636
+ {
1637
+ "epoch": 1.9804504887377816,
1638
+ "grad_norm": 1.4692546129226685,
1639
+ "learning_rate": 6.802663266751664e-05,
1640
+ "loss": 1.4184,
1641
+ "step": 4660
1642
+ },
1643
+ {
1644
+ "epoch": 1.988950276243094,
1645
+ "grad_norm": 2.199734926223755,
1646
+ "learning_rate": 6.74599801671625e-05,
1647
+ "loss": 1.3902,
1648
+ "step": 4680
1649
+ },
1650
+ {
1651
+ "epoch": 1.9974500637484063,
1652
+ "grad_norm": 1.8094977140426636,
1653
+ "learning_rate": 6.689332766680833e-05,
1654
+ "loss": 1.3973,
1655
+ "step": 4700
1656
+ },
1657
+ {
1658
+ "epoch": 2.005949851253719,
1659
+ "grad_norm": 1.4437006711959839,
1660
+ "learning_rate": 6.632667516645417e-05,
1661
+ "loss": 1.3595,
1662
+ "step": 4720
1663
+ },
1664
+ {
1665
+ "epoch": 2.014449638759031,
1666
+ "grad_norm": 1.6683040857315063,
1667
+ "learning_rate": 6.576002266610002e-05,
1668
+ "loss": 1.3589,
1669
+ "step": 4740
1670
+ },
1671
+ {
1672
+ "epoch": 2.0229494262643435,
1673
+ "grad_norm": 1.5770437717437744,
1674
+ "learning_rate": 6.519337016574586e-05,
1675
+ "loss": 1.3924,
1676
+ "step": 4760
1677
+ },
1678
+ {
1679
+ "epoch": 2.0314492137696556,
1680
+ "grad_norm": 1.6709238290786743,
1681
+ "learning_rate": 6.465505029040941e-05,
1682
+ "loss": 1.385,
1683
+ "step": 4780
1684
+ },
1685
+ {
1686
+ "epoch": 2.039949001274968,
1687
+ "grad_norm": 1.4806042909622192,
1688
+ "learning_rate": 6.408839779005525e-05,
1689
+ "loss": 1.3821,
1690
+ "step": 4800
1691
+ },
1692
+ {
1693
+ "epoch": 2.0484487887802807,
1694
+ "grad_norm": 1.9815375804901123,
1695
+ "learning_rate": 6.352174528970109e-05,
1696
+ "loss": 1.3698,
1697
+ "step": 4820
1698
+ },
1699
+ {
1700
+ "epoch": 2.056948576285593,
1701
+ "grad_norm": 1.5569688081741333,
1702
+ "learning_rate": 6.295509278934694e-05,
1703
+ "loss": 1.2949,
1704
+ "step": 4840
1705
+ },
1706
+ {
1707
+ "epoch": 2.0654483637909054,
1708
+ "grad_norm": 1.7990926504135132,
1709
+ "learning_rate": 6.238844028899278e-05,
1710
+ "loss": 1.3434,
1711
+ "step": 4860
1712
+ },
1713
+ {
1714
+ "epoch": 2.0739481512962175,
1715
+ "grad_norm": 1.6309067010879517,
1716
+ "learning_rate": 6.182178778863862e-05,
1717
+ "loss": 1.3536,
1718
+ "step": 4880
1719
+ },
1720
+ {
1721
+ "epoch": 2.08244793880153,
1722
+ "grad_norm": 1.7287702560424805,
1723
+ "learning_rate": 6.125513528828446e-05,
1724
+ "loss": 1.3648,
1725
+ "step": 4900
1726
+ },
1727
+ {
1728
+ "epoch": 2.090947726306842,
1729
+ "grad_norm": 1.6067641973495483,
1730
+ "learning_rate": 6.06884827879303e-05,
1731
+ "loss": 1.3003,
1732
+ "step": 4920
1733
+ },
1734
+ {
1735
+ "epoch": 2.0994475138121547,
1736
+ "grad_norm": 2.101600408554077,
1737
+ "learning_rate": 6.012183028757614e-05,
1738
+ "loss": 1.3172,
1739
+ "step": 4940
1740
+ },
1741
+ {
1742
+ "epoch": 2.107947301317467,
1743
+ "grad_norm": 1.7530735731124878,
1744
+ "learning_rate": 5.955517778722199e-05,
1745
+ "loss": 1.3195,
1746
+ "step": 4960
1747
+ },
1748
+ {
1749
+ "epoch": 2.1164470888227793,
1750
+ "grad_norm": 1.5520099401474,
1751
+ "learning_rate": 5.8988525286867834e-05,
1752
+ "loss": 1.2941,
1753
+ "step": 4980
1754
+ },
1755
+ {
1756
+ "epoch": 2.124946876328092,
1757
+ "grad_norm": 1.8086482286453247,
1758
+ "learning_rate": 5.842187278651368e-05,
1759
+ "loss": 1.3591,
1760
+ "step": 5000
1761
+ },
1762
+ {
1763
+ "epoch": 2.133446663833404,
1764
+ "grad_norm": 1.7753677368164062,
1765
+ "learning_rate": 5.785522028615952e-05,
1766
+ "loss": 1.335,
1767
+ "step": 5020
1768
+ },
1769
+ {
1770
+ "epoch": 2.1419464513387165,
1771
+ "grad_norm": 1.9112579822540283,
1772
+ "learning_rate": 5.7288567785805356e-05,
1773
+ "loss": 1.3363,
1774
+ "step": 5040
1775
+ },
1776
+ {
1777
+ "epoch": 2.150446238844029,
1778
+ "grad_norm": 2.093616247177124,
1779
+ "learning_rate": 5.67219152854512e-05,
1780
+ "loss": 1.3008,
1781
+ "step": 5060
1782
+ },
1783
+ {
1784
+ "epoch": 2.158946026349341,
1785
+ "grad_norm": 1.6216552257537842,
1786
+ "learning_rate": 5.615526278509704e-05,
1787
+ "loss": 1.3529,
1788
+ "step": 5080
1789
+ },
1790
+ {
1791
+ "epoch": 2.1674458138546537,
1792
+ "grad_norm": 1.714783787727356,
1793
+ "learning_rate": 5.5588610284742884e-05,
1794
+ "loss": 1.3847,
1795
+ "step": 5100
1796
+ },
1797
+ {
1798
+ "epoch": 2.175945601359966,
1799
+ "grad_norm": 2.0024187564849854,
1800
+ "learning_rate": 5.502195778438872e-05,
1801
+ "loss": 1.3623,
1802
+ "step": 5120
1803
+ },
1804
+ {
1805
+ "epoch": 2.1844453888652784,
1806
+ "grad_norm": 1.5948599576950073,
1807
+ "learning_rate": 5.445530528403456e-05,
1808
+ "loss": 1.3499,
1809
+ "step": 5140
1810
+ },
1811
+ {
1812
+ "epoch": 2.192945176370591,
1813
+ "grad_norm": 1.862939715385437,
1814
+ "learning_rate": 5.388865278368042e-05,
1815
+ "loss": 1.3231,
1816
+ "step": 5160
1817
+ },
1818
+ {
1819
+ "epoch": 2.201444963875903,
1820
+ "grad_norm": 1.623194932937622,
1821
+ "learning_rate": 5.3322000283326255e-05,
1822
+ "loss": 1.3635,
1823
+ "step": 5180
1824
+ },
1825
+ {
1826
+ "epoch": 2.2099447513812156,
1827
+ "grad_norm": 1.5168296098709106,
1828
+ "learning_rate": 5.27553477829721e-05,
1829
+ "loss": 1.2642,
1830
+ "step": 5200
1831
+ },
1832
+ {
1833
+ "epoch": 2.2184445388865277,
1834
+ "grad_norm": 1.7383768558502197,
1835
+ "learning_rate": 5.218869528261794e-05,
1836
+ "loss": 1.3172,
1837
+ "step": 5220
1838
+ },
1839
+ {
1840
+ "epoch": 2.2269443263918403,
1841
+ "grad_norm": 1.637037754058838,
1842
+ "learning_rate": 5.162204278226378e-05,
1843
+ "loss": 1.4083,
1844
+ "step": 5240
1845
+ },
1846
+ {
1847
+ "epoch": 2.2354441138971524,
1848
+ "grad_norm": 2.1732664108276367,
1849
+ "learning_rate": 5.105539028190962e-05,
1850
+ "loss": 1.3312,
1851
+ "step": 5260
1852
+ },
1853
+ {
1854
+ "epoch": 2.243943901402465,
1855
+ "grad_norm": 1.9546133279800415,
1856
+ "learning_rate": 5.048873778155546e-05,
1857
+ "loss": 1.4041,
1858
+ "step": 5280
1859
+ },
1860
+ {
1861
+ "epoch": 2.2524436889077775,
1862
+ "grad_norm": 1.623448133468628,
1863
+ "learning_rate": 4.992208528120131e-05,
1864
+ "loss": 1.3266,
1865
+ "step": 5300
1866
+ },
1867
+ {
1868
+ "epoch": 2.2609434764130896,
1869
+ "grad_norm": 1.6076593399047852,
1870
+ "learning_rate": 4.935543278084715e-05,
1871
+ "loss": 1.34,
1872
+ "step": 5320
1873
+ },
1874
+ {
1875
+ "epoch": 2.269443263918402,
1876
+ "grad_norm": 1.7159696817398071,
1877
+ "learning_rate": 4.878878028049299e-05,
1878
+ "loss": 1.3483,
1879
+ "step": 5340
1880
+ },
1881
+ {
1882
+ "epoch": 2.2779430514237142,
1883
+ "grad_norm": 2.0286951065063477,
1884
+ "learning_rate": 4.822212778013883e-05,
1885
+ "loss": 1.3935,
1886
+ "step": 5360
1887
+ },
1888
+ {
1889
+ "epoch": 2.2864428389290268,
1890
+ "grad_norm": 1.7484012842178345,
1891
+ "learning_rate": 4.7655475279784676e-05,
1892
+ "loss": 1.2851,
1893
+ "step": 5380
1894
+ },
1895
+ {
1896
+ "epoch": 2.2949426264343393,
1897
+ "grad_norm": 1.913295030593872,
1898
+ "learning_rate": 4.708882277943052e-05,
1899
+ "loss": 1.3141,
1900
+ "step": 5400
1901
+ },
1902
+ {
1903
+ "epoch": 2.3034424139396514,
1904
+ "grad_norm": 1.3480751514434814,
1905
+ "learning_rate": 4.652217027907636e-05,
1906
+ "loss": 1.418,
1907
+ "step": 5420
1908
+ },
1909
+ {
1910
+ "epoch": 2.311942201444964,
1911
+ "grad_norm": 1.6764460802078247,
1912
+ "learning_rate": 4.5955517778722204e-05,
1913
+ "loss": 1.3528,
1914
+ "step": 5440
1915
+ },
1916
+ {
1917
+ "epoch": 2.320441988950276,
1918
+ "grad_norm": 2.255345106124878,
1919
+ "learning_rate": 4.538886527836804e-05,
1920
+ "loss": 1.3452,
1921
+ "step": 5460
1922
+ },
1923
+ {
1924
+ "epoch": 2.3289417764555886,
1925
+ "grad_norm": 1.7120798826217651,
1926
+ "learning_rate": 4.482221277801388e-05,
1927
+ "loss": 1.416,
1928
+ "step": 5480
1929
+ },
1930
+ {
1931
+ "epoch": 2.337441563960901,
1932
+ "grad_norm": 1.5951834917068481,
1933
+ "learning_rate": 4.425556027765973e-05,
1934
+ "loss": 1.4115,
1935
+ "step": 5500
1936
+ },
1937
+ {
1938
+ "epoch": 2.3459413514662133,
1939
+ "grad_norm": 1.9381834268569946,
1940
+ "learning_rate": 4.368890777730557e-05,
1941
+ "loss": 1.338,
1942
+ "step": 5520
1943
+ },
1944
+ {
1945
+ "epoch": 2.354441138971526,
1946
+ "grad_norm": 1.6465120315551758,
1947
+ "learning_rate": 4.312225527695141e-05,
1948
+ "loss": 1.3652,
1949
+ "step": 5540
1950
+ },
1951
+ {
1952
+ "epoch": 2.362940926476838,
1953
+ "grad_norm": 1.5020956993103027,
1954
+ "learning_rate": 4.2555602776597253e-05,
1955
+ "loss": 1.3526,
1956
+ "step": 5560
1957
+ },
1958
+ {
1959
+ "epoch": 2.3714407139821505,
1960
+ "grad_norm": 1.6624023914337158,
1961
+ "learning_rate": 4.1988950276243096e-05,
1962
+ "loss": 1.3685,
1963
+ "step": 5580
1964
+ },
1965
+ {
1966
+ "epoch": 2.3799405014874626,
1967
+ "grad_norm": 1.9438233375549316,
1968
+ "learning_rate": 4.142229777588894e-05,
1969
+ "loss": 1.3812,
1970
+ "step": 5600
1971
+ },
1972
+ {
1973
+ "epoch": 2.388440288992775,
1974
+ "grad_norm": 1.6776275634765625,
1975
+ "learning_rate": 4.085564527553478e-05,
1976
+ "loss": 1.3558,
1977
+ "step": 5620
1978
+ },
1979
+ {
1980
+ "epoch": 2.3969400764980877,
1981
+ "grad_norm": 1.7344412803649902,
1982
+ "learning_rate": 4.0288992775180624e-05,
1983
+ "loss": 1.3472,
1984
+ "step": 5640
1985
+ },
1986
+ {
1987
+ "epoch": 2.4054398640034,
1988
+ "grad_norm": 1.5919348001480103,
1989
+ "learning_rate": 3.972234027482647e-05,
1990
+ "loss": 1.3805,
1991
+ "step": 5660
1992
+ },
1993
+ {
1994
+ "epoch": 2.4139396515087124,
1995
+ "grad_norm": 1.8995840549468994,
1996
+ "learning_rate": 3.91556877744723e-05,
1997
+ "loss": 1.3703,
1998
+ "step": 5680
1999
+ },
2000
+ {
2001
+ "epoch": 2.4224394390140245,
2002
+ "grad_norm": 1.4829695224761963,
2003
+ "learning_rate": 3.858903527411815e-05,
2004
+ "loss": 1.4167,
2005
+ "step": 5700
2006
+ },
2007
+ {
2008
+ "epoch": 2.430939226519337,
2009
+ "grad_norm": 1.818079948425293,
2010
+ "learning_rate": 3.8022382773763995e-05,
2011
+ "loss": 1.3363,
2012
+ "step": 5720
2013
+ },
2014
+ {
2015
+ "epoch": 2.4394390140246496,
2016
+ "grad_norm": 1.7308282852172852,
2017
+ "learning_rate": 3.745573027340983e-05,
2018
+ "loss": 1.3286,
2019
+ "step": 5740
2020
+ },
2021
+ {
2022
+ "epoch": 2.4479388015299617,
2023
+ "grad_norm": 1.7932735681533813,
2024
+ "learning_rate": 3.6889077773055674e-05,
2025
+ "loss": 1.389,
2026
+ "step": 5760
2027
+ },
2028
+ {
2029
+ "epoch": 2.4564385890352742,
2030
+ "grad_norm": 2.104896068572998,
2031
+ "learning_rate": 3.632242527270152e-05,
2032
+ "loss": 1.33,
2033
+ "step": 5780
2034
+ },
2035
+ {
2036
+ "epoch": 2.4649383765405863,
2037
+ "grad_norm": 1.7119940519332886,
2038
+ "learning_rate": 3.575577277234736e-05,
2039
+ "loss": 1.3614,
2040
+ "step": 5800
2041
+ },
2042
+ {
2043
+ "epoch": 2.473438164045899,
2044
+ "grad_norm": 1.7597520351409912,
2045
+ "learning_rate": 3.51891202719932e-05,
2046
+ "loss": 1.3376,
2047
+ "step": 5820
2048
+ },
2049
+ {
2050
+ "epoch": 2.4819379515512114,
2051
+ "grad_norm": 1.5762982368469238,
2052
+ "learning_rate": 3.4622467771639045e-05,
2053
+ "loss": 1.3829,
2054
+ "step": 5840
2055
+ },
2056
+ {
2057
+ "epoch": 2.4904377390565235,
2058
+ "grad_norm": 1.8265308141708374,
2059
+ "learning_rate": 3.405581527128489e-05,
2060
+ "loss": 1.3436,
2061
+ "step": 5860
2062
+ },
2063
+ {
2064
+ "epoch": 2.498937526561836,
2065
+ "grad_norm": 1.70365571975708,
2066
+ "learning_rate": 3.3489162770930724e-05,
2067
+ "loss": 1.3255,
2068
+ "step": 5880
2069
+ },
2070
+ {
2071
+ "epoch": 2.507437314067148,
2072
+ "grad_norm": 2.1082663536071777,
2073
+ "learning_rate": 3.292251027057657e-05,
2074
+ "loss": 1.3892,
2075
+ "step": 5900
2076
+ },
2077
+ {
2078
+ "epoch": 2.5159371015724608,
2079
+ "grad_norm": 1.9773534536361694,
2080
+ "learning_rate": 3.2355857770222416e-05,
2081
+ "loss": 1.3376,
2082
+ "step": 5920
2083
+ },
2084
+ {
2085
+ "epoch": 2.524436889077773,
2086
+ "grad_norm": 1.7395081520080566,
2087
+ "learning_rate": 3.178920526986825e-05,
2088
+ "loss": 1.4138,
2089
+ "step": 5940
2090
+ },
2091
+ {
2092
+ "epoch": 2.5329366765830854,
2093
+ "grad_norm": 1.7831342220306396,
2094
+ "learning_rate": 3.1222552769514095e-05,
2095
+ "loss": 1.3325,
2096
+ "step": 5960
2097
+ },
2098
+ {
2099
+ "epoch": 2.541436464088398,
2100
+ "grad_norm": 1.4583061933517456,
2101
+ "learning_rate": 3.065590026915994e-05,
2102
+ "loss": 1.3131,
2103
+ "step": 5980
2104
+ },
2105
+ {
2106
+ "epoch": 2.54993625159371,
2107
+ "grad_norm": 1.854566216468811,
2108
+ "learning_rate": 3.0089247768805784e-05,
2109
+ "loss": 1.3243,
2110
+ "step": 6000
2111
+ },
2112
+ {
2113
+ "epoch": 2.5584360390990226,
2114
+ "grad_norm": 1.6933608055114746,
2115
+ "learning_rate": 2.9522595268451626e-05,
2116
+ "loss": 1.3649,
2117
+ "step": 6020
2118
+ },
2119
+ {
2120
+ "epoch": 2.566935826604335,
2121
+ "grad_norm": 1.571779727935791,
2122
+ "learning_rate": 2.8955942768097466e-05,
2123
+ "loss": 1.3198,
2124
+ "step": 6040
2125
+ },
2126
+ {
2127
+ "epoch": 2.5754356141096473,
2128
+ "grad_norm": 1.7845191955566406,
2129
+ "learning_rate": 2.838929026774331e-05,
2130
+ "loss": 1.3612,
2131
+ "step": 6060
2132
+ },
2133
+ {
2134
+ "epoch": 2.5839354016149594,
2135
+ "grad_norm": 1.586799144744873,
2136
+ "learning_rate": 2.7822637767389148e-05,
2137
+ "loss": 1.3779,
2138
+ "step": 6080
2139
+ },
2140
+ {
2141
+ "epoch": 2.592435189120272,
2142
+ "grad_norm": 1.7283703088760376,
2143
+ "learning_rate": 2.7255985267034994e-05,
2144
+ "loss": 1.4086,
2145
+ "step": 6100
2146
+ },
2147
+ {
2148
+ "epoch": 2.6009349766255845,
2149
+ "grad_norm": 1.6855144500732422,
2150
+ "learning_rate": 2.6689332766680837e-05,
2151
+ "loss": 1.332,
2152
+ "step": 6120
2153
+ },
2154
+ {
2155
+ "epoch": 2.6094347641308966,
2156
+ "grad_norm": 1.6331347227096558,
2157
+ "learning_rate": 2.6122680266326676e-05,
2158
+ "loss": 1.3825,
2159
+ "step": 6140
2160
+ },
2161
+ {
2162
+ "epoch": 2.617934551636209,
2163
+ "grad_norm": 1.776105523109436,
2164
+ "learning_rate": 2.555602776597252e-05,
2165
+ "loss": 1.3828,
2166
+ "step": 6160
2167
+ },
2168
+ {
2169
+ "epoch": 2.6264343391415217,
2170
+ "grad_norm": 1.9031463861465454,
2171
+ "learning_rate": 2.498937526561836e-05,
2172
+ "loss": 1.3507,
2173
+ "step": 6180
2174
+ },
2175
+ {
2176
+ "epoch": 2.634934126646834,
2177
+ "grad_norm": 1.746543526649475,
2178
+ "learning_rate": 2.4422722765264204e-05,
2179
+ "loss": 1.3965,
2180
+ "step": 6200
2181
+ },
2182
+ {
2183
+ "epoch": 2.6434339141521463,
2184
+ "grad_norm": 1.577978491783142,
2185
+ "learning_rate": 2.3856070264910044e-05,
2186
+ "loss": 1.3613,
2187
+ "step": 6220
2188
+ },
2189
+ {
2190
+ "epoch": 2.6519337016574585,
2191
+ "grad_norm": 2.0596683025360107,
2192
+ "learning_rate": 2.3289417764555886e-05,
2193
+ "loss": 1.3259,
2194
+ "step": 6240
2195
+ },
2196
+ {
2197
+ "epoch": 2.660433489162771,
2198
+ "grad_norm": 1.6416441202163696,
2199
+ "learning_rate": 2.272276526420173e-05,
2200
+ "loss": 1.3975,
2201
+ "step": 6260
2202
+ },
2203
+ {
2204
+ "epoch": 2.668933276668083,
2205
+ "grad_norm": 1.653041958808899,
2206
+ "learning_rate": 2.2156112763847572e-05,
2207
+ "loss": 1.3281,
2208
+ "step": 6280
2209
+ },
2210
+ {
2211
+ "epoch": 2.6774330641733957,
2212
+ "grad_norm": 1.6398296356201172,
2213
+ "learning_rate": 2.1589460263493415e-05,
2214
+ "loss": 1.3857,
2215
+ "step": 6300
2216
+ },
2217
+ {
2218
+ "epoch": 2.685932851678708,
2219
+ "grad_norm": 1.722346305847168,
2220
+ "learning_rate": 2.1022807763139254e-05,
2221
+ "loss": 1.3459,
2222
+ "step": 6320
2223
+ },
2224
+ {
2225
+ "epoch": 2.6944326391840203,
2226
+ "grad_norm": 1.6148895025253296,
2227
+ "learning_rate": 2.0456155262785097e-05,
2228
+ "loss": 1.4212,
2229
+ "step": 6340
2230
+ },
2231
+ {
2232
+ "epoch": 2.702932426689333,
2233
+ "grad_norm": 1.6951065063476562,
2234
+ "learning_rate": 1.988950276243094e-05,
2235
+ "loss": 1.3228,
2236
+ "step": 6360
2237
+ },
2238
+ {
2239
+ "epoch": 2.711432214194645,
2240
+ "grad_norm": 1.6303852796554565,
2241
+ "learning_rate": 1.9322850262076782e-05,
2242
+ "loss": 1.3422,
2243
+ "step": 6380
2244
+ },
2245
+ {
2246
+ "epoch": 2.7199320016999575,
2247
+ "grad_norm": 1.540625810623169,
2248
+ "learning_rate": 1.8756197761722625e-05,
2249
+ "loss": 1.33,
2250
+ "step": 6400
2251
+ },
2252
+ {
2253
+ "epoch": 2.7284317892052696,
2254
+ "grad_norm": 1.9898580312728882,
2255
+ "learning_rate": 1.8189545261368468e-05,
2256
+ "loss": 1.4018,
2257
+ "step": 6420
2258
+ },
2259
+ {
2260
+ "epoch": 2.736931576710582,
2261
+ "grad_norm": 1.809574007987976,
2262
+ "learning_rate": 1.762289276101431e-05,
2263
+ "loss": 1.3645,
2264
+ "step": 6440
2265
+ },
2266
+ {
2267
+ "epoch": 2.7454313642158947,
2268
+ "grad_norm": 1.7633795738220215,
2269
+ "learning_rate": 1.705624026066015e-05,
2270
+ "loss": 1.2959,
2271
+ "step": 6460
2272
+ },
2273
+ {
2274
+ "epoch": 2.753931151721207,
2275
+ "grad_norm": 1.8831983804702759,
2276
+ "learning_rate": 1.6489587760305992e-05,
2277
+ "loss": 1.3563,
2278
+ "step": 6480
2279
+ },
2280
+ {
2281
+ "epoch": 2.7624309392265194,
2282
+ "grad_norm": 1.7189044952392578,
2283
+ "learning_rate": 1.5922935259951835e-05,
2284
+ "loss": 1.3696,
2285
+ "step": 6500
2286
+ },
2287
+ {
2288
+ "epoch": 2.770930726731832,
2289
+ "grad_norm": 2.021745204925537,
2290
+ "learning_rate": 1.5356282759597678e-05,
2291
+ "loss": 1.3032,
2292
+ "step": 6520
2293
+ },
2294
+ {
2295
+ "epoch": 2.779430514237144,
2296
+ "grad_norm": 1.7517627477645874,
2297
+ "learning_rate": 1.478963025924352e-05,
2298
+ "loss": 1.3437,
2299
+ "step": 6540
2300
+ },
2301
+ {
2302
+ "epoch": 2.7879303017424566,
2303
+ "grad_norm": 1.7145280838012695,
2304
+ "learning_rate": 1.4222977758889362e-05,
2305
+ "loss": 1.3514,
2306
+ "step": 6560
2307
+ },
2308
+ {
2309
+ "epoch": 2.7964300892477687,
2310
+ "grad_norm": 1.6447581052780151,
2311
+ "learning_rate": 1.3656325258535204e-05,
2312
+ "loss": 1.3583,
2313
+ "step": 6580
2314
+ },
2315
+ {
2316
+ "epoch": 2.8049298767530813,
2317
+ "grad_norm": 2.2724671363830566,
2318
+ "learning_rate": 1.3089672758181046e-05,
2319
+ "loss": 1.4376,
2320
+ "step": 6600
2321
+ },
2322
+ {
2323
+ "epoch": 2.8134296642583934,
2324
+ "grad_norm": 1.8688862323760986,
2325
+ "learning_rate": 1.252302025782689e-05,
2326
+ "loss": 1.3107,
2327
+ "step": 6620
2328
+ },
2329
+ {
2330
+ "epoch": 2.821929451763706,
2331
+ "grad_norm": 1.6515421867370605,
2332
+ "learning_rate": 1.1956367757472731e-05,
2333
+ "loss": 1.314,
2334
+ "step": 6640
2335
+ },
2336
+ {
2337
+ "epoch": 2.8304292392690185,
2338
+ "grad_norm": 1.6967641115188599,
2339
+ "learning_rate": 1.1389715257118574e-05,
2340
+ "loss": 1.3865,
2341
+ "step": 6660
2342
+ },
2343
+ {
2344
+ "epoch": 2.8389290267743306,
2345
+ "grad_norm": 2.137071371078491,
2346
+ "learning_rate": 1.0823062756764415e-05,
2347
+ "loss": 1.3028,
2348
+ "step": 6680
2349
+ },
2350
+ {
2351
+ "epoch": 2.847428814279643,
2352
+ "grad_norm": 2.1401865482330322,
2353
+ "learning_rate": 1.0256410256410256e-05,
2354
+ "loss": 1.3872,
2355
+ "step": 6700
2356
+ },
2357
+ {
2358
+ "epoch": 2.8559286017849552,
2359
+ "grad_norm": 1.701802134513855,
2360
+ "learning_rate": 9.689757756056099e-06,
2361
+ "loss": 1.3266,
2362
+ "step": 6720
2363
+ },
2364
+ {
2365
+ "epoch": 2.8644283892902678,
2366
+ "grad_norm": 1.8868361711502075,
2367
+ "learning_rate": 9.123105255701941e-06,
2368
+ "loss": 1.3023,
2369
+ "step": 6740
2370
+ },
2371
+ {
2372
+ "epoch": 2.87292817679558,
2373
+ "grad_norm": 1.4570691585540771,
2374
+ "learning_rate": 8.556452755347784e-06,
2375
+ "loss": 1.2953,
2376
+ "step": 6760
2377
+ },
2378
+ {
2379
+ "epoch": 2.8814279643008924,
2380
+ "grad_norm": 1.551537275314331,
2381
+ "learning_rate": 7.989800254993625e-06,
2382
+ "loss": 1.3777,
2383
+ "step": 6780
2384
+ },
2385
+ {
2386
+ "epoch": 2.889927751806205,
2387
+ "grad_norm": 1.9063984155654907,
2388
+ "learning_rate": 7.423147754639467e-06,
2389
+ "loss": 1.3874,
2390
+ "step": 6800
2391
+ },
2392
+ {
2393
+ "epoch": 2.898427539311517,
2394
+ "grad_norm": 1.859644889831543,
2395
+ "learning_rate": 6.85649525428531e-06,
2396
+ "loss": 1.3712,
2397
+ "step": 6820
2398
+ },
2399
+ {
2400
+ "epoch": 2.9069273268168296,
2401
+ "grad_norm": 1.7726514339447021,
2402
+ "learning_rate": 6.2898427539311525e-06,
2403
+ "loss": 1.3823,
2404
+ "step": 6840
2405
+ },
2406
+ {
2407
+ "epoch": 2.915427114322142,
2408
+ "grad_norm": 1.552988886833191,
2409
+ "learning_rate": 5.7231902535769936e-06,
2410
+ "loss": 1.3455,
2411
+ "step": 6860
2412
+ },
2413
+ {
2414
+ "epoch": 2.9239269018274543,
2415
+ "grad_norm": 1.5305794477462769,
2416
+ "learning_rate": 5.156537753222836e-06,
2417
+ "loss": 1.3412,
2418
+ "step": 6880
2419
+ },
2420
+ {
2421
+ "epoch": 2.932426689332767,
2422
+ "grad_norm": 1.8889455795288086,
2423
+ "learning_rate": 4.589885252868679e-06,
2424
+ "loss": 1.3576,
2425
+ "step": 6900
2426
+ },
2427
+ {
2428
+ "epoch": 2.940926476838079,
2429
+ "grad_norm": 1.7747056484222412,
2430
+ "learning_rate": 4.023232752514521e-06,
2431
+ "loss": 1.346,
2432
+ "step": 6920
2433
+ },
2434
+ {
2435
+ "epoch": 2.9494262643433915,
2436
+ "grad_norm": 1.5252270698547363,
2437
+ "learning_rate": 3.456580252160363e-06,
2438
+ "loss": 1.3317,
2439
+ "step": 6940
2440
+ },
2441
+ {
2442
+ "epoch": 2.9579260518487036,
2443
+ "grad_norm": 1.7793165445327759,
2444
+ "learning_rate": 2.889927751806205e-06,
2445
+ "loss": 1.3725,
2446
+ "step": 6960
2447
+ },
2448
+ {
2449
+ "epoch": 2.966425839354016,
2450
+ "grad_norm": 1.6094822883605957,
2451
+ "learning_rate": 2.323275251452047e-06,
2452
+ "loss": 1.3647,
2453
+ "step": 6980
2454
+ },
2455
+ {
2456
+ "epoch": 2.9749256268593287,
2457
+ "grad_norm": 1.7421776056289673,
2458
+ "learning_rate": 1.7566227510978892e-06,
2459
+ "loss": 1.4039,
2460
+ "step": 7000
2461
+ },
2462
+ {
2463
+ "epoch": 2.983425414364641,
2464
+ "grad_norm": 1.8629122972488403,
2465
+ "learning_rate": 1.1899702507437315e-06,
2466
+ "loss": 1.3356,
2467
+ "step": 7020
2468
+ },
2469
+ {
2470
+ "epoch": 2.9919252018699534,
2471
+ "grad_norm": 1.6828492879867554,
2472
+ "learning_rate": 6.233177503895737e-07,
2473
+ "loss": 1.3754,
2474
+ "step": 7040
2475
+ }
2476
+ ],
2477
+ "logging_steps": 20,
2478
+ "max_steps": 7059,
2479
+ "num_input_tokens_seen": 0,
2480
+ "num_train_epochs": 3,
2481
+ "save_steps": 500,
2482
+ "stateful_callbacks": {
2483
+ "TrainerControl": {
2484
+ "args": {
2485
+ "should_epoch_stop": false,
2486
+ "should_evaluate": false,
2487
+ "should_log": false,
2488
+ "should_save": true,
2489
+ "should_training_stop": true
2490
+ },
2491
+ "attributes": {}
2492
+ }
2493
+ },
2494
+ "total_flos": 8.992048570328678e+16,
2495
+ "train_batch_size": 4,
2496
+ "trial_name": null,
2497
+ "trial_params": null
2498
+ }
tinyllama-lora-finetuned/checkpoint-7059/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15e87998291b092367622e9a01c7bf9c9073fbba3c3325c704524d294a28c0e8
+ size 5304