Training in progress, step 100, checkpoint


Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +33 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +4 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +205 -0
last-checkpoint/trainer_state.json +766 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: katuni4ka/tiny-random-dbrx
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "katuni4ka/tiny-random-dbrx",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "out_proj",
+    "Wqkv",
+    "v_proj",
+    "layer",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
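
Note on the config above: a minimal sketch of how the effective LoRA scaling factor follows from `r`, `lora_alpha` and `use_rslora`. It assumes the usual LoRA convention (scaling = `lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The `ADAPTER_CONFIG` dict and the `lora_scaling` helper are illustrative names, not part of this commit.

```python
import json
import math

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "r": 16,
  "use_rslora": false,
  "target_modules": ["k_proj", "out_proj", "Wqkv", "v_proj", "layer", "q_proj"]
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor applied to the low-rank update B @ A.

    Assumes the standard convention: alpha / r, or alpha / sqrt(r)
    when rank-stabilized LoRA (use_rslora) is enabled.
    """
    rank = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(rank)
    return alpha / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"rank={ADAPTER_CONFIG['r']} alpha={ADAPTER_CONFIG['lora_alpha']} scaling={scaling}")
```

With the values in this checkpoint (`r` = 16, `lora_alpha` = 32, `use_rslora` = false) the update is scaled by 2.0.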

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:32de89f64bfbdf5257adb9297417907f059af3bb9a4f224da336aef2a6b7bfa8
+size 9864
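
Note on the three lines above: they are a Git LFS pointer, not the tensor data itself. The sketch below parses such a pointer and, optionally, checks a downloaded file against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the check assumes the real file has already been fetched locally.

```python
import hashlib
import os

# Pointer text copied from the adapter_model.safetensors diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:32de89f64bfbdf5257adb9297417907f059af3bb9a4f224da336aef2a6b7bfa8
size 9864
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check that a local file has the size and SHA-256 digest the pointer names."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

The same three-line format is used for `optimizer.pt`, `rng_state.pth`, `scheduler.pt` and `training_args.bin` in this commit.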

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|im_end|>": 100279,
+  "<|im_start|>": 100278
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:206339865b007a0db7499757f0cc0552961328fa94fd512a48b5732e4bb1ebcb
+size 24006

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6bca983b309063a996168bc9ba0246dee10aad731d5eafae85ac843af75455c4
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6e9a495185b30e410553401cbf647ae58e45b1f7a5b4cfd1421665ad738e6aa1
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|pad|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,205 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "100256": {
+      "content": "<||_unused_0_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100257": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100258": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100259": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100260": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100261": {
+      "content": "<||_unused_1_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100262": {
+      "content": "<||_unused_2_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100263": {
+      "content": "<||_unused_3_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100264": {
+      "content": "<||_unused_4_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100265": {
+      "content": "<||_unused_5_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100266": {
+      "content": "<||_unused_6_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100267": {
+      "content": "<||_unused_7_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100268": {
+      "content": "<||_unused_8_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100269": {
+      "content": "<||_unused_9_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100270": {
+      "content": "<||_unused_10_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100271": {
+      "content": "<||_unused_11_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100272": {
+      "content": "<||_unused_12_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100273": {
+      "content": "<||_unused_13_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100274": {
+      "content": "<||_unused_14_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100275": {
+      "content": "<||_unused_15_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100276": {
+      "content": "<|endofprompt|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100277": {
+      "content": "<|pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100278": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100279": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 32768,
+  "pad_token": "<|pad|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
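
Note on the config above: the `chat_template` emits `<|start_header_id|>`, `<|end_header_id|>` and `<|eot_id|>`, while `added_tokens_decoder` lists only the 24 tokens with IDs 100256 to 100279. The sketch below makes that comparison explicit. It covers only the tokens visible in this diff. The base vocabulary in `vocab.json` and `tokenizer.json` is not rendered here, so whether these markers exist there is unknown. `markers_missing_from_vocab` is a hypothetical helper name.

```python
import re

# Special tokens listed in added_tokens_decoder above (IDs 100256-100279).
ADDED_TOKENS = (
    ["<||_unused_%d_||>" % i for i in range(16)]
    + [
        "<|endoftext|>", "<|fim_prefix|>", "<|fim_middle|>", "<|fim_suffix|>",
        "<|endofprompt|>", "<|pad|>", "<|im_start|>", "<|im_end|>",
    ]
)

# Excerpt of the chat_template string containing its literal markers.
CHAT_TEMPLATE_SNIPPET = (
    "'<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'"
    " + message['content'] | trim + '<|eot_id|>'"
)

# Matches markers of the form <|name|> (lowercase letters and underscores).
MARKER = re.compile(r"<\|[a-z_]+\|>")


def markers_missing_from_vocab(template: str, known_tokens: list) -> list:
    """Return template markers that are not among the known special tokens."""
    known = set(known_tokens)
    return sorted({m for m in MARKER.findall(template) if m not in known})


missing = markers_missing_from_vocab(CHAT_TEMPLATE_SNIPPET, ADDED_TOKENS)
print(missing)
```

If the markers are also absent from the base vocabulary, the tokenizer would split them into ordinary text pieces rather than single special tokens.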

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,766 @@

+{
+  "best_metric": 11.5,
+  "best_model_checkpoint": "miner_id_24/checkpoint-100",
+  "epoch": 0.20717337822089862,
+  "eval_steps": 50,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.002071733782208986,
+      "grad_norm": 0.0005146843031980097,
+      "learning_rate": 2e-05,
+      "loss": 184.0,
+      "step": 1
+    },
+    {
+      "epoch": 0.002071733782208986,
+      "eval_loss": 11.5,
+      "eval_runtime": 6.8543,
+      "eval_samples_per_second": 237.223,
+      "eval_steps_per_second": 59.379,
+      "step": 1
+    },
+    {
+      "epoch": 0.004143467564417972,
+      "grad_norm": 0.000494365522172302,
+      "learning_rate": 4e-05,
+      "loss": 184.0,
+      "step": 2
+    },
+    {
+      "epoch": 0.006215201346626958,
+      "grad_norm": 0.000508551427628845,
+      "learning_rate": 6e-05,
+      "loss": 184.0,
+      "step": 3
+    },
+    {
+      "epoch": 0.008286935128835944,
+      "grad_norm": 0.0004840320907533169,
+      "learning_rate": 8e-05,
+      "loss": 184.0,
+      "step": 4
+    },
+    {
+      "epoch": 0.01035866891104493,
+      "grad_norm": 0.0005024486454203725,
+      "learning_rate": 0.0001,
+      "loss": 184.0,
+      "step": 5
+    },
+    {
+      "epoch": 0.012430402693253916,
+      "grad_norm": 0.0004571828176267445,
+      "learning_rate": 0.00012,
+      "loss": 184.0,
+      "step": 6
+    },
+    {
+      "epoch": 0.014502136475462902,
+      "grad_norm": 0.0005345512763597071,
+      "learning_rate": 0.00014,
+      "loss": 184.0,
+      "step": 7
+    },
+    {
+      "epoch": 0.016573870257671888,
+      "grad_norm": 0.0005252218688838184,
+      "learning_rate": 0.00016,
+      "loss": 184.0,
+      "step": 8
+    },
+    {
+      "epoch": 0.018645604039880876,
+      "grad_norm": 0.0006107671069912612,
+      "learning_rate": 0.00018,
+      "loss": 184.0,
+      "step": 9
+    },
+    {
+      "epoch": 0.02071733782208986,
+      "grad_norm": 0.0005827260320074856,
+      "learning_rate": 0.0002,
+      "loss": 184.0,
+      "step": 10
+    },
+    {
+      "epoch": 0.022789071604298848,
+      "grad_norm": 0.0005423022666946054,
+      "learning_rate": 0.00019999779430290247,
+      "loss": 184.0,
+      "step": 11
+    },
+    {
+      "epoch": 0.024860805386507832,
+      "grad_norm": 0.0006856067921034992,
+      "learning_rate": 0.0001999911773089118,
+      "loss": 184.0,
+      "step": 12
+    },
+    {
+      "epoch": 0.02693253916871682,
+      "grad_norm": 0.0006850535282865167,
+      "learning_rate": 0.00019998014930992976,
+      "loss": 184.0,
+      "step": 13
+    },
+    {
+      "epoch": 0.029004272950925804,
+      "grad_norm": 0.0006938354345038533,
+      "learning_rate": 0.00019996471079244477,
+      "loss": 184.0,
+      "step": 14
+    },
+    {
+      "epoch": 0.031076006733134792,
+      "grad_norm": 0.0006841796566732228,
+      "learning_rate": 0.00019994486243751075,
+      "loss": 184.0,
+      "step": 15
+    },
+    {
+      "epoch": 0.033147740515343777,
+      "grad_norm": 0.0006545998039655387,
+      "learning_rate": 0.00019992060512071688,
+      "loss": 184.0,
+      "step": 16
+    },
+    {
+      "epoch": 0.03521947429755277,
+      "grad_norm": 0.0007753711543045938,
+      "learning_rate": 0.000199891939912149,
+      "loss": 184.0,
+      "step": 17
+    },
+    {
+      "epoch": 0.03729120807976175,
+      "grad_norm": 0.0008910410688258708,
+      "learning_rate": 0.00019985886807634247,
+      "loss": 184.0,
+      "step": 18
+    },
+    {
+      "epoch": 0.039362941861970736,
+      "grad_norm": 0.0008440872770734131,
+      "learning_rate": 0.00019982139107222632,
+      "loss": 184.0,
+      "step": 19
+    },
+    {
+      "epoch": 0.04143467564417972,
+      "grad_norm": 0.0008364002569578588,
+      "learning_rate": 0.00019977951055305898,
+      "loss": 184.0,
+      "step": 20
+    },
+    {
+      "epoch": 0.04350640942638871,
+      "grad_norm": 0.0009039640426635742,
+      "learning_rate": 0.00019973322836635518,
+      "loss": 184.0,
+      "step": 21
+    },
+    {
+      "epoch": 0.045578143208597696,
+      "grad_norm": 0.0010899179615080357,
+      "learning_rate": 0.00019968254655380465,
+      "loss": 184.0,
+      "step": 22
+    },
+    {
+      "epoch": 0.04764987699080668,
+      "grad_norm": 0.0010056191822513938,
+      "learning_rate": 0.00019962746735118192,
+      "loss": 184.0,
+      "step": 23
+    },
+    {
+      "epoch": 0.049721610773015665,
+      "grad_norm": 0.0010592422913759947,
+      "learning_rate": 0.00019956799318824776,
+      "loss": 184.0,
+      "step": 24
+    },
+    {
+      "epoch": 0.051793344555224656,
+      "grad_norm": 0.0010585510171949863,
+      "learning_rate": 0.00019950412668864187,
+      "loss": 184.0,
+      "step": 25
+    },
+    {
+      "epoch": 0.05386507833743364,
+      "grad_norm": 0.0012123475316911936,
+      "learning_rate": 0.00019943587066976738,
+      "loss": 184.0,
+      "step": 26
+    },
+    {
+      "epoch": 0.055936812119642625,
+      "grad_norm": 0.001220652018673718,
+      "learning_rate": 0.00019936322814266633,
+      "loss": 184.0,
+      "step": 27
+    },
+    {
+      "epoch": 0.05800854590185161,
+      "grad_norm": 0.0012845518067479134,
+      "learning_rate": 0.00019928620231188693,
+      "loss": 184.0,
+      "step": 28
+    },
+    {
+      "epoch": 0.0600802796840606,
+      "grad_norm": 0.0013785504270344973,
+      "learning_rate": 0.0001992047965753422,
+      "loss": 184.0,
+      "step": 29
+    },
+    {
+      "epoch": 0.062152013466269584,
+      "grad_norm": 0.0013902273494750261,
+      "learning_rate": 0.0001991190145241601,
+      "loss": 184.0,
+      "step": 30
+    },
+    {
+      "epoch": 0.06422374724847857,
+      "grad_norm": 0.0015755494823679328,
+      "learning_rate": 0.00019902885994252506,
+      "loss": 184.0,
+      "step": 31
+    },
+    {
+      "epoch": 0.06629548103068755,
+      "grad_norm": 0.0015162356430664659,
+      "learning_rate": 0.00019893433680751103,
+      "loss": 184.0,
+      "step": 32
+    },
+    {
+      "epoch": 0.06836721481289654,
+      "grad_norm": 0.0014853848842903972,
+      "learning_rate": 0.00019883544928890612,
+      "loss": 184.0,
+      "step": 33
+    },
+    {
+      "epoch": 0.07043894859510554,
+      "grad_norm": 0.001761424238793552,
+      "learning_rate": 0.00019873220174902858,
+      "loss": 184.0,
+      "step": 34
+    },
+    {
+      "epoch": 0.07251068237731452,
+      "grad_norm": 0.001529531553387642,
+      "learning_rate": 0.0001986245987425344,
+      "loss": 184.0,
+      "step": 35
+    },
+    {
+      "epoch": 0.0745824161595235,
+      "grad_norm": 0.0018305148696526885,
+      "learning_rate": 0.00019851264501621633,
+      "loss": 184.0,
+      "step": 36
+    },
+    {
+      "epoch": 0.07665414994173249,
+      "grad_norm": 0.0018470885697752237,
+      "learning_rate": 0.0001983963455087946,
+      "loss": 184.0,
+      "step": 37
+    },
+    {
+      "epoch": 0.07872588372394147,
+      "grad_norm": 0.00206680316478014,
+      "learning_rate": 0.0001982757053506989,
+      "loss": 184.0,
+      "step": 38
+    },
+    {
+      "epoch": 0.08079761750615046,
+      "grad_norm": 0.001970326993614435,
+      "learning_rate": 0.00019815072986384218,
+      "loss": 184.0,
+      "step": 39
+    },
+    {
+      "epoch": 0.08286935128835944,
+      "grad_norm": 0.0019653683993965387,
+      "learning_rate": 0.0001980214245613858,
+      "loss": 184.0,
+      "step": 40
+    },
+    {
+      "epoch": 0.08494108507056843,
+      "grad_norm": 0.002079579746350646,
+      "learning_rate": 0.00019788779514749635,
+      "loss": 184.0,
+      "step": 41
+    },
+    {
+      "epoch": 0.08701281885277742,
+      "grad_norm": 0.0023222542367875576,
+      "learning_rate": 0.0001977498475170941,
+      "loss": 184.0,
+      "step": 42
+    },
+    {
+      "epoch": 0.08908455263498641,
+      "grad_norm": 0.0023295129649341106,
+      "learning_rate": 0.00019760758775559274,
+      "loss": 184.0,
+      "step": 43
+    },
+    {
+      "epoch": 0.09115628641719539,
+      "grad_norm": 0.0024517809506505728,
+      "learning_rate": 0.00019746102213863114,
+      "loss": 184.0,
+      "step": 44
+    },
+    {
+      "epoch": 0.09322802019940438,
+      "grad_norm": 0.0022949776612222195,
+      "learning_rate": 0.00019731015713179645,
+      "loss": 184.0,
+      "step": 45
+    },
+    {
+      "epoch": 0.09529975398161336,
+      "grad_norm": 0.002672865055501461,
+      "learning_rate": 0.00019715499939033883,
+      "loss": 184.0,
+      "step": 46
+    },
+    {
+      "epoch": 0.09737148776382235,
+      "grad_norm": 0.002469174098223448,
+      "learning_rate": 0.0001969955557588778,
+      "loss": 184.0,
+      "step": 47
+    },
+    {
+      "epoch": 0.09944322154603133,
+      "grad_norm": 0.0028487169183790684,
+      "learning_rate": 0.00019683183327110057,
+      "loss": 184.0,
+      "step": 48
+    },
+    {
+      "epoch": 0.10151495532824033,
+      "grad_norm": 0.0027139163576066494,
+      "learning_rate": 0.0001966638391494514,
+      "loss": 184.0,
+      "step": 49
+    },
+    {
+      "epoch": 0.10358668911044931,
+      "grad_norm": 0.00295127066783607,
+      "learning_rate": 0.00019649158080481323,
+      "loss": 184.0,
+      "step": 50
+    },
+    {
+      "epoch": 0.10358668911044931,
+      "eval_loss": 11.5,
+      "eval_runtime": 6.9485,
+      "eval_samples_per_second": 234.008,
+      "eval_steps_per_second": 58.574,
+      "step": 50
+    },
+    {
+      "epoch": 0.1056584228926583,
+      "grad_norm": 0.0030891685746610165,
+      "learning_rate": 0.0001963150658361807,
+      "loss": 184.0,
+      "step": 51
+    },
+    {
+      "epoch": 0.10773015667486728,
+      "grad_norm": 0.0029488264117389917,
+      "learning_rate": 0.00019613430203032487,
+      "loss": 184.0,
+      "step": 52
+    },
+    {
+      "epoch": 0.10980189045707626,
+      "grad_norm": 0.003068000078201294,
+      "learning_rate": 0.00019594929736144976,
+      "loss": 184.0,
+      "step": 53
+    },
+    {
+      "epoch": 0.11187362423928525,
+      "grad_norm": 0.0036415450740605593,
+      "learning_rate": 0.0001957600599908406,
+      "loss": 184.0,
+      "step": 54
+    },
+    {
+      "epoch": 0.11394535802149423,
+      "grad_norm": 0.0035608517937362194,
+      "learning_rate": 0.00019556659826650382,
+      "loss": 184.0,
+      "step": 55
+    },
+    {
+      "epoch": 0.11601709180370322,
+      "grad_norm": 0.0038965872954577208,
+      "learning_rate": 0.0001953689207227986,
+      "loss": 184.0,
+      "step": 56
+    },
+    {
+      "epoch": 0.11808882558591222,
+      "grad_norm": 0.003413013881072402,
+      "learning_rate": 0.00019516703608006076,
+      "loss": 184.0,
+      "step": 57
+    },
+    {
+      "epoch": 0.1201605593681212,
+      "grad_norm": 0.0033162119798362255,
+      "learning_rate": 0.0001949609532442176,
+      "loss": 184.0,
+      "step": 58
+    },
+    {
+      "epoch": 0.12223229315033018,
+      "grad_norm": 0.0038920752704143524,
+      "learning_rate": 0.00019475068130639543,
+      "loss": 184.0,
+      "step": 59
+    },
+    {
+      "epoch": 0.12430402693253917,
+      "grad_norm": 0.003772893687710166,
+      "learning_rate": 0.00019453622954251828,
+      "loss": 184.0,
+      "step": 60
+    },
+    {
+      "epoch": 0.12637576071474815,
+      "grad_norm": 0.004632446449249983,
+      "learning_rate": 0.00019431760741289887,
+      "loss": 184.0,
+      "step": 61
+    },
+    {
+      "epoch": 0.12844749449695714,
+      "grad_norm": 0.0038107121363282204,
+      "learning_rate": 0.00019409482456182105,
+      "loss": 184.0,
+      "step": 62
+    },
+    {
+      "epoch": 0.13051922827916612,
+      "grad_norm": 0.0040929620154201984,
+      "learning_rate": 0.00019386789081711462,
+      "loss": 184.0,
+      "step": 63
+    },
+    {
+      "epoch": 0.1325909620613751,
+      "grad_norm": 0.004407749976962805,
+      "learning_rate": 0.00019363681618972164,
+      "loss": 184.0,
+      "step": 64
+    },
+    {
+      "epoch": 0.1346626958435841,
+      "grad_norm": 0.00396118825301528,
+      "learning_rate": 0.0001934016108732548,
+      "loss": 184.0,
+      "step": 65
+    },
+    {
+      "epoch": 0.13673442962579307,
+      "grad_norm": 0.004257251974195242,
+      "learning_rate": 0.00019316228524354778,
+      "loss": 184.0,
+      "step": 66
+    },
+    {
+      "epoch": 0.13880616340800206,
+      "grad_norm": 0.004530716687440872,
+      "learning_rate": 0.00019291884985819747,
+      "loss": 184.0,
+      "step": 67
+    },
+    {
+      "epoch": 0.14087789719021107,
+      "grad_norm": 0.00488580297678709,
+      "learning_rate": 0.0001926713154560984,
+      "loss": 184.0,
+      "step": 68
+    },
+    {
+      "epoch": 0.14294963097242006,
+      "grad_norm": 0.0052573285065591335,
+      "learning_rate": 0.00019241969295696879,
+      "loss": 184.0,
+      "step": 69
+    },
+    {
+      "epoch": 0.14502136475462904,
+      "grad_norm": 0.0049368697218596935,
+      "learning_rate": 0.00019216399346086893,
+      "loss": 184.0,
+      "step": 70
+    },
+    {
+      "epoch": 0.14709309853683802,
+      "grad_norm": 0.004978106822818518,
+      "learning_rate": 0.00019190422824771157,
+      "loss": 184.0,
+      "step": 71
+    },
+    {
+      "epoch": 0.149164832319047,
+      "grad_norm": 0.005578070413321257,
+      "learning_rate": 0.00019164040877676423,
+      "loss": 184.0,
+      "step": 72
+    },
+    {
+      "epoch": 0.151236566101256,
+      "grad_norm": 0.005424323491752148,
+      "learning_rate": 0.00019137254668614377,
+      "loss": 184.0,
+      "step": 73
+    },
+    {
+      "epoch": 0.15330829988346498,
+      "grad_norm": 0.004915840458124876,
+      "learning_rate": 0.00019110065379230289,
+      "loss": 184.0,
+      "step": 74
+    },
+    {
+      "epoch": 0.15538003366567396,
+      "grad_norm": 0.0062981476075947285,
+      "learning_rate": 0.0001908247420895089,
+      "loss": 184.0,
+      "step": 75
+    },
+    {
+      "epoch": 0.15745176744788295,
+      "grad_norm": 0.005838929675519466,
+      "learning_rate": 0.00019054482374931467,
+      "loss": 184.0,
+      "step": 76
+    },
+    {
+      "epoch": 0.15952350123009193,
+      "grad_norm": 0.0066667902283370495,
+      "learning_rate": 0.00019026091112002162,
+      "loss": 184.0,
+      "step": 77
+    },
+    {
+      "epoch": 0.16159523501230091,
+      "grad_norm": 0.005362570285797119,
+      "learning_rate": 0.00018997301672613495,
+      "loss": 184.0,
+      "step": 78
+    },
+    {
+      "epoch": 0.1636669687945099,
+      "grad_norm": 0.006221631541848183,
+      "learning_rate": 0.0001896811532678113,
+      "loss": 184.0,
+      "step": 79
+    },
+    {
+      "epoch": 0.16573870257671888,
+      "grad_norm": 0.00644316291436553,
+      "learning_rate": 0.0001893853336202983,
+      "loss": 184.0,
+      "step": 80
+    },
+    {
+      "epoch": 0.16781043635892787,
+      "grad_norm": 0.005687521304935217,
+      "learning_rate": 0.00018908557083336666,
+      "loss": 184.0,
+      "step": 81
+    },
+    {
+      "epoch": 0.16988217014113685,
+      "grad_norm": 0.007695197593420744,
+      "learning_rate": 0.00018878187813073464,
+      "loss": 184.0,
+      "step": 82
+    },
+    {
+      "epoch": 0.17195390392334586,
+      "grad_norm": 0.006890579126775265,
+      "learning_rate": 0.00018847426890948447,
+      "loss": 184.0,
+      "step": 83
+    },
+    {
+      "epoch": 0.17402563770555485,
+      "grad_norm": 0.006877605803310871,
+      "learning_rate": 0.00018816275673947148,
+      "loss": 184.0,
+      "step": 84
+    },
+    {
+      "epoch": 0.17609737148776383,
+      "grad_norm": 0.00736248167231679,
+      "learning_rate": 0.00018784735536272543,
+      "loss": 184.0,
+      "step": 85
+    },
+    {
+      "epoch": 0.17816910526997282,
+      "grad_norm": 0.007951020263135433,
+      "learning_rate": 0.00018752807869284438,
+      "loss": 184.0,
+      "step": 86
+    },
+    {
+      "epoch": 0.1802408390521818,
+      "grad_norm": 0.007793943397700787,
+      "learning_rate": 0.00018720494081438078,
+      "loss": 184.0,
+      "step": 87
+    },
+    {
+      "epoch": 0.18231257283439078,
+      "grad_norm": 0.006927513983100653,
+      "learning_rate": 0.00018687795598222023,
+      "loss": 184.0,
+      "step": 88
+    },
+    {
+      "epoch": 0.18438430661659977,
+      "grad_norm": 0.008044817484915257,
+      "learning_rate": 0.0001865471386209527,
+      "loss": 184.0,
+      "step": 89
+    },
+    {
+      "epoch": 0.18645604039880875,
+      "grad_norm": 0.00868427287787199,
+      "learning_rate": 0.00018621250332423602,
+      "loss": 184.0,
+      "step": 90
+    },
+    {
+      "epoch": 0.18852777418101774,
+      "grad_norm": 0.007847864180803299,
+      "learning_rate": 0.00018587406485415226,
+      "loss": 184.0,
+      "step": 91
+    },
+    {
+      "epoch": 0.19059950796322672,
+      "grad_norm": 0.008084665983915329,
+      "learning_rate": 0.00018553183814055643,
+      "loss": 184.0,
+      "step": 92
+    },
+    {
+      "epoch": 0.1926712417454357,
+      "grad_norm": 0.00783941987901926,
+      "learning_rate": 0.00018518583828041786,
+      "loss": 184.0,
+      "step": 93
+    },
+    {
+      "epoch": 0.1947429755276447,
+      "grad_norm": 0.009109465405344963,
+      "learning_rate": 0.0001848360805371544,
+      "loss": 184.0,
+      "step": 94
+    },
+    {
+      "epoch": 0.19681470930985367,
+      "grad_norm": 0.00827021710574627,
+      "learning_rate": 0.00018448258033995876,
+      "loss": 184.0,
+      "step": 95
+    },
+    {
+      "epoch": 0.19888644309206266,
+      "grad_norm": 0.009131607599556446,
+      "learning_rate": 0.00018412535328311814,
+      "loss": 184.0,
+      "step": 96
+    },
+    {
+      "epoch": 0.20095817687427164,
+      "grad_norm": 0.00926326122134924,
+      "learning_rate": 0.00018376441512532617,
+      "loss": 184.0,
+      "step": 97
+    },
+    {
+      "epoch": 0.20302991065648066,
+      "grad_norm": 0.010163519531488419,
+      "learning_rate": 0.0001833997817889878,
+      "loss": 184.0,
+      "step": 98
+    },
+    {
+      "epoch": 0.20510164443868964,
+      "grad_norm": 0.009918253868818283,
+      "learning_rate": 0.00018303146935951689,
+      "loss": 184.0,
+      "step": 99
+    },
+    {
+      "epoch": 0.20717337822089862,
+      "grad_norm": 0.008283627219498158,
+      "learning_rate": 0.00018265949408462654,
+      "loss": 184.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.20717337822089862,
+      "eval_loss": 11.5,
+      "eval_runtime": 6.8232,
+      "eval_samples_per_second": 238.303,
+      "eval_steps_per_second": 59.649,
+      "step": 100
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 483,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 2,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 31916870860800.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
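
Note on the state above: every logged training step reports `loss` 184.0 and all three evaluations report `eval_loss` 11.5, while `grad_norm` rises from about 5e-4 to about 8e-3. The sketch below shows one way to flag such a plateau from `log_history`. The sample entries are copied from steps 1, 50 and 100 above. `split_history` and `is_flat` are hypothetical helper names, and the check only reports that the values are constant, not why.

```python
import json

# Entries copied from log_history above (train and eval rows for steps 1, 50, 100).
SAMPLE_LOG = [
    {"epoch": 0.002071733782208986, "grad_norm": 0.0005146843031980097,
     "learning_rate": 2e-05, "loss": 184.0, "step": 1},
    {"epoch": 0.002071733782208986, "eval_loss": 11.5, "step": 1},
    {"epoch": 0.10358668911044931, "grad_norm": 0.00295127066783607,
     "learning_rate": 0.00019649158080481323, "loss": 184.0, "step": 50},
    {"epoch": 0.10358668911044931, "eval_loss": 11.5, "step": 50},
    {"epoch": 0.20717337822089862, "grad_norm": 0.008283627219498158,
     "learning_rate": 0.00018265949408462654, "loss": 184.0, "step": 100},
    {"epoch": 0.20717337822089862, "eval_loss": 11.5, "step": 100},
]


def load_log_history(path: str) -> list:
    """Read log_history from a trainer_state.json file on disk."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)["log_history"]


def split_history(log_history: list) -> tuple:
    """Separate training rows (key 'loss') from evaluation rows (key 'eval_loss')."""
    train = [entry for entry in log_history if "loss" in entry]
    evals = [entry for entry in log_history if "eval_loss" in entry]
    return train, evals


def is_flat(values: list, tol: float = 0.0) -> bool:
    """True when the spread of the values does not exceed tol."""
    return max(values) - min(values) <= tol


train_rows, eval_rows = split_history(SAMPLE_LOG)
print("train loss flat:", is_flat([row["loss"] for row in train_rows]))
print("eval loss flat:", is_flat([row["eval_loss"] for row in eval_rows]))
```

To run the same check on the full file, pass the result of `load_log_history("last-checkpoint/trainer_state.json")` to `split_history`, assuming the checkpoint has been downloaded to that path.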

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cc104bd25c6cdbccbab751a745382c0cee38baf224c9db35a009a43ed0b9e6db
+size 6776

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render. See raw diff