johnlockejrr committed on
Commit
50b10aa
·
verified ·
1 Parent(s): 29537bd

Upload 15 files

Browse files
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ source.spm filter=lfs diff=lfs merge=lfs -text
37
+ target.spm filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,183 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ language:
3
+ - arc
4
+ tags:
5
+ - diacritization
6
+ - aramaic
7
+ - vocalization
8
+ - targum
9
+ - semitic-languages
10
+ - sequence-to-sequence
11
+ license: mit
12
+ base_model: Helsinki-NLP/opus-mt-afa-afa
13
+ library_name: transformers
14
+ ---
15
+
16
+ # Aramaic Diacritization Model (MarianMT)
17
+
18
+ This model is a fine-tuned MarianMT model for Aramaic text diacritization (vocalization), converting consonantal Aramaic text to fully vocalized text with nikkud (vowel points).
19
+
20
+ ## Model Description
21
+
22
+ - **Model type:** MarianMT (Encoder-Decoder Transformer)
23
+ - **Language:** Aramaic (arc → arc; monolingual text-to-text)
24
+ - **Task:** Text diacritization/vocalization
25
+ - **Base model:** [Helsinki-NLP/opus-mt-afa-afa](https://huggingface.co/Helsinki-NLP/opus-mt-afa-afa)
26
+ - **Parameters:** 61,924,352 (61.9M)
27
+
28
+ ## Model Architecture
29
+
30
+ - **Architecture:** MarianMT (Marian Machine Translation)
31
+ - **Encoder layers:** 6
32
+ - **Decoder layers:** 6
33
+ - **Hidden size:** 512
34
+ - **Attention heads:** 8
35
+ - **Feed-forward dimension:** 2048
36
+ - **Vocabulary size:** 33,714
37
+ - **Max sequence length:** 512 tokens
38
+ - **Activation function:** Swish
39
+ - **Position embeddings:** Static
40
+
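As a sanity check, the 61.9M parameter count can be reproduced from the figures above. The grouping below (embedding table tied between encoder, decoder, and LM head; separate static position tables per stack; biased projections) is inferred from the MarianMT architecture, not stated in this card:

```python
d_model = 512     # hidden size
ffn = 2048        # feed-forward dimension
vocab = 33_714    # vocabulary size
layers = 6        # per stack
max_pos = 512     # max sequence length

embeddings = vocab * d_model                    # table shared by encoder, decoder, LM head
positions = 2 * max_pos * d_model               # static position tables, one per stack
attn = 4 * (d_model * d_model + d_model)        # q, k, v, out projections with biases
ffn_params = 2 * d_model * ffn + ffn + d_model  # two linear layers with biases
layer_norm = 2 * d_model                        # weight + bias

encoder_layer = attn + ffn_params + 2 * layer_norm      # self-attention + FFN
decoder_layer = 2 * attn + ffn_params + 3 * layer_norm  # adds cross-attention

total = embeddings + positions + layers * (encoder_layer + decoder_layer)
print(total)  # 61924352
```

The result matches the reported 61,924,352 exactly, which supports the tied-embedding reading.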
41
+ ## Training Details
42
+
43
+ ### Training Configuration
44
+ - **Training data:** 12,110 examples
45
+ - **Validation data:** 1,514 examples
46
+ - **Batch size:** 8
47
+ - **Gradient accumulation steps:** 2
48
+ - **Effective batch size:** 16
49
+ - **Learning rate:** 1e-5
50
+ - **Warmup steps:** 1,000
51
+ - **Max epochs:** 100
52
+ - **Training completed at:** Epoch 36.33
53
+ - **Mixed precision:** FP16 enabled
54
+
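A quick consistency check on these figures (the global step of 27,500 is taken from the `trainer_state.json` shipped with this upload):

```python
import math

train_examples = 12_110  # training set size
batch_size = 8           # per-device batch size
grad_accum = 2           # gradient accumulation steps

effective_batch = batch_size * grad_accum                      # 16
steps_per_epoch = math.ceil(train_examples / effective_batch)  # 757 optimizer steps
global_step = 27_500                                           # from trainer_state.json

print(round(global_step / steps_per_epoch, 2))  # 36.33
```

This reproduces the reported stopping epoch of 36.33.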
55
+ ### Training Metrics
56
+ - **Final training loss:** 0.283
57
+ - **Training runtime:** 21,727 seconds (~6 hours)
58
+ - **Training samples per second:** 55.7
59
+ - **Training steps per second:** 3.48
60
+
61
+ ## Evaluation Results
62
+
63
+ ### Test Set Performance
64
+ - **BLEU Score:** 72.90
65
+ - **Character Accuracy:** 63.78%
66
+ - **Evaluation Loss:** 0.088
67
+ - **Evaluation Runtime:** 311.5 seconds
68
+ - **Evaluation samples per second:** 4.86
69
+
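The card does not define how character accuracy is computed; one plausible (hypothetical) definition is position-wise character matching against the reference:

```python
def char_accuracy(prediction: str, reference: str) -> float:
    """Percentage of reference positions where the prediction matches exactly."""
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(prediction, reference))
    return 100.0 * matches / len(reference)

# A single inserted or dropped vowel point shifts every later position,
# which is why a position-wise metric can sit well below BLEU
# even for otherwise good output.
print(char_accuracy("abcd", "abed"))  # 75.0
```

Under a definition like this, the gap between BLEU 72.90 and ~64% character accuracy is unsurprising.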
70
+ ## Usage
71
+
72
+ ### Basic Usage
73
+
74
+ ```python
75
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
76
+
77
+ # Load model and tokenizer
78
+ model_name = "johnlockejrr/aramaic-diacritization-model"
79
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
80
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
81
+
82
+ # Example input (consonantal Aramaic text)
83
+ consonantal_text = "בקדמין ברא יי ית שמיא וית ארעא"
84
+
85
+ # Tokenize input
86
+ inputs = tokenizer(consonantal_text, return_tensors="pt", max_length=512, truncation=True)
87
+
88
+ # Generate vocalized text
89
+ outputs = model.generate(**inputs, max_length=512, num_beams=4, early_stopping=True)
90
+
91
+ # Decode output
92
+ vocalized_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
93
+ print(f"Input: {consonantal_text}")
94
+ print(f"Output: {vocalized_text}")
95
+ ```
96
+
97
+ ### Using the Pipeline
98
+
99
+ ```python
100
+ from transformers import pipeline
101
+
102
+ diacritizer = pipeline("text2text-generation", model="johnlockejrr/aramaic-diacritization-model")
103
+
104
+ # Process text
105
+ consonantal_text = "בקדמין ברא יי ית שמיא וית ארעא"  # consonantal Aramaic (the model targets Aramaic, not Hebrew)
106
+ vocalized_text = diacritizer(consonantal_text)[0]['generated_text']
107
+ print(vocalized_text)
108
+ ```
109
+
110
+ ## Training Data
111
+
112
+ The model was trained on a custom Aramaic diacritization dataset with the following characteristics:
113
+
114
+ - **Source:** Consonantal Aramaic text (without vowel points)
115
+ - **Target:** Vocalized Aramaic text (with nikkud/vowel points)
116
+ - **Data format:** CSV with columns: consonantal, vocalized, book, chapter, verse
117
+ - **Data split:** 80% train, 10% validation, 10% test
118
+ - **Text cleaning:** Preserves nikkud in target text, removes punctuation from source
119
+
120
+ ### Data Preprocessing
121
+ - **Input cleaning:** Removes punctuation and formatting while preserving letters
122
+ - **Target preservation:** Maintains all nikkud (vowel points) and diacritical marks
123
+ - **Length filtering:** Removes sequences shorter than 2 characters or longer than 1000 characters
124
+ - **Duplicate handling:** Removes exact duplicates to prevent data leakage
125
+
126
+ ## Limitations and Bias
127
+
128
+ - **Domain specificity:** Trained primarily on religious/biblical Aramaic texts
129
+ - **Vocabulary coverage:** Limited to the vocabulary present in the training corpus
130
+ - **Length constraints:** Maximum input/output length of 512 tokens
131
+ - **Style consistency:** May not handle modern Aramaic dialects or contemporary usage
132
+ - **Performance:** Character accuracy of ~64% indicates room for improvement
133
+
134
+ ## Environmental Impact
135
+
136
+ - **Hardware used:** NVIDIA RTX 3060 (12GB)
137
+ - **Training time:** ~6 hours
138
+ - **Carbon emissions:** Estimated low (single GPU, moderate training time)
139
+ - **Energy efficiency:** FP16 mixed precision used to reduce memory usage
140
+
141
+ ## Citation
142
+
143
+ If you use this model in your research, please cite:
144
+
145
+ ```bibtex
146
+ @misc{aramaic-diacritization-2024,
147
+ title={Aramaic Diacritization Model},
148
+ author={Your Name},
149
+ year={2024},
150
+ howpublished={Hugging Face Model Hub},
151
+ url={https://huggingface.co/johnlockejrr/aramaic-diacritization-model}
152
+ }
153
+ ```
154
+
155
+ ## License
156
+
157
+ MIT (see the `license` field in the metadata header above).
158
+
159
+ ## Acknowledgments
160
+
161
+ - Base model: [Helsinki-NLP/opus-mt-afa-afa](https://huggingface.co/Helsinki-NLP/opus-mt-afa-afa)
162
+ - Training framework: Hugging Face Transformers
163
+ - Dataset: Custom Aramaic diacritization corpus
164
+
165
+ ## Model Files
166
+
167
+ - `model.safetensors` - Model weights (~234 MiB)
168
+ - `config.json` - Model configuration
169
+ - `tokenizer_config.json` - Tokenizer configuration
170
+ - `source.spm` / `target.spm` - SentencePiece models
171
+ - `vocab.json` - Vocabulary file
172
+ - `generation_config.json` - Generation parameters
173
+
174
+ ## Training Scripts
175
+
176
+ The model was trained using custom scripts:
177
+ - `train_arc2arc_improved_deep.py` - Main training script
178
+ - `run_arc2arc_improved_deep.sh` - Training execution script
179
+ - `run_resume_arc2arc_deep.sh` - Resume training script
180
+
181
+ ## Contact
182
+
183
+ For questions, issues, or contributions, please open an issue on the model repository.
all_results.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "epoch": 36.32760898282695,
3
+ "eval_bleu": 72.89736867105582,
4
+ "eval_char_accuracy": 63.78173613339268,
5
+ "eval_loss": 0.08819781988859177,
6
+ "eval_runtime": 311.5285,
7
+ "eval_samples_per_second": 4.86,
8
+ "eval_steps_per_second": 0.61,
9
+ "total_flos": 7035660725649408.0,
10
+ "train_loss": 0.282735899699818,
11
+ "train_runtime": 21727.9767,
12
+ "train_samples_per_second": 55.735,
13
+ "train_steps_per_second": 3.484
14
+ }
config.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "activation_dropout": 0.0,
3
+ "activation_function": "swish",
4
+ "add_bias_logits": false,
5
+ "add_final_layer_norm": false,
6
+ "architectures": [
7
+ "MarianMTModel"
8
+ ],
9
+ "attention_dropout": 0.0,
10
+ "bos_token_id": 0,
11
+ "classif_dropout": 0.0,
12
+ "classifier_dropout": 0.0,
13
+ "d_model": 512,
14
+ "decoder_attention_heads": 8,
15
+ "decoder_ffn_dim": 2048,
16
+ "decoder_layerdrop": 0.0,
17
+ "decoder_layers": 6,
18
+ "decoder_start_token_id": 33713,
19
+ "decoder_vocab_size": 33714,
20
+ "dropout": 0.1,
21
+ "encoder_attention_heads": 8,
22
+ "encoder_ffn_dim": 2048,
23
+ "encoder_layerdrop": 0.0,
24
+ "encoder_layers": 6,
25
+ "eos_token_id": 0,
26
+ "extra_pos_embeddings": 33714,
27
+ "forced_eos_token_id": 0,
28
+ "id2label": {
29
+ "0": "LABEL_0",
30
+ "1": "LABEL_1",
31
+ "2": "LABEL_2"
32
+ },
33
+ "init_std": 0.02,
34
+ "is_encoder_decoder": true,
35
+ "label2id": {
36
+ "LABEL_0": 0,
37
+ "LABEL_1": 1,
38
+ "LABEL_2": 2
39
+ },
40
+ "max_length": null,
41
+ "max_position_embeddings": 512,
42
+ "model_type": "marian",
43
+ "normalize_before": false,
44
+ "normalize_embedding": false,
45
+ "num_beams": null,
46
+ "num_hidden_layers": 6,
47
+ "pad_token_id": 33713,
48
+ "scale_embedding": true,
49
+ "share_encoder_decoder_embeddings": true,
50
+ "static_position_embeddings": true,
51
+ "torch_dtype": "float32",
52
+ "transformers_version": "4.52.4",
53
+ "use_cache": true,
54
+ "vocab_size": 33714
55
+ }
generation_config.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "bad_words_ids": [
3
+ [
4
+ 33713
5
+ ]
6
+ ],
7
+ "bos_token_id": 0,
8
+ "decoder_start_token_id": 33713,
9
+ "eos_token_id": 0,
10
+ "forced_eos_token_id": 0,
11
+ "max_length": 512,
12
+ "num_beams": 4,
13
+ "pad_token_id": 33713,
14
+ "transformers_version": "4.52.4"
15
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a50c1778afad506b902f0c0bef3f680042b96cd3fadd5c00f7652b97d6c7bca5
3
+ size 245764168
model_info.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "model_name": "Helsinki-NLP/opus-mt-afa-afa",
3
+ "direction": "arc2arc",
4
+ "vocabulary_size": 33714,
5
+ "model_parameters": 61924352,
6
+ "training_config": {
7
+ "dataset_path": "aramaic_diacritization_dataset_deep",
8
+ "output_dir": "./aramaic_diacritization_model_deep",
9
+ "model_name": "Helsinki-NLP/opus-mt-afa-afa",
10
+ "batch_size": 8,
11
+ "learning_rate": 1e-05,
12
+ "num_epochs": 100,
13
+ "max_input_length": 512,
14
+ "max_target_length": 512,
15
+ "eval_steps": 500,
16
+ "save_steps": 500,
17
+ "warmup_steps": 1000,
18
+ "gradient_accumulation_steps": 2,
19
+ "use_fp16": true,
20
+ "use_wandb": false,
21
+ "skip_evaluation": false,
22
+ "seed": 42,
23
+ "resume_from_checkpoint": null,
24
+ "strict_resume": false
25
+ }
26
+ }
source.spm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:191fc6c0ae01c8044b66544b34f00c3690d052af746cda8f07f4886f98d4df32
3
+ size 842154
special_tokens_map.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "eos_token": "</s>",
3
+ "pad_token": "<pad>",
4
+ "unk_token": "<unk>"
5
+ }
target.spm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:db1c1a7f37a9362214e7f93173e3d8e770cc7a2cf0375e5f92fcd2ef19db2b65
3
+ size 842006
test_results.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "epoch": 36.32760898282695,
3
+ "eval_bleu": 72.89736867105582,
4
+ "eval_char_accuracy": 63.78173613339268,
5
+ "eval_loss": 0.08819781988859177,
6
+ "eval_runtime": 311.5285,
7
+ "eval_samples_per_second": 4.86,
8
+ "eval_steps_per_second": 0.61
9
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,39 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "</s>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "<unk>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "33713": {
20
+ "content": "<pad>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ }
27
+ },
28
+ "clean_up_tokenization_spaces": false,
29
+ "eos_token": "</s>",
30
+ "extra_special_tokens": {},
31
+ "model_max_length": 512,
32
+ "pad_token": "<pad>",
33
+ "separate_vocabs": false,
34
+ "source_lang": "afa",
35
+ "sp_model_kwargs": {},
36
+ "target_lang": "afa",
37
+ "tokenizer_class": "MarianTokenizer",
38
+ "unk_token": "<unk>"
39
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
1
+ {
2
+ "epoch": 36.32760898282695,
3
+ "total_flos": 7035660725649408.0,
4
+ "train_loss": 0.282735899699818,
5
+ "train_runtime": 21727.9767,
6
+ "train_samples_per_second": 55.735,
7
+ "train_steps_per_second": 3.484
8
+ }
trainer_state.json ADDED
@@ -0,0 +1,2527 @@
1
+ {
2
+ "best_global_step": 26000,
3
+ "best_metric": 74.62046528623556,
4
+ "best_model_checkpoint": "./aramaic_diacritization_model_deep/checkpoint-26000",
5
+ "epoch": 36.32760898282695,
6
+ "eval_steps": 500,
7
+ "global_step": 27500,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.13210039630118892,
14
+ "grad_norm": 49.535423278808594,
15
+ "learning_rate": 9.400000000000001e-07,
16
+ "loss": 13.2291,
17
+ "step": 100
18
+ },
19
+ {
20
+ "epoch": 0.26420079260237783,
21
+ "grad_norm": 8.345534324645996,
22
+ "learning_rate": 1.94e-06,
23
+ "loss": 4.9005,
24
+ "step": 200
25
+ },
26
+ {
27
+ "epoch": 0.3963011889035667,
28
+ "grad_norm": 6.039630889892578,
29
+ "learning_rate": 2.9400000000000002e-06,
30
+ "loss": 3.1719,
31
+ "step": 300
32
+ },
33
+ {
34
+ "epoch": 0.5284015852047557,
35
+ "grad_norm": 5.1691412925720215,
36
+ "learning_rate": 3.94e-06,
37
+ "loss": 2.6108,
38
+ "step": 400
39
+ },
40
+ {
41
+ "epoch": 0.6605019815059445,
42
+ "grad_norm": 4.929126262664795,
43
+ "learning_rate": 4.94e-06,
44
+ "loss": 2.2213,
45
+ "step": 500
46
+ },
47
+ {
48
+ "epoch": 0.6605019815059445,
49
+ "eval_bleu": 0.19172411762066338,
50
+ "eval_char_accuracy": 10.030946701760158,
51
+ "eval_loss": 1.71331787109375,
52
+ "eval_runtime": 308.7303,
53
+ "eval_samples_per_second": 4.904,
54
+ "eval_steps_per_second": 0.615,
55
+ "step": 500
56
+ },
57
+ {
58
+ "epoch": 0.7926023778071334,
59
+ "grad_norm": 5.598935604095459,
60
+ "learning_rate": 5.94e-06,
61
+ "loss": 1.9171,
62
+ "step": 600
63
+ },
64
+ {
65
+ "epoch": 0.9247027741083224,
66
+ "grad_norm": 5.103219032287598,
67
+ "learning_rate": 6.9400000000000005e-06,
68
+ "loss": 1.6624,
69
+ "step": 700
70
+ },
71
+ {
72
+ "epoch": 1.0568031704095113,
73
+ "grad_norm": 4.852541923522949,
74
+ "learning_rate": 7.94e-06,
75
+ "loss": 1.473,
76
+ "step": 800
77
+ },
78
+ {
79
+ "epoch": 1.1889035667107002,
80
+ "grad_norm": 4.4410624504089355,
81
+ "learning_rate": 8.94e-06,
82
+ "loss": 1.3081,
83
+ "step": 900
84
+ },
85
+ {
86
+ "epoch": 1.321003963011889,
87
+ "grad_norm": 3.9946470260620117,
88
+ "learning_rate": 9.940000000000001e-06,
89
+ "loss": 1.1709,
90
+ "step": 1000
91
+ },
92
+ {
93
+ "epoch": 1.321003963011889,
94
+ "eval_bleu": 5.2707215480174545,
95
+ "eval_char_accuracy": 18.391696825135714,
96
+ "eval_loss": 0.8503363132476807,
97
+ "eval_runtime": 326.5655,
98
+ "eval_samples_per_second": 4.636,
99
+ "eval_steps_per_second": 0.582,
100
+ "step": 1000
101
+ },
102
+ {
103
+ "epoch": 1.453104359313078,
104
+ "grad_norm": 5.828360557556152,
105
+ "learning_rate": 9.99996092907511e-06,
106
+ "loss": 1.0774,
107
+ "step": 1100
108
+ },
109
+ {
110
+ "epoch": 1.5852047556142668,
111
+ "grad_norm": 3.975123643875122,
112
+ "learning_rate": 9.999833582267183e-06,
113
+ "loss": 0.9803,
114
+ "step": 1200
115
+ },
116
+ {
117
+ "epoch": 1.7173051519154559,
118
+ "grad_norm": 4.110162258148193,
119
+ "learning_rate": 9.999617802644021e-06,
120
+ "loss": 0.9023,
121
+ "step": 1300
122
+ },
123
+ {
124
+ "epoch": 1.8494055482166445,
125
+ "grad_norm": 4.341949462890625,
126
+ "learning_rate": 9.999313594022158e-06,
127
+ "loss": 0.8494,
128
+ "step": 1400
129
+ },
130
+ {
131
+ "epoch": 1.9815059445178336,
132
+ "grad_norm": 3.7582991123199463,
133
+ "learning_rate": 9.99892096178217e-06,
134
+ "loss": 0.7841,
135
+ "step": 1500
136
+ },
137
+ {
138
+ "epoch": 1.9815059445178336,
139
+ "eval_bleu": 13.790865530076871,
140
+ "eval_char_accuracy": 25.990602895213026,
141
+ "eval_loss": 0.5515339374542236,
142
+ "eval_runtime": 310.3546,
143
+ "eval_samples_per_second": 4.878,
144
+ "eval_steps_per_second": 0.612,
145
+ "step": 1500
146
+ },
147
+ {
148
+ "epoch": 2.1136063408190227,
149
+ "grad_norm": 4.066399574279785,
150
+ "learning_rate": 9.998439912868608e-06,
151
+ "loss": 0.7379,
152
+ "step": 1600
153
+ },
154
+ {
155
+ "epoch": 2.2457067371202113,
156
+ "grad_norm": 3.739553928375244,
157
+ "learning_rate": 9.997870455789855e-06,
158
+ "loss": 0.6859,
159
+ "step": 1700
160
+ },
161
+ {
162
+ "epoch": 2.3778071334214004,
163
+ "grad_norm": 4.019917964935303,
164
+ "learning_rate": 9.997212600617986e-06,
165
+ "loss": 0.6547,
166
+ "step": 1800
167
+ },
168
+ {
169
+ "epoch": 2.509907529722589,
170
+ "grad_norm": 3.1273276805877686,
171
+ "learning_rate": 9.99646635898858e-06,
172
+ "loss": 0.6313,
173
+ "step": 1900
174
+ },
175
+ {
176
+ "epoch": 2.642007926023778,
177
+ "grad_norm": 3.0856070518493652,
178
+ "learning_rate": 9.995631744100536e-06,
179
+ "loss": 0.6058,
180
+ "step": 2000
181
+ },
182
+ {
183
+ "epoch": 2.642007926023778,
184
+ "eval_bleu": 22.141127198903924,
185
+ "eval_char_accuracy": 30.873190491857212,
186
+ "eval_loss": 0.4170660674571991,
187
+ "eval_runtime": 310.921,
188
+ "eval_samples_per_second": 4.869,
189
+ "eval_steps_per_second": 0.611,
190
+ "step": 2000
191
+ },
192
+ {
193
+ "epoch": 2.7741083223249667,
194
+ "grad_norm": 3.660649299621582,
195
+ "learning_rate": 9.994708770715807e-06,
196
+ "loss": 0.5758,
197
+ "step": 2100
198
+ },
199
+ {
200
+ "epoch": 2.906208718626156,
201
+ "grad_norm": 3.3534188270568848,
202
+ "learning_rate": 9.993697455159165e-06,
203
+ "loss": 0.5507,
204
+ "step": 2200
205
+ },
206
+ {
207
+ "epoch": 3.038309114927345,
208
+ "grad_norm": 2.831392526626587,
209
+ "learning_rate": 9.992597815317901e-06,
210
+ "loss": 0.5334,
211
+ "step": 2300
212
+ },
213
+ {
214
+ "epoch": 3.1704095112285335,
215
+ "grad_norm": 3.2069804668426514,
216
+ "learning_rate": 9.991409870641512e-06,
217
+ "loss": 0.508,
218
+ "step": 2400
219
+ },
220
+ {
221
+ "epoch": 3.3025099075297226,
222
+ "grad_norm": 3.3302793502807617,
223
+ "learning_rate": 9.990133642141359e-06,
224
+ "loss": 0.4816,
225
+ "step": 2500
226
+ },
227
+ {
228
+ "epoch": 3.3025099075297226,
229
+ "eval_bleu": 28.8217714543364,
230
+ "eval_char_accuracy": 34.30457312057904,
231
+ "eval_loss": 0.34086647629737854,
232
+ "eval_runtime": 312.1625,
233
+ "eval_samples_per_second": 4.85,
234
+ "eval_steps_per_second": 0.609,
235
+ "step": 2500
236
+ },
237
+ {
238
+ "epoch": 3.4346103038309117,
239
+ "grad_norm": 3.0527114868164062,
240
+ "learning_rate": 9.988769152390284e-06,
241
+ "loss": 0.4779,
242
+ "step": 2600
243
+ },
244
+ {
245
+ "epoch": 3.5667107001321003,
246
+ "grad_norm": 2.557722568511963,
247
+ "learning_rate": 9.987316425522226e-06,
248
+ "loss": 0.4626,
249
+ "step": 2700
250
+ },
251
+ {
252
+ "epoch": 3.6988110964332894,
253
+ "grad_norm": 2.993014097213745,
254
+ "learning_rate": 9.985775487231788e-06,
255
+ "loss": 0.4452,
256
+ "step": 2800
257
+ },
258
+ {
259
+ "epoch": 3.830911492734478,
260
+ "grad_norm": 2.7321043014526367,
261
+ "learning_rate": 9.984146364773777e-06,
262
+ "loss": 0.4408,
263
+ "step": 2900
264
+ },
265
+ {
266
+ "epoch": 3.963011889035667,
267
+ "grad_norm": 2.8790836334228516,
268
+ "learning_rate": 9.982429086962729e-06,
269
+ "loss": 0.4108,
270
+ "step": 3000
271
+ },
272
+ {
273
+ "epoch": 3.963011889035667,
274
+ "eval_bleu": 33.52667574407562,
275
+ "eval_char_accuracy": 37.42289027800625,
276
+ "eval_loss": 0.29226553440093994,
277
+ "eval_runtime": 316.4732,
278
+ "eval_samples_per_second": 4.784,
279
+ "eval_steps_per_second": 0.6,
280
+ "step": 3000
281
+ },
282
+ {
283
+ "epoch": 4.095112285336856,
284
+ "grad_norm": 2.8862805366516113,
285
+ "learning_rate": 9.980623684172396e-06,
286
+ "loss": 0.4134,
287
+ "step": 3100
288
+ },
289
+ {
290
+ "epoch": 4.227212681638045,
291
+ "grad_norm": 2.4621527194976807,
292
+ "learning_rate": 9.978730188335215e-06,
293
+ "loss": 0.3919,
294
+ "step": 3200
295
+ },
296
+ {
297
+ "epoch": 4.359313077939234,
298
+ "grad_norm": 2.822957992553711,
299
+ "learning_rate": 9.976748632941733e-06,
300
+ "loss": 0.384,
301
+ "step": 3300
302
+ },
303
+ {
304
+ "epoch": 4.491413474240423,
305
+ "grad_norm": 2.448110818862915,
306
+ "learning_rate": 9.974679053040018e-06,
307
+ "loss": 0.3735,
308
+ "step": 3400
309
+ },
310
+ {
311
+ "epoch": 4.623513870541611,
312
+ "grad_norm": 2.2914109230041504,
313
+ "learning_rate": 9.972521485235045e-06,
314
+ "loss": 0.3604,
315
+ "step": 3500
316
+ },
317
+ {
318
+ "epoch": 4.623513870541611,
319
+ "eval_bleu": 37.45556816331904,
320
+ "eval_char_accuracy": 40.02354416844876,
321
+ "eval_loss": 0.25823718309402466,
322
+ "eval_runtime": 320.7507,
323
+ "eval_samples_per_second": 4.72,
324
+ "eval_steps_per_second": 0.592,
325
+ "step": 3500
326
+ },
327
+ {
328
+ "epoch": 4.755614266842801,
329
+ "grad_norm": 2.5924477577209473,
330
+ "learning_rate": 9.970275967688047e-06,
331
+ "loss": 0.3624,
332
+ "step": 3600
333
+ },
334
+ {
335
+ "epoch": 4.887714663143989,
336
+ "grad_norm": 2.49125075340271,
337
+ "learning_rate": 9.967942540115829e-06,
338
+ "loss": 0.3508,
339
+ "step": 3700
340
+ },
341
+ {
342
+ "epoch": 5.019815059445178,
343
+ "grad_norm": 2.464569330215454,
344
+ "learning_rate": 9.965521243790079e-06,
345
+ "loss": 0.3355,
346
+ "step": 3800
347
+ },
348
+ {
349
+ "epoch": 5.1519154557463676,
350
+ "grad_norm": 2.6026785373687744,
351
+ "learning_rate": 9.963012121536635e-06,
352
+ "loss": 0.3284,
353
+ "step": 3900
354
+ },
355
+ {
356
+ "epoch": 5.284015852047556,
357
+ "grad_norm": 2.351313591003418,
358
+ "learning_rate": 9.96041521773472e-06,
359
+ "loss": 0.328,
360
+ "step": 4000
361
+ },
362
+ {
363
+ "epoch": 5.284015852047556,
364
+ "eval_bleu": 40.737027833366824,
365
+ "eval_char_accuracy": 42.49156933706202,
366
+ "eval_loss": 0.23189863562583923,
367
+ "eval_runtime": 320.7889,
368
+ "eval_samples_per_second": 4.72,
369
+ "eval_steps_per_second": 0.592,
370
+ "step": 4000
371
+ },
372
+ {
373
+ "epoch": 5.416116248348745,
374
+ "grad_norm": 3.0100066661834717,
375
+ "learning_rate": 9.95773057831617e-06,
376
+ "loss": 0.311,
377
+ "step": 4100
378
+ },
379
+ {
380
+ "epoch": 5.5482166446499335,
381
+ "grad_norm": 2.049722671508789,
382
+ "learning_rate": 9.954958250764604e-06,
383
+ "loss": 0.3136,
384
+ "step": 4200
385
+ },
386
+ {
387
+ "epoch": 5.680317040951123,
388
+ "grad_norm": 2.1132702827453613,
389
+ "learning_rate": 9.952098284114604e-06,
390
+ "loss": 0.3,
391
+ "step": 4300
392
+ },
393
+ {
394
+ "epoch": 5.812417437252312,
395
+ "grad_norm": 2.174574613571167,
396
+ "learning_rate": 9.949150728950833e-06,
397
+ "loss": 0.3093,
398
+ "step": 4400
399
+ },
400
+ {
401
+ "epoch": 5.9445178335535,
402
+ "grad_norm": 2.8350446224212646,
403
+ "learning_rate": 9.946115637407145e-06,
404
+ "loss": 0.2988,
405
+ "step": 4500
406
+ },
407
+ {
408
+ "epoch": 5.9445178335535,
409
+ "eval_bleu": 43.8559984358133,
410
+ "eval_char_accuracy": 44.56273646981411,
411
+ "eval_loss": 0.21132107079029083,
412
+ "eval_runtime": 319.4845,
413
+ "eval_samples_per_second": 4.739,
414
+ "eval_steps_per_second": 0.595,
415
+ "step": 4500
416
+ },
417
+ {
418
+ "epoch": 6.07661822985469,
419
+ "grad_norm": 2.289716958999634,
420
+ "learning_rate": 9.94299306316567e-06,
421
+ "loss": 0.2938,
422
+ "step": 4600
423
+ },
424
+ {
425
+ "epoch": 6.208718626155878,
426
+ "grad_norm": 2.9307034015655518,
427
+ "learning_rate": 9.939783061455845e-06,
428
+ "loss": 0.2814,
429
+ "step": 4700
430
+ },
431
+ {
432
+ "epoch": 6.340819022457067,
433
+ "grad_norm": 2.3431613445281982,
434
+ "learning_rate": 9.936485689053462e-06,
435
+ "loss": 0.2782,
436
+ "step": 4800
437
+ },
438
+ {
439
+ "epoch": 6.472919418758257,
440
+ "grad_norm": 2.2339768409729004,
441
+ "learning_rate": 9.933101004279647e-06,
442
+ "loss": 0.2752,
443
+ "step": 4900
444
+ },
445
+ {
+ "epoch": 6.605019815059445,
+ "grad_norm": 2.076145887374878,
+ "learning_rate": 9.92962906699983e-06,
+ "loss": 0.265,
+ "step": 5000
+ },
+ {
+ "epoch": 6.605019815059445,
+ "eval_bleu": 47.09645003475602,
+ "eval_char_accuracy": 46.29205050172725,
+ "eval_loss": 0.1954895406961441,
+ "eval_runtime": 322.9917,
+ "eval_samples_per_second": 4.687,
+ "eval_steps_per_second": 0.588,
+ "step": 5000
+ },
+ {
+ "epoch": 6.737120211360634,
+ "grad_norm": 1.715720295906067,
+ "learning_rate": 9.926069938622698e-06,
+ "loss": 0.266,
+ "step": 5100
+ },
+ {
+ "epoch": 6.869220607661823,
+ "grad_norm": 2.501234531402588,
+ "learning_rate": 9.922423682099088e-06,
+ "loss": 0.2633,
+ "step": 5200
+ },
+ {
+ "epoch": 7.001321003963012,
+ "grad_norm": 1.929123044013977,
+ "learning_rate": 9.918690361920898e-06,
+ "loss": 0.2584,
+ "step": 5300
+ },
+ {
+ "epoch": 7.133421400264201,
+ "grad_norm": 2.0370264053344727,
+ "learning_rate": 9.914870044119924e-06,
+ "loss": 0.2451,
+ "step": 5400
+ },
+ {
+ "epoch": 7.265521796565389,
+ "grad_norm": 2.3562278747558594,
+ "learning_rate": 9.91096279626671e-06,
+ "loss": 0.2476,
+ "step": 5500
+ },
+ {
+ "epoch": 7.265521796565389,
+ "eval_bleu": 48.90758910970442,
+ "eval_char_accuracy": 46.93308932390195,
+ "eval_loss": 0.18339309096336365,
+ "eval_runtime": 328.302,
+ "eval_samples_per_second": 4.612,
+ "eval_steps_per_second": 0.579,
+ "step": 5500
+ },
+ {
+ "epoch": 7.397622192866579,
+ "grad_norm": 2.410529613494873,
+ "learning_rate": 9.90696868746934e-06,
+ "loss": 0.2419,
+ "step": 5600
+ },
+ {
+ "epoch": 7.5297225891677675,
+ "grad_norm": 1.685939908027649,
+ "learning_rate": 9.902887788372223e-06,
+ "loss": 0.2448,
+ "step": 5700
+ },
+ {
+ "epoch": 7.661822985468956,
+ "grad_norm": 2.3180549144744873,
+ "learning_rate": 9.89872017115484e-06,
+ "loss": 0.2379,
+ "step": 5800
+ },
+ {
+ "epoch": 7.793923381770146,
+ "grad_norm": 2.4159021377563477,
+ "learning_rate": 9.894465909530471e-06,
+ "loss": 0.2339,
+ "step": 5900
+ },
+ {
+ "epoch": 7.926023778071334,
+ "grad_norm": 2.0757477283477783,
+ "learning_rate": 9.890125078744884e-06,
+ "loss": 0.2356,
+ "step": 6000
+ },
+ {
+ "epoch": 7.926023778071334,
+ "eval_bleu": 51.273473767746786,
+ "eval_char_accuracy": 49.088563086033886,
+ "eval_loss": 0.1728673279285431,
+ "eval_runtime": 317.183,
+ "eval_samples_per_second": 4.773,
+ "eval_steps_per_second": 0.599,
+ "step": 6000
+ },
+ {
+ "epoch": 8.058124174372523,
+ "grad_norm": 1.9405860900878906,
+ "learning_rate": 9.885697755575015e-06,
+ "loss": 0.2251,
+ "step": 6100
+ },
+ {
+ "epoch": 8.190224570673712,
+ "grad_norm": 1.7473342418670654,
+ "learning_rate": 9.881184018327597e-06,
+ "loss": 0.2195,
+ "step": 6200
+ },
+ {
+ "epoch": 8.3223249669749,
+ "grad_norm": 1.7633724212646484,
+ "learning_rate": 9.876583946837787e-06,
+ "loss": 0.219,
+ "step": 6300
+ },
+ {
+ "epoch": 8.45442536327609,
+ "grad_norm": 2.1117053031921387,
+ "learning_rate": 9.871897622467748e-06,
+ "loss": 0.2148,
+ "step": 6400
+ },
+ {
+ "epoch": 8.58652575957728,
+ "grad_norm": 2.114854574203491,
+ "learning_rate": 9.867125128105211e-06,
+ "loss": 0.2222,
+ "step": 6500
+ },
+ {
+ "epoch": 8.58652575957728,
+ "eval_bleu": 53.15738681385049,
+ "eval_char_accuracy": 49.680251686132586,
+ "eval_loss": 0.1632871925830841,
+ "eval_runtime": 325.9103,
+ "eval_samples_per_second": 4.645,
+ "eval_steps_per_second": 0.583,
+ "step": 6500
+ },
+ {
+ "epoch": 8.718626155878468,
+ "grad_norm": 2.6163322925567627,
+ "learning_rate": 9.862266548162008e-06,
+ "loss": 0.2141,
+ "step": 6600
+ },
+ {
+ "epoch": 8.850726552179657,
+ "grad_norm": 2.1705501079559326,
+ "learning_rate": 9.857321968572577e-06,
+ "loss": 0.2126,
+ "step": 6700
+ },
+ {
+ "epoch": 8.982826948480845,
+ "grad_norm": 2.4352259635925293,
+ "learning_rate": 9.85229147679245e-06,
+ "loss": 0.2124,
+ "step": 6800
+ },
+ {
+ "epoch": 9.114927344782034,
+ "grad_norm": 1.912975788116455,
+ "learning_rate": 9.847175161796696e-06,
+ "loss": 0.2032,
+ "step": 6900
+ },
+ {
+ "epoch": 9.247027741083222,
+ "grad_norm": 1.7575359344482422,
+ "learning_rate": 9.841973114078358e-06,
+ "loss": 0.2005,
+ "step": 7000
+ },
+ {
+ "epoch": 9.247027741083222,
+ "eval_bleu": 54.81436388056976,
+ "eval_char_accuracy": 50.20562592531667,
+ "eval_loss": 0.1546466052532196,
+ "eval_runtime": 319.5902,
+ "eval_samples_per_second": 4.737,
+ "eval_steps_per_second": 0.595,
+ "step": 7000
+ },
+ {
+ "epoch": 9.379128137384413,
+ "grad_norm": 1.6994798183441162,
+ "learning_rate": 9.836685425646842e-06,
+ "loss": 0.1929,
+ "step": 7100
+ },
+ {
+ "epoch": 9.511228533685602,
+ "grad_norm": 1.8375500440597534,
+ "learning_rate": 9.831312190026295e-06,
+ "loss": 0.1954,
+ "step": 7200
+ },
+ {
+ "epoch": 9.64332892998679,
+ "grad_norm": 2.735320568084717,
+ "learning_rate": 9.825853502253951e-06,
+ "loss": 0.1949,
+ "step": 7300
+ },
+ {
+ "epoch": 9.775429326287979,
+ "grad_norm": 1.9880143404006958,
+ "learning_rate": 9.820309458878447e-06,
+ "loss": 0.196,
+ "step": 7400
+ },
+ {
+ "epoch": 9.907529722589167,
+ "grad_norm": 3.1160881519317627,
+ "learning_rate": 9.814680157958122e-06,
+ "loss": 0.1957,
+ "step": 7500
+ },
+ {
+ "epoch": 9.907529722589167,
+ "eval_bleu": 56.684180257446236,
+ "eval_char_accuracy": 51.796656522454356,
+ "eval_loss": 0.14744216203689575,
+ "eval_runtime": 318.7305,
+ "eval_samples_per_second": 4.75,
+ "eval_steps_per_second": 0.596,
+ "step": 7500
+ },
+ {
+ "epoch": 10.039630118890356,
+ "grad_norm": 1.975994348526001,
+ "learning_rate": 9.808965699059276e-06,
+ "loss": 0.1964,
+ "step": 7600
+ },
+ {
+ "epoch": 10.171730515191545,
+ "grad_norm": 1.6857510805130005,
+ "learning_rate": 9.80316618325441e-06,
+ "loss": 0.1832,
+ "step": 7700
+ },
+ {
+ "epoch": 10.303830911492735,
+ "grad_norm": 1.6473827362060547,
+ "learning_rate": 9.797281713120438e-06,
+ "loss": 0.1846,
+ "step": 7800
+ },
+ {
+ "epoch": 10.435931307793924,
+ "grad_norm": 2.1330363750457764,
+ "learning_rate": 9.79131239273688e-06,
+ "loss": 0.1783,
+ "step": 7900
+ },
+ {
+ "epoch": 10.568031704095112,
+ "grad_norm": 2.0598771572113037,
+ "learning_rate": 9.785258327684007e-06,
+ "loss": 0.183,
+ "step": 8000
+ },
+ {
+ "epoch": 10.568031704095112,
+ "eval_bleu": 57.92928420103779,
+ "eval_char_accuracy": 51.580235236058556,
+ "eval_loss": 0.1413801610469818,
+ "eval_runtime": 318.3694,
+ "eval_samples_per_second": 4.755,
+ "eval_steps_per_second": 0.597,
+ "step": 8000
+ },
+ {
+ "epoch": 10.700132100396301,
+ "grad_norm": 1.8669029474258423,
+ "learning_rate": 9.779119625040988e-06,
+ "loss": 0.1801,
+ "step": 8100
+ },
+ {
+ "epoch": 10.83223249669749,
+ "grad_norm": 1.967623233795166,
+ "learning_rate": 9.772896393383991e-06,
+ "loss": 0.1772,
+ "step": 8200
+ },
+ {
+ "epoch": 10.964332892998678,
+ "grad_norm": 2.0888118743896484,
+ "learning_rate": 9.766588742784255e-06,
+ "loss": 0.1741,
+ "step": 8300
+ },
+ {
+ "epoch": 11.096433289299869,
+ "grad_norm": 1.8615264892578125,
+ "learning_rate": 9.760196784806155e-06,
+ "loss": 0.1733,
+ "step": 8400
+ },
+ {
+ "epoch": 11.228533685601057,
+ "grad_norm": 1.786023736000061,
+ "learning_rate": 9.753720632505219e-06,
+ "loss": 0.171,
+ "step": 8500
+ },
+ {
+ "epoch": 11.228533685601057,
+ "eval_bleu": 58.97794662773473,
+ "eval_char_accuracy": 53.53111120250041,
+ "eval_loss": 0.13721999526023865,
+ "eval_runtime": 310.3442,
+ "eval_samples_per_second": 4.878,
+ "eval_steps_per_second": 0.612,
+ "step": 8500
+ },
+ {
+ "epoch": 11.360634081902246,
+ "grad_norm": 2.22037935256958,
+ "learning_rate": 9.74716040042614e-06,
+ "loss": 0.169,
+ "step": 8600
+ },
+ {
+ "epoch": 11.492734478203435,
+ "grad_norm": 2.2769389152526855,
+ "learning_rate": 9.740516204600734e-06,
+ "loss": 0.1631,
+ "step": 8700
+ },
+ {
+ "epoch": 11.624834874504623,
+ "grad_norm": 2.1513566970825195,
+ "learning_rate": 9.733788162545902e-06,
+ "loss": 0.1669,
+ "step": 8800
+ },
+ {
+ "epoch": 11.756935270805812,
+ "grad_norm": 1.640358328819275,
+ "learning_rate": 9.726976393261547e-06,
+ "loss": 0.1674,
+ "step": 8900
+ },
+ {
+ "epoch": 11.889035667107,
+ "grad_norm": 1.6976934671401978,
+ "learning_rate": 9.720081017228462e-06,
+ "loss": 0.1646,
+ "step": 9000
+ },
+ {
+ "epoch": 11.889035667107,
+ "eval_bleu": 60.38328284328657,
+ "eval_char_accuracy": 54.70215084717881,
+ "eval_loss": 0.13088105618953705,
+ "eval_runtime": 316.5706,
+ "eval_samples_per_second": 4.783,
+ "eval_steps_per_second": 0.6,
+ "step": 9000
+ },
+ {
+ "epoch": 12.021136063408191,
+ "grad_norm": 1.7420779466629028,
+ "learning_rate": 9.713102156406213e-06,
+ "loss": 0.1629,
+ "step": 9100
+ },
+ {
+ "epoch": 12.15323645970938,
+ "grad_norm": 1.9723796844482422,
+ "learning_rate": 9.706039934230967e-06,
+ "loss": 0.1578,
+ "step": 9200
+ },
+ {
+ "epoch": 12.285336856010568,
+ "grad_norm": 2.3517324924468994,
+ "learning_rate": 9.698894475613323e-06,
+ "loss": 0.1561,
+ "step": 9300
+ },
+ {
+ "epoch": 12.417437252311757,
+ "grad_norm": 1.5132865905761719,
+ "learning_rate": 9.691665906936088e-06,
+ "loss": 0.157,
+ "step": 9400
+ },
+ {
+ "epoch": 12.549537648612946,
+ "grad_norm": 2.435624599456787,
+ "learning_rate": 9.684354356052055e-06,
+ "loss": 0.1538,
+ "step": 9500
+ },
+ {
+ "epoch": 12.549537648612946,
+ "eval_bleu": 61.08870144349316,
+ "eval_char_accuracy": 55.10055107747986,
+ "eval_loss": 0.12675440311431885,
+ "eval_runtime": 314.3152,
+ "eval_samples_per_second": 4.817,
+ "eval_steps_per_second": 0.604,
+ "step": 9500
+ },
+ {
+ "epoch": 12.681638044914134,
+ "grad_norm": 2.092256784439087,
+ "learning_rate": 9.676959952281733e-06,
+ "loss": 0.1518,
+ "step": 9600
+ },
+ {
+ "epoch": 12.813738441215325,
+ "grad_norm": 1.7985179424285889,
+ "learning_rate": 9.669482826411065e-06,
+ "loss": 0.158,
+ "step": 9700
+ },
+ {
+ "epoch": 12.945838837516513,
+ "grad_norm": 1.7571889162063599,
+ "learning_rate": 9.66192311068911e-06,
+ "loss": 0.152,
+ "step": 9800
+ },
+ {
+ "epoch": 13.077939233817702,
+ "grad_norm": 1.3526843786239624,
+ "learning_rate": 9.654280938825705e-06,
+ "loss": 0.1426,
+ "step": 9900
+ },
+ {
+ "epoch": 13.21003963011889,
+ "grad_norm": 1.6651784181594849,
+ "learning_rate": 9.646556445989106e-06,
+ "loss": 0.1476,
+ "step": 10000
+ },
+ {
+ "epoch": 13.21003963011889,
+ "eval_bleu": 61.91608356246592,
+ "eval_char_accuracy": 55.006477216647475,
+ "eval_loss": 0.12389995902776718,
+ "eval_runtime": 315.2745,
+ "eval_samples_per_second": 4.802,
+ "eval_steps_per_second": 0.603,
+ "step": 10000
+ },
+ {
+ "epoch": 13.34214002642008,
+ "grad_norm": 1.5739635229110718,
+ "learning_rate": 9.63874976880359e-06,
+ "loss": 0.148,
+ "step": 10100
+ },
+ {
+ "epoch": 13.474240422721268,
+ "grad_norm": 1.779477834701538,
+ "learning_rate": 9.63086104534704e-06,
+ "loss": 0.1469,
+ "step": 10200
+ },
+ {
+ "epoch": 13.606340819022456,
+ "grad_norm": 1.5422449111938477,
+ "learning_rate": 9.622890415148505e-06,
+ "loss": 0.143,
+ "step": 10300
+ },
+ {
+ "epoch": 13.738441215323647,
+ "grad_norm": 1.80446457862854,
+ "learning_rate": 9.61483801918573e-06,
+ "loss": 0.1424,
+ "step": 10400
+ },
+ {
+ "epoch": 13.870541611624835,
+ "grad_norm": 1.7641818523406982,
+ "learning_rate": 9.606703999882667e-06,
+ "loss": 0.1406,
+ "step": 10500
+ },
+ {
+ "epoch": 13.870541611624835,
+ "eval_bleu": 62.97532147351367,
+ "eval_char_accuracy": 56.364636453364035,
+ "eval_loss": 0.12048687040805817,
+ "eval_runtime": 315.4459,
+ "eval_samples_per_second": 4.8,
+ "eval_steps_per_second": 0.602,
+ "step": 10500
+ },
+ {
+ "epoch": 14.002642007926024,
+ "grad_norm": 1.5506337881088257,
+ "learning_rate": 9.598488501106947e-06,
+ "loss": 0.1436,
+ "step": 10600
+ },
+ {
+ "epoch": 14.134742404227213,
+ "grad_norm": 1.338537335395813,
+ "learning_rate": 9.590191668167343e-06,
+ "loss": 0.1396,
+ "step": 10700
+ },
+ {
+ "epoch": 14.266842800528401,
+ "grad_norm": 1.9432475566864014,
+ "learning_rate": 9.581813647811199e-06,
+ "loss": 0.1427,
+ "step": 10800
+ },
+ {
+ "epoch": 14.39894319682959,
+ "grad_norm": 1.8222265243530273,
+ "learning_rate": 9.573354588221833e-06,
+ "loss": 0.1352,
+ "step": 10900
+ },
+ {
+ "epoch": 14.531043593130779,
+ "grad_norm": 1.7141766548156738,
+ "learning_rate": 9.564814639015915e-06,
+ "loss": 0.1361,
+ "step": 11000
+ },
+ {
+ "epoch": 14.531043593130779,
+ "eval_bleu": 63.45481969831487,
+ "eval_char_accuracy": 56.61755634150354,
+ "eval_loss": 0.11702162027359009,
+ "eval_runtime": 318.0259,
+ "eval_samples_per_second": 4.761,
+ "eval_steps_per_second": 0.597,
+ "step": 11000
+ },
+ {
+ "epoch": 14.663143989431969,
+ "grad_norm": 1.473866581916809,
+ "learning_rate": 9.556193951240821e-06,
+ "loss": 0.1302,
+ "step": 11100
+ },
+ {
+ "epoch": 14.795244385733158,
+ "grad_norm": 1.3840774297714233,
+ "learning_rate": 9.547492677371968e-06,
+ "loss": 0.1355,
+ "step": 11200
+ },
+ {
+ "epoch": 14.927344782034346,
+ "grad_norm": 1.8060030937194824,
+ "learning_rate": 9.538710971310104e-06,
+ "loss": 0.1332,
+ "step": 11300
+ },
+ {
+ "epoch": 15.059445178335535,
+ "grad_norm": 1.8242031335830688,
+ "learning_rate": 9.529848988378597e-06,
+ "loss": 0.1247,
+ "step": 11400
+ },
+ {
+ "epoch": 15.191545574636724,
+ "grad_norm": 1.3062950372695923,
+ "learning_rate": 9.520906885320682e-06,
+ "loss": 0.1295,
+ "step": 11500
+ },
+ {
+ "epoch": 15.191545574636724,
+ "eval_bleu": 64.22910948855701,
+ "eval_char_accuracy": 57.40099111696003,
+ "eval_loss": 0.11363621801137924,
+ "eval_runtime": 315.2122,
+ "eval_samples_per_second": 4.803,
+ "eval_steps_per_second": 0.603,
+ "step": 11500
+ },
+ {
+ "epoch": 15.323645970937912,
+ "grad_norm": 1.5021114349365234,
+ "learning_rate": 9.511884820296695e-06,
+ "loss": 0.1292,
+ "step": 11600
+ },
+ {
+ "epoch": 15.455746367239101,
+ "grad_norm": 1.8947999477386475,
+ "learning_rate": 9.502782952881268e-06,
+ "loss": 0.128,
+ "step": 11700
+ },
+ {
+ "epoch": 15.587846763540291,
+ "grad_norm": 1.5116643905639648,
+ "learning_rate": 9.493601444060514e-06,
+ "loss": 0.1276,
+ "step": 11800
+ },
+ {
+ "epoch": 15.71994715984148,
+ "grad_norm": 1.5134871006011963,
+ "learning_rate": 9.48434045622917e-06,
+ "loss": 0.132,
+ "step": 11900
+ },
+ {
+ "epoch": 15.852047556142669,
+ "grad_norm": 1.7438267469406128,
+ "learning_rate": 9.475000153187733e-06,
+ "loss": 0.1243,
+ "step": 12000
+ },
+ {
+ "epoch": 15.852047556142669,
+ "eval_bleu": 65.05289431113529,
+ "eval_char_accuracy": 57.989081263365684,
+ "eval_loss": 0.1115257665514946,
+ "eval_runtime": 309.8437,
+ "eval_samples_per_second": 4.886,
+ "eval_steps_per_second": 0.613,
+ "step": 12000
+ },
+ {
+ "epoch": 15.984147952443857,
+ "grad_norm": 1.913855791091919,
+ "learning_rate": 9.46558070013956e-06,
+ "loss": 0.1261,
+ "step": 12100
+ },
+ {
+ "epoch": 16.116248348745046,
+ "grad_norm": 1.4257742166519165,
+ "learning_rate": 9.456082263687946e-06,
+ "loss": 0.117,
+ "step": 12200
+ },
+ {
+ "epoch": 16.248348745046236,
+ "grad_norm": 1.49489164352417,
+ "learning_rate": 9.44650501183318e-06,
+ "loss": 0.1232,
+ "step": 12300
+ },
+ {
+ "epoch": 16.380449141347423,
+ "grad_norm": 1.7977708578109741,
+ "learning_rate": 9.436849113969567e-06,
+ "loss": 0.1212,
+ "step": 12400
+ },
+ {
+ "epoch": 16.512549537648614,
+ "grad_norm": 1.2978798151016235,
+ "learning_rate": 9.427212472501483e-06,
+ "loss": 0.122,
+ "step": 12500
+ },
+ {
+ "epoch": 16.512549537648614,
+ "eval_bleu": 65.59844085659223,
+ "eval_char_accuracy": 58.32784997532489,
+ "eval_loss": 0.10947112739086151,
+ "eval_runtime": 332.8226,
+ "eval_samples_per_second": 4.549,
+ "eval_steps_per_second": 0.571,
+ "step": 12500
+ },
+ {
+ "epoch": 16.6446499339498,
+ "grad_norm": 1.5628472566604614,
+ "learning_rate": 9.417400578537868e-06,
+ "loss": 0.1219,
+ "step": 12600
+ },
+ {
+ "epoch": 16.77675033025099,
+ "grad_norm": 1.5304769277572632,
+ "learning_rate": 9.407510553339931e-06,
+ "loss": 0.1192,
+ "step": 12700
+ },
+ {
+ "epoch": 16.90885072655218,
+ "grad_norm": 1.6370787620544434,
+ "learning_rate": 9.397542571834054e-06,
+ "loss": 0.1181,
+ "step": 12800
+ },
+ {
+ "epoch": 17.040951122853368,
+ "grad_norm": 1.5388678312301636,
+ "learning_rate": 9.387496810325436e-06,
+ "loss": 0.1137,
+ "step": 12900
+ },
+ {
+ "epoch": 17.17305151915456,
+ "grad_norm": 1.8251888751983643,
+ "learning_rate": 9.377373446494984e-06,
+ "loss": 0.1122,
+ "step": 13000
+ },
+ {
+ "epoch": 17.17305151915456,
+ "eval_bleu": 66.79852257149867,
+ "eval_char_accuracy": 58.47692877117947,
+ "eval_loss": 0.1069813147187233,
+ "eval_runtime": 350.9901,
+ "eval_samples_per_second": 4.314,
+ "eval_steps_per_second": 0.541,
+ "step": 13000
+ },
+ {
+ "epoch": 17.305151915455745,
+ "grad_norm": 1.6730031967163086,
+ "learning_rate": 9.367172659396172e-06,
+ "loss": 0.1123,
+ "step": 13100
+ },
+ {
+ "epoch": 17.437252311756936,
+ "grad_norm": 1.2402830123901367,
+ "learning_rate": 9.35689462945187e-06,
+ "loss": 0.1125,
+ "step": 13200
+ },
+ {
+ "epoch": 17.569352708058123,
+ "grad_norm": 1.9425466060638428,
+ "learning_rate": 9.34653953845115e-06,
+ "loss": 0.1109,
+ "step": 13300
+ },
+ {
+ "epoch": 17.701453104359313,
+ "grad_norm": 3.105457067489624,
+ "learning_rate": 9.33610756954608e-06,
+ "loss": 0.1161,
+ "step": 13400
+ },
+ {
+ "epoch": 17.833553500660503,
+ "grad_norm": 1.679442048072815,
+ "learning_rate": 9.325598907248478e-06,
+ "loss": 0.1131,
+ "step": 13500
+ },
+ {
+ "epoch": 17.833553500660503,
+ "eval_bleu": 66.72469802657548,
+ "eval_char_accuracy": 59.26910264846191,
+ "eval_loss": 0.10426344722509384,
+ "eval_runtime": 353.0152,
+ "eval_samples_per_second": 4.289,
+ "eval_steps_per_second": 0.538,
+ "step": 13500
+ },
+ {
+ "epoch": 17.96565389696169,
+ "grad_norm": 1.1806349754333496,
+ "learning_rate": 9.315013737426645e-06,
+ "loss": 0.115,
+ "step": 13600
+ },
+ {
+ "epoch": 18.09775429326288,
+ "grad_norm": 1.5437259674072266,
+ "learning_rate": 9.304352247302091e-06,
+ "loss": 0.1071,
+ "step": 13700
+ },
+ {
+ "epoch": 18.229854689564068,
+ "grad_norm": 3.9584083557128906,
+ "learning_rate": 9.293614625446205e-06,
+ "loss": 0.11,
+ "step": 13800
+ },
+ {
+ "epoch": 18.361955085865258,
+ "grad_norm": 1.4071292877197266,
+ "learning_rate": 9.282801061776937e-06,
+ "loss": 0.1093,
+ "step": 13900
+ },
+ {
+ "epoch": 18.494055482166445,
+ "grad_norm": 1.599787950515747,
+ "learning_rate": 9.271911747555425e-06,
+ "loss": 0.1057,
+ "step": 14000
+ },
+ {
+ "epoch": 18.494055482166445,
+ "eval_bleu": 67.26763967365643,
+ "eval_char_accuracy": 58.966318473433134,
+ "eval_loss": 0.10266197472810745,
+ "eval_runtime": 318.6046,
+ "eval_samples_per_second": 4.752,
+ "eval_steps_per_second": 0.596,
+ "step": 14000
+ },
+ {
+ "epoch": 18.626155878467635,
+ "grad_norm": 1.3243365287780762,
+ "learning_rate": 9.260946875382624e-06,
+ "loss": 0.1054,
+ "step": 14100
+ },
+ {
+ "epoch": 18.758256274768826,
+ "grad_norm": 1.8453656435012817,
+ "learning_rate": 9.249906639195894e-06,
+ "loss": 0.1096,
+ "step": 14200
+ },
+ {
+ "epoch": 18.890356671070013,
+ "grad_norm": 1.2308754920959473,
+ "learning_rate": 9.238791234265565e-06,
+ "loss": 0.1045,
+ "step": 14300
+ },
+ {
+ "epoch": 19.022457067371203,
+ "grad_norm": 1.3015666007995605,
+ "learning_rate": 9.22760085719149e-06,
+ "loss": 0.1065,
+ "step": 14400
+ },
+ {
+ "epoch": 19.15455746367239,
+ "grad_norm": 1.3568576574325562,
+ "learning_rate": 9.21633570589957e-06,
+ "loss": 0.1024,
+ "step": 14500
+ },
+ {
+ "epoch": 19.15455746367239,
+ "eval_bleu": 67.65790078495372,
+ "eval_char_accuracy": 59.99547622964303,
+ "eval_loss": 0.10274580866098404,
+ "eval_runtime": 317.6719,
+ "eval_samples_per_second": 4.766,
+ "eval_steps_per_second": 0.598,
+ "step": 14500
+ },
+ {
+ "epoch": 19.28665785997358,
+ "grad_norm": 1.6260790824890137,
+ "learning_rate": 9.204995979638241e-06,
+ "loss": 0.1025,
+ "step": 14600
+ },
+ {
+ "epoch": 19.418758256274767,
+ "grad_norm": 1.1852062940597534,
+ "learning_rate": 9.193581878974964e-06,
+ "loss": 0.101,
+ "step": 14700
+ },
+ {
+ "epoch": 19.550858652575958,
+ "grad_norm": 2.5521767139434814,
+ "learning_rate": 9.18209360579267e-06,
+ "loss": 0.0999,
+ "step": 14800
+ },
+ {
+ "epoch": 19.682959048877148,
+ "grad_norm": 1.0231155157089233,
+ "learning_rate": 9.17053136328619e-06,
+ "loss": 0.1055,
+ "step": 14900
+ },
+ {
+ "epoch": 19.815059445178335,
+ "grad_norm": 1.3518396615982056,
+ "learning_rate": 9.15889535595866e-06,
+ "loss": 0.1014,
+ "step": 15000
+ },
+ {
+ "epoch": 19.815059445178335,
+ "eval_bleu": 68.43528395628573,
+ "eval_char_accuracy": 61.64870866918901,
+ "eval_loss": 0.09970895200967789,
+ "eval_runtime": 314.0763,
+ "eval_samples_per_second": 4.82,
+ "eval_steps_per_second": 0.605,
+ "step": 15000
+ },
+ {
+ "epoch": 19.947159841479525,
+ "grad_norm": 1.2678903341293335,
+ "learning_rate": 9.147185789617907e-06,
+ "loss": 0.1005,
+ "step": 15100
+ },
+ {
+ "epoch": 20.079260237780712,
+ "grad_norm": 1.2439242601394653,
+ "learning_rate": 9.13540287137281e-06,
+ "loss": 0.0989,
+ "step": 15200
+ },
+ {
+ "epoch": 20.211360634081903,
+ "grad_norm": 1.3169798851013184,
+ "learning_rate": 9.123546809629632e-06,
+ "loss": 0.1006,
+ "step": 15300
+ },
+ {
+ "epoch": 20.34346103038309,
+ "grad_norm": 1.3063526153564453,
+ "learning_rate": 9.111617814088332e-06,
+ "loss": 0.0966,
+ "step": 15400
+ },
+ {
+ "epoch": 20.47556142668428,
+ "grad_norm": 1.8397212028503418,
+ "learning_rate": 9.099616095738867e-06,
+ "loss": 0.0965,
+ "step": 15500
+ },
+ {
+ "epoch": 20.47556142668428,
+ "eval_bleu": 68.40618966106221,
+ "eval_char_accuracy": 60.30751357131107,
+ "eval_loss": 0.09965521842241287,
+ "eval_runtime": 315.776,
+ "eval_samples_per_second": 4.795,
+ "eval_steps_per_second": 0.602,
+ "step": 15500
+ },
+ {
+ "epoch": 20.60766182298547,
+ "grad_norm": 1.544118046760559,
+ "learning_rate": 9.087541866857453e-06,
+ "loss": 0.0954,
+ "step": 15600
+ },
+ {
+ "epoch": 20.739762219286657,
+ "grad_norm": 1.5378237962722778,
+ "learning_rate": 9.075395341002804e-06,
+ "loss": 0.0975,
+ "step": 15700
+ },
+ {
+ "epoch": 20.871862615587848,
+ "grad_norm": 1.0197510719299316,
+ "learning_rate": 9.06317673301237e-06,
+ "loss": 0.0964,
+ "step": 15800
+ },
+ {
+ "epoch": 21.003963011889034,
+ "grad_norm": 0.9621543884277344,
+ "learning_rate": 9.05088625899852e-06,
+ "loss": 0.0925,
+ "step": 15900
+ },
+ {
+ "epoch": 21.136063408190225,
+ "grad_norm": 1.356550931930542,
+ "learning_rate": 9.038524136344736e-06,
+ "loss": 0.0917,
+ "step": 16000
+ },
+ {
+ "epoch": 21.136063408190225,
+ "eval_bleu": 69.05043210174456,
+ "eval_char_accuracy": 60.879153643691396,
+ "eval_loss": 0.0960288867354393,
+ "eval_runtime": 317.6216,
+ "eval_samples_per_second": 4.767,
+ "eval_steps_per_second": 0.598,
+ "step": 16000
+ },
+ {
+ "epoch": 21.268163804491415,
+ "grad_norm": 2.0014472007751465,
+ "learning_rate": 9.026090583701755e-06,
+ "loss": 0.0962,
+ "step": 16100
+ },
+ {
+ "epoch": 21.400264200792602,
+ "grad_norm": 1.4762715101242065,
+ "learning_rate": 9.013585820983713e-06,
+ "loss": 0.0917,
+ "step": 16200
+ },
+ {
+ "epoch": 21.532364597093792,
+ "grad_norm": 1.242245078086853,
+ "learning_rate": 9.001010069364241e-06,
+ "loss": 0.0907,
+ "step": 16300
+ },
+ {
+ "epoch": 21.66446499339498,
+ "grad_norm": 1.9390721321105957,
+ "learning_rate": 8.98836355127257e-06,
+ "loss": 0.0918,
+ "step": 16400
+ },
+ {
+ "epoch": 21.79656538969617,
+ "grad_norm": 1.070357322692871,
+ "learning_rate": 8.975646490389581e-06,
+ "loss": 0.0903,
+ "step": 16500
+ },
+ {
+ "epoch": 21.79656538969617,
+ "eval_bleu": 69.68531857632678,
+ "eval_char_accuracy": 62.48920463892087,
+ "eval_loss": 0.09554192423820496,
+ "eval_runtime": 335.0637,
+ "eval_samples_per_second": 4.519,
+ "eval_steps_per_second": 0.567,
+ "step": 16500
+ },
+ {
+ "epoch": 21.928665785997357,
+ "grad_norm": 1.8990777730941772,
+ "learning_rate": 8.962859111643862e-06,
+ "loss": 0.0946,
+ "step": 16600
+ },
+ {
+ "epoch": 22.060766182298547,
+ "grad_norm": 1.6244333982467651,
+ "learning_rate": 8.950001641207719e-06,
+ "loss": 0.0895,
+ "step": 16700
+ },
+ {
+ "epoch": 22.192866578599737,
+ "grad_norm": 1.6511205434799194,
+ "learning_rate": 8.937074306493187e-06,
+ "loss": 0.0907,
+ "step": 16800
+ },
+ {
+ "epoch": 22.324966974900924,
+ "grad_norm": 1.421342372894287,
+ "learning_rate": 8.924077336147992e-06,
+ "loss": 0.0864,
+ "step": 16900
+ },
+ {
+ "epoch": 22.457067371202115,
+ "grad_norm": 1.572800874710083,
+ "learning_rate": 8.911010960051522e-06,
+ "loss": 0.088,
+ "step": 17000
+ },
+ {
+ "epoch": 22.457067371202115,
+ "eval_bleu": 69.94746089002493,
+ "eval_char_accuracy": 61.5572051324231,
+ "eval_loss": 0.09443064033985138,
+ "eval_runtime": 345.5738,
+ "eval_samples_per_second": 4.381,
+ "eval_steps_per_second": 0.55,
+ "step": 17000
+ },
+ {
+ "epoch": 22.5891677675033,
+ "grad_norm": 1.2412383556365967,
+ "learning_rate": 8.897875409310755e-06,
+ "loss": 0.085,
+ "step": 17100
+ },
+ {
+ "epoch": 22.721268163804492,
+ "grad_norm": 1.4429903030395508,
+ "learning_rate": 8.884803301685314e-06,
+ "loss": 0.0908,
+ "step": 17200
+ },
+ {
+ "epoch": 22.85336856010568,
+ "grad_norm": 1.1933690309524536,
+ "learning_rate": 8.871530785794356e-06,
+ "loss": 0.092,
+ "step": 17300
+ },
+ {
+ "epoch": 22.98546895640687,
+ "grad_norm": 1.399186611175537,
+ "learning_rate": 8.85818979355093e-06,
+ "loss": 0.0837,
+ "step": 17400
+ },
+ {
+ "epoch": 23.11756935270806,
+ "grad_norm": 1.2939783334732056,
+ "learning_rate": 8.844780560919194e-06,
+ "loss": 0.0871,
+ "step": 17500
+ },
+ {
+ "epoch": 23.11756935270806,
+ "eval_bleu": 70.43896205976631,
+ "eval_char_accuracy": 61.825032900148045,
+ "eval_loss": 0.0947960913181305,
+ "eval_runtime": 324.1925,
+ "eval_samples_per_second": 4.67,
+ "eval_steps_per_second": 0.586,
+ "step": 17500
+ },
+ {
+ "epoch": 23.249669749009247,
+ "grad_norm": 1.2098222970962524,
+ "learning_rate": 8.831303325070279e-06,
+ "loss": 0.0827,
+ "step": 17600
+ },
+ {
+ "epoch": 23.381770145310437,
+ "grad_norm": 1.5045851469039917,
+ "learning_rate": 8.8177583243781e-06,
+ "loss": 0.0838,
+ "step": 17700
+ },
+ {
+ "epoch": 23.513870541611624,
+ "grad_norm": 1.5295897722244263,
+ "learning_rate": 8.80414579841514e-06,
+ "loss": 0.0858,
+ "step": 17800
+ },
+ {
+ "epoch": 23.645970937912814,
+ "grad_norm": 1.4860461950302124,
+ "learning_rate": 8.790465987948212e-06,
+ "loss": 0.0875,
+ "step": 17900
+ },
+ {
+ "epoch": 23.778071334214,
+ "grad_norm": 1.4711731672286987,
+ "learning_rate": 8.776719134934199e-06,
+ "loss": 0.0828,
+ "step": 18000
+ },
+ {
+ "epoch": 23.778071334214,
+ "eval_bleu": 70.5623767627479,
+ "eval_char_accuracy": 62.505140648132915,
+ "eval_loss": 0.0924154594540596,
+ "eval_runtime": 342.2412,
+ "eval_samples_per_second": 4.424,
+ "eval_steps_per_second": 0.555,
+ "step": 18000
+ },
+ {
+ "epoch": 23.91017173051519,
+ "grad_norm": 1.4447731971740723,
+ "learning_rate": 8.762905482515775e-06,
+ "loss": 0.0814,
+ "step": 18100
+ },
+ {
+ "epoch": 24.042272126816382,
+ "grad_norm": 1.350907802581787,
+ "learning_rate": 8.749025275017107e-06,
+ "loss": 0.0806,
+ "step": 18200
+ },
+ {
+ "epoch": 24.17437252311757,
+ "grad_norm": 1.7207551002502441,
+ "learning_rate": 8.735078757939532e-06,
+ "loss": 0.08,
+ "step": 18300
+ },
+ {
+ "epoch": 24.30647291941876,
+ "grad_norm": 1.0851505994796753,
+ "learning_rate": 8.721066177957213e-06,
+ "loss": 0.0779,
+ "step": 18400
+ },
+ {
+ "epoch": 24.438573315719946,
+ "grad_norm": 1.2182328701019287,
+ "learning_rate": 8.70698778291278e-06,
+ "loss": 0.0814,
+ "step": 18500
+ },
+ {
+ "epoch": 24.438573315719946,
+ "eval_bleu": 70.99257164437572,
+ "eval_char_accuracy": 62.22189093600922,
+ "eval_loss": 0.09152651578187943,
+ "eval_runtime": 323.7174,
+ "eval_samples_per_second": 4.677,
+ "eval_steps_per_second": 0.587,
+ "step": 18500
+ },
+ {
+ "epoch": 24.570673712021136,
+ "grad_norm": 2.0363285541534424,
+ "learning_rate": 8.69284382181294e-06,
+ "loss": 0.0821,
+ "step": 18600
+ },
+ {
+ "epoch": 24.702774108322323,
+ "grad_norm": 1.3864432573318481,
+ "learning_rate": 8.67863454482408e-06,
+ "loss": 0.0809,
+ "step": 18700
+ },
+ {
+ "epoch": 24.834874504623514,
+ "grad_norm": 2.032351493835449,
+ "learning_rate": 8.664360203267838e-06,
+ "loss": 0.0819,
+ "step": 18800
+ },
+ {
+ "epoch": 24.966974900924704,
+ "grad_norm": 1.2007182836532593,
+ "learning_rate": 8.65002104961666e-06,
+ "loss": 0.0819,
+ "step": 18900
+ },
+ {
+ "epoch": 25.09907529722589,
+ "grad_norm": 1.167693853378296,
+ "learning_rate": 8.635617337489331e-06,
+ "loss": 0.0778,
+ "step": 19000
+ },
+ {
+ "epoch": 25.09907529722589,
+ "eval_bleu": 70.99070971451881,
+ "eval_char_accuracy": 63.15440450732028,
+ "eval_loss": 0.09109245985746384,
+ "eval_runtime": 331.1976,
+ "eval_samples_per_second": 4.571,
+ "eval_steps_per_second": 0.574,
+ "step": 19000
+ },
+ {
+ "epoch": 25.23117569352708,
+ "grad_norm": 1.9939295053482056,
+ "learning_rate": 8.621149321646495e-06,
+ "loss": 0.076,
+ "step": 19100
+ },
+ {
+ "epoch": 25.36327608982827,
+ "grad_norm": 1.148555874824524,
+ "learning_rate": 8.60661725798614e-06,
+ "loss": 0.078,
+ "step": 19200
+ },
+ {
+ "epoch": 25.49537648612946,
+ "grad_norm": 1.159621238708496,
+ "learning_rate": 8.592167677001219e-06,
+ "loss": 0.0823,
+ "step": 19300
+ },
+ {
+ "epoch": 25.627476882430646,
+ "grad_norm": 1.7136179208755493,
+ "learning_rate": 8.57750892397125e-06,
+ "loss": 0.0755,
+ "step": 19400
+ },
+ {
+ "epoch": 25.759577278731836,
+ "grad_norm": 1.105714201927185,
+ "learning_rate": 8.5627868949981e-06,
+ "loss": 0.0756,
+ "step": 19500
+ },
+ {
+ "epoch": 25.759577278731836,
+ "eval_bleu": 71.15813151741806,
+ "eval_char_accuracy": 63.43405576575094,
+ "eval_loss": 0.09030098468065262,
+ "eval_runtime": 316.3492,
+ "eval_samples_per_second": 4.786,
+ "eval_steps_per_second": 0.601,
+ "step": 19500
+ },
+ {
+ "epoch": 25.891677675033026,
+ "grad_norm": 1.6091099977493286,
+ "learning_rate": 8.548001850472529e-06,
+ "loss": 0.0778,
+ "step": 19600
1773
+ },
1774
+ {
1775
+ "epoch": 26.023778071334213,
1776
+ "grad_norm": 1.1730881929397583,
1777
+ "learning_rate": 8.533154051899864e-06,
1778
+ "loss": 0.0787,
1779
+ "step": 19700
1780
+ },
1781
+ {
1782
+ "epoch": 26.155878467635404,
1783
+ "grad_norm": 1.2703460454940796,
1784
+ "learning_rate": 8.518243761895369e-06,
1785
+ "loss": 0.0711,
1786
+ "step": 19800
1787
+ },
1788
+ {
1789
+ "epoch": 26.28797886393659,
1790
+ "grad_norm": 1.3379662036895752,
1791
+ "learning_rate": 8.503271244179608e-06,
1792
+ "loss": 0.075,
1793
+ "step": 19900
1794
+ },
1795
+ {
1796
+ "epoch": 26.42007926023778,
1797
+ "grad_norm": 1.370871901512146,
1798
+ "learning_rate": 8.488236763573772e-06,
1799
+ "loss": 0.0717,
1800
+ "step": 20000
1801
+ },
1802
+ {
1803
+ "epoch": 26.42007926023778,
1804
+ "eval_bleu": 72.10017178272712,
1805
+ "eval_char_accuracy": 63.853018588583645,
1806
+ "eval_loss": 0.08871379494667053,
1807
+ "eval_runtime": 316.1038,
1808
+ "eval_samples_per_second": 4.79,
1809
+ "eval_steps_per_second": 0.601,
1810
+ "step": 20000
1811
+ },
1812
+ {
1813
+ "epoch": 26.552179656538968,
1814
+ "grad_norm": 1.3396387100219727,
1815
+ "learning_rate": 8.473140585995004e-06,
1816
+ "loss": 0.0726,
1817
+ "step": 20100
1818
+ },
1819
+ {
1820
+ "epoch": 26.68428005284016,
1821
+ "grad_norm": 1.1219794750213623,
1822
+ "learning_rate": 8.457982978451683e-06,
1823
+ "loss": 0.0754,
1824
+ "step": 20200
1825
+ },
1826
+ {
1827
+ "epoch": 26.81638044914135,
1828
+ "grad_norm": 1.0815324783325195,
1829
+ "learning_rate": 8.442764209038717e-06,
1830
+ "loss": 0.0745,
1831
+ "step": 20300
1832
+ },
1833
+ {
1834
+ "epoch": 26.948480845442536,
1835
+ "grad_norm": 1.4396206140518188,
1836
+ "learning_rate": 8.427484546932789e-06,
1837
+ "loss": 0.0749,
1838
+ "step": 20400
1839
+ },
1840
+ {
1841
+ "epoch": 27.080581241743726,
1842
+ "grad_norm": 1.2644987106323242,
1843
+ "learning_rate": 8.4121442623876e-06,
1844
+ "loss": 0.0731,
1845
+ "step": 20500
1846
+ },
1847
+ {
1848
+ "epoch": 27.080581241743726,
1849
+ "eval_bleu": 71.61514434619416,
1850
+ "eval_char_accuracy": 63.06290097055437,
1851
+ "eval_loss": 0.08903466165065765,
1852
+ "eval_runtime": 316.4287,
1853
+ "eval_samples_per_second": 4.785,
1854
+ "eval_steps_per_second": 0.6,
1855
+ "step": 20500
1856
+ },
1857
+ {
1858
+ "epoch": 27.212681638044913,
1859
+ "grad_norm": 1.3456913232803345,
1860
+ "learning_rate": 8.396743626729093e-06,
1861
+ "loss": 0.0728,
1862
+ "step": 20600
1863
+ },
1864
+ {
1865
+ "epoch": 27.344782034346103,
1866
+ "grad_norm": 1.1268248558044434,
1867
+ "learning_rate": 8.381282912350646e-06,
1868
+ "loss": 0.072,
1869
+ "step": 20700
1870
+ },
1871
+ {
1872
+ "epoch": 27.476882430647294,
1873
+ "grad_norm": 0.964856743812561,
1874
+ "learning_rate": 8.365762392708259e-06,
1875
+ "loss": 0.0711,
1876
+ "step": 20800
1877
+ },
1878
+ {
1879
+ "epoch": 27.60898282694848,
1880
+ "grad_norm": 1.2877197265625,
1881
+ "learning_rate": 8.350182342315719e-06,
1882
+ "loss": 0.0681,
1883
+ "step": 20900
1884
+ },
1885
+ {
1886
+ "epoch": 27.74108322324967,
1887
+ "grad_norm": 3.0796854496002197,
1888
+ "learning_rate": 8.334543036739743e-06,
1889
+ "loss": 0.0681,
1890
+ "step": 21000
1891
+ },
1892
+ {
1893
+ "epoch": 27.74108322324967,
1894
+ "eval_bleu": 72.44362762449254,
1895
+ "eval_char_accuracy": 64.56139990129955,
1896
+ "eval_loss": 0.08809462934732437,
1897
+ "eval_runtime": 319.538,
1898
+ "eval_samples_per_second": 4.738,
1899
+ "eval_steps_per_second": 0.595,
1900
+ "step": 21000
1901
+ },
1902
+ {
1903
+ "epoch": 27.873183619550858,
1904
+ "grad_norm": 1.445993185043335,
1905
+ "learning_rate": 8.3188447525951e-06,
1906
+ "loss": 0.0701,
1907
+ "step": 21100
1908
+ },
1909
+ {
1910
+ "epoch": 28.005284015852048,
1911
+ "grad_norm": 0.9414767622947693,
1912
+ "learning_rate": 8.303087767539723e-06,
1913
+ "loss": 0.0698,
1914
+ "step": 21200
1915
+ },
1916
+ {
1917
+ "epoch": 28.137384412153235,
1918
+ "grad_norm": 1.0644490718841553,
1919
+ "learning_rate": 8.28758923914531e-06,
1920
+ "loss": 0.0674,
1921
+ "step": 21300
1922
+ },
1923
+ {
1924
+ "epoch": 28.269484808454425,
1925
+ "grad_norm": 1.2708711624145508,
1926
+ "learning_rate": 8.27171684949204e-06,
1927
+ "loss": 0.0689,
1928
+ "step": 21400
1929
+ },
1930
+ {
1931
+ "epoch": 28.401585204755616,
1932
+ "grad_norm": 1.2177033424377441,
1933
+ "learning_rate": 8.25578659248641e-06,
1934
+ "loss": 0.0677,
1935
+ "step": 21500
1936
+ },
1937
+ {
1938
+ "epoch": 28.401585204755616,
1939
+ "eval_bleu": 72.7907604984095,
1940
+ "eval_char_accuracy": 64.78141964138838,
1941
+ "eval_loss": 0.0869230180978775,
1942
+ "eval_runtime": 315.3901,
1943
+ "eval_samples_per_second": 4.8,
1944
+ "eval_steps_per_second": 0.602,
1945
+ "step": 21500
1946
+ },
1947
+ {
1948
+ "epoch": 28.533685601056803,
1949
+ "grad_norm": 1.1179392337799072,
1950
+ "learning_rate": 8.239798749889293e-06,
1951
+ "loss": 0.0673,
1952
+ "step": 21600
1953
+ },
1954
+ {
1955
+ "epoch": 28.665785997357993,
1956
+ "grad_norm": 1.2472251653671265,
1957
+ "learning_rate": 8.223753604480086e-06,
1958
+ "loss": 0.0682,
1959
+ "step": 21700
1960
+ },
1961
+ {
1962
+ "epoch": 28.79788639365918,
1963
+ "grad_norm": 0.8336161375045776,
1964
+ "learning_rate": 8.207651440051714e-06,
1965
+ "loss": 0.0689,
1966
+ "step": 21800
1967
+ },
1968
+ {
1969
+ "epoch": 28.92998678996037,
1970
+ "grad_norm": 1.2652006149291992,
1971
+ "learning_rate": 8.1914925414056e-06,
1972
+ "loss": 0.0688,
1973
+ "step": 21900
1974
+ },
1975
+ {
1976
+ "epoch": 29.062087186261557,
1977
+ "grad_norm": 1.1424204111099243,
1978
+ "learning_rate": 8.175277194346636e-06,
1979
+ "loss": 0.0677,
1980
+ "step": 22000
1981
+ },
1982
+ {
1983
+ "epoch": 29.062087186261557,
1984
+ "eval_bleu": 72.84923297683062,
1985
+ "eval_char_accuracy": 64.63285491034709,
1986
+ "eval_loss": 0.08696427941322327,
1987
+ "eval_runtime": 316.7435,
1988
+ "eval_samples_per_second": 4.78,
1989
+ "eval_steps_per_second": 0.6,
1990
+ "step": 22000
1991
+ },
1992
+ {
1993
+ "epoch": 29.194187582562748,
1994
+ "grad_norm": 1.617885708808899,
1995
+ "learning_rate": 8.159005685678126e-06,
1996
+ "loss": 0.0638,
1997
+ "step": 22100
1998
+ },
1999
+ {
2000
+ "epoch": 29.326287978863938,
2001
+ "grad_norm": 1.2440978288650513,
2002
+ "learning_rate": 8.142678303196715e-06,
2003
+ "loss": 0.0606,
2004
+ "step": 22200
2005
+ },
2006
+ {
2007
+ "epoch": 29.458388375165125,
2008
+ "grad_norm": 1.1825751066207886,
2009
+ "learning_rate": 8.12629533568729e-06,
2010
+ "loss": 0.0661,
2011
+ "step": 22300
2012
+ },
2013
+ {
2014
+ "epoch": 29.590488771466315,
2015
+ "grad_norm": 1.5328364372253418,
2016
+ "learning_rate": 8.109857072917887e-06,
2017
+ "loss": 0.0647,
2018
+ "step": 22400
2019
+ },
2020
+ {
2021
+ "epoch": 29.722589167767502,
2022
+ "grad_norm": 1.1308438777923584,
2023
+ "learning_rate": 8.093363805634556e-06,
2024
+ "loss": 0.0666,
2025
+ "step": 22500
2026
+ },
2027
+ {
2028
+ "epoch": 29.722589167767502,
2029
+ "eval_bleu": 73.15661772633777,
2030
+ "eval_char_accuracy": 63.81086527389373,
2031
+ "eval_loss": 0.08648520708084106,
2032
+ "eval_runtime": 315.6708,
2033
+ "eval_samples_per_second": 4.796,
2034
+ "eval_steps_per_second": 0.602,
2035
+ "step": 22500
2036
+ },
2037
+ {
2038
+ "epoch": 29.854689564068693,
2039
+ "grad_norm": 1.0442135334014893,
2040
+ "learning_rate": 8.076815825556213e-06,
2041
+ "loss": 0.0648,
2042
+ "step": 22600
2043
+ },
2044
+ {
2045
+ "epoch": 29.98678996036988,
2046
+ "grad_norm": 1.363897681236267,
2047
+ "learning_rate": 8.060213425369492e-06,
2048
+ "loss": 0.0654,
2049
+ "step": 22700
2050
+ },
2051
+ {
2052
+ "epoch": 30.11889035667107,
2053
+ "grad_norm": 0.7751464247703552,
2054
+ "learning_rate": 8.043556898723568e-06,
2055
+ "loss": 0.0628,
2056
+ "step": 22800
2057
+ },
2058
+ {
2059
+ "epoch": 30.25099075297226,
2060
+ "grad_norm": 0.8685150742530823,
2061
+ "learning_rate": 8.026846540224956e-06,
2062
+ "loss": 0.0584,
2063
+ "step": 22900
2064
+ },
2065
+ {
2066
+ "epoch": 30.383091149273447,
2067
+ "grad_norm": 1.1484973430633545,
2068
+ "learning_rate": 8.0100826454323e-06,
2069
+ "loss": 0.0604,
2070
+ "step": 23000
2071
+ },
2072
+ {
2073
+ "epoch": 30.383091149273447,
2074
+ "eval_bleu": 73.25107285034876,
2075
+ "eval_char_accuracy": 65.01943164994243,
2076
+ "eval_loss": 0.08666232973337173,
2077
+ "eval_runtime": 323.7755,
2078
+ "eval_samples_per_second": 4.676,
2079
+ "eval_steps_per_second": 0.587,
2080
+ "step": 23000
2081
+ },
2082
+ {
2083
+ "epoch": 30.515191545574638,
2084
+ "grad_norm": 0.986289918422699,
2085
+ "learning_rate": 7.993265510851148e-06,
2086
+ "loss": 0.0688,
2087
+ "step": 23100
2088
+ },
2089
+ {
2090
+ "epoch": 30.647291941875825,
2091
+ "grad_norm": 1.0403423309326172,
2092
+ "learning_rate": 7.97639543392872e-06,
2093
+ "loss": 0.0638,
2094
+ "step": 23200
2095
+ },
2096
+ {
2097
+ "epoch": 30.779392338177015,
2098
+ "grad_norm": 1.517040729522705,
2099
+ "learning_rate": 7.959472713048617e-06,
2100
+ "loss": 0.0653,
2101
+ "step": 23300
2102
+ },
2103
+ {
2104
+ "epoch": 30.911492734478202,
2105
+ "grad_norm": 1.1347965002059937,
2106
+ "learning_rate": 7.942497647525576e-06,
2107
+ "loss": 0.0642,
2108
+ "step": 23400
2109
+ },
2110
+ {
2111
+ "epoch": 31.043593130779392,
2112
+ "grad_norm": 1.0789778232574463,
2113
+ "learning_rate": 7.925470537600155e-06,
2114
+ "loss": 0.0614,
2115
+ "step": 23500
2116
+ },
2117
+ {
2118
+ "epoch": 31.043593130779392,
2119
+ "eval_bleu": 73.23277401857816,
2120
+ "eval_char_accuracy": 65.0646693535121,
2121
+ "eval_loss": 0.08618722856044769,
2122
+ "eval_runtime": 313.4951,
2123
+ "eval_samples_per_second": 4.829,
2124
+ "eval_steps_per_second": 0.606,
2125
+ "step": 23500
2126
+ },
2127
+ {
2128
+ "epoch": 31.175693527080583,
2129
+ "grad_norm": 1.4853187799453735,
2130
+ "learning_rate": 7.908391684433432e-06,
2131
+ "loss": 0.0585,
2132
+ "step": 23600
2133
+ },
2134
+ {
2135
+ "epoch": 31.30779392338177,
2136
+ "grad_norm": 0.9656835794448853,
2137
+ "learning_rate": 7.891261390101675e-06,
2138
+ "loss": 0.0578,
2139
+ "step": 23700
2140
+ },
2141
+ {
2142
+ "epoch": 31.43989431968296,
2143
+ "grad_norm": 1.1521549224853516,
2144
+ "learning_rate": 7.874079957590997e-06,
2145
+ "loss": 0.0622,
2146
+ "step": 23800
2147
+ },
2148
+ {
2149
+ "epoch": 31.571994715984147,
2150
+ "grad_norm": 1.0636780261993408,
2151
+ "learning_rate": 7.856847690792002e-06,
2152
+ "loss": 0.0604,
2153
+ "step": 23900
2154
+ },
2155
+ {
2156
+ "epoch": 31.704095112285337,
2157
+ "grad_norm": 1.0833789110183716,
2158
+ "learning_rate": 7.839564894494409e-06,
2159
+ "loss": 0.0633,
2160
+ "step": 24000
2161
+ },
2162
+ {
2163
+ "epoch": 31.704095112285337,
2164
+ "eval_bleu": 73.62536575895216,
2165
+ "eval_char_accuracy": 65.15565882546471,
2166
+ "eval_loss": 0.08546082675457001,
2167
+ "eval_runtime": 318.0064,
2168
+ "eval_samples_per_second": 4.761,
2169
+ "eval_steps_per_second": 0.597,
2170
+ "step": 24000
2171
+ },
2172
+ {
2173
+ "epoch": 31.836195508586528,
2174
+ "grad_norm": 1.1620845794677734,
2175
+ "learning_rate": 7.822231874381658e-06,
2176
+ "loss": 0.0604,
2177
+ "step": 24100
2178
+ },
2179
+ {
2180
+ "epoch": 31.968295904887714,
2181
+ "grad_norm": 1.315012812614441,
2182
+ "learning_rate": 7.804848937025507e-06,
2183
+ "loss": 0.0593,
2184
+ "step": 24200
2185
+ },
2186
+ {
2187
+ "epoch": 32.1003963011889,
2188
+ "grad_norm": 0.8739562034606934,
2189
+ "learning_rate": 7.787416389880605e-06,
2190
+ "loss": 0.0608,
2191
+ "step": 24300
2192
+ },
2193
+ {
2194
+ "epoch": 32.23249669749009,
2195
+ "grad_norm": 0.9168538451194763,
2196
+ "learning_rate": 7.769934541279059e-06,
2197
+ "loss": 0.0577,
2198
+ "step": 24400
2199
+ },
2200
+ {
2201
+ "epoch": 32.36459709379128,
2202
+ "grad_norm": 0.9820032715797424,
2203
+ "learning_rate": 7.752403700424978e-06,
2204
+ "loss": 0.0569,
2205
+ "step": 24500
2206
+ },
2207
+ {
2208
+ "epoch": 32.36459709379128,
2209
+ "eval_bleu": 73.72149088930817,
2210
+ "eval_char_accuracy": 65.55457312057904,
2211
+ "eval_loss": 0.08627723157405853,
2212
+ "eval_runtime": 314.5801,
2213
+ "eval_samples_per_second": 4.813,
2214
+ "eval_steps_per_second": 0.604,
2215
+ "step": 24500
2216
+ },
2217
+ {
2218
+ "epoch": 32.49669749009247,
2219
+ "grad_norm": 1.0042686462402344,
2220
+ "learning_rate": 7.734824177389006e-06,
2221
+ "loss": 0.0582,
2222
+ "step": 24600
2223
+ },
2224
+ {
2225
+ "epoch": 32.628797886393656,
2226
+ "grad_norm": 1.251654863357544,
2227
+ "learning_rate": 7.71719628310283e-06,
2228
+ "loss": 0.0589,
2229
+ "step": 24700
2230
+ },
2231
+ {
2232
+ "epoch": 32.760898282694846,
2233
+ "grad_norm": 1.3150684833526611,
2234
+ "learning_rate": 7.699520329353694e-06,
2235
+ "loss": 0.0585,
2236
+ "step": 24800
2237
+ },
2238
+ {
2239
+ "epoch": 32.89299867899604,
2240
+ "grad_norm": 1.318556547164917,
2241
+ "learning_rate": 7.681796628778876e-06,
2242
+ "loss": 0.0588,
2243
+ "step": 24900
2244
+ },
2245
+ {
2246
+ "epoch": 33.02509907529723,
2247
+ "grad_norm": 1.2693874835968018,
2248
+ "learning_rate": 7.664025494860155e-06,
2249
+ "loss": 0.0605,
2250
+ "step": 25000
2251
+ },
2252
+ {
2253
+ "epoch": 33.02509907529723,
2254
+ "eval_bleu": 73.95631775899457,
2255
+ "eval_char_accuracy": 65.17056670505016,
2256
+ "eval_loss": 0.08447689563035965,
2257
+ "eval_runtime": 317.601,
2258
+ "eval_samples_per_second": 4.767,
2259
+ "eval_steps_per_second": 0.598,
2260
+ "step": 25000
2261
+ },
2262
+ {
2263
+ "epoch": 33.15719947159842,
2264
+ "grad_norm": 0.7866926193237305,
2265
+ "learning_rate": 7.646207241918272e-06,
2266
+ "loss": 0.055,
2267
+ "step": 25100
2268
+ },
2269
+ {
2270
+ "epoch": 33.2892998678996,
2271
+ "grad_norm": 1.0340533256530762,
2272
+ "learning_rate": 7.628342185107373e-06,
2273
+ "loss": 0.0563,
2274
+ "step": 25200
2275
+ },
2276
+ {
2277
+ "epoch": 33.42140026420079,
2278
+ "grad_norm": 1.6704190969467163,
2279
+ "learning_rate": 7.610430640409427e-06,
2280
+ "loss": 0.0568,
2281
+ "step": 25300
2282
+ },
2283
+ {
2284
+ "epoch": 33.55350066050198,
2285
+ "grad_norm": 1.4271676540374756,
2286
+ "learning_rate": 7.592472924628642e-06,
2287
+ "loss": 0.056,
2288
+ "step": 25400
2289
+ },
2290
+ {
2291
+ "epoch": 33.68560105680317,
2292
+ "grad_norm": 1.3886315822601318,
2293
+ "learning_rate": 7.574469355385865e-06,
2294
+ "loss": 0.0552,
2295
+ "step": 25500
2296
+ },
2297
+ {
2298
+ "epoch": 33.68560105680317,
2299
+ "eval_bleu": 73.67062952266826,
2300
+ "eval_char_accuracy": 65.04410676098043,
2301
+ "eval_loss": 0.08498267084360123,
2302
+ "eval_runtime": 314.7186,
2303
+ "eval_samples_per_second": 4.811,
2304
+ "eval_steps_per_second": 0.604,
2305
+ "step": 25500
2306
+ },
2307
+ {
2308
+ "epoch": 33.81770145310436,
2309
+ "grad_norm": 1.1037873029708862,
2310
+ "learning_rate": 7.556420251112956e-06,
2311
+ "loss": 0.0551,
2312
+ "step": 25600
2313
+ },
2314
+ {
2315
+ "epoch": 33.949801849405546,
2316
+ "grad_norm": 2.125624418258667,
2317
+ "learning_rate": 7.538325931047159e-06,
2318
+ "loss": 0.0591,
2319
+ "step": 25700
2320
+ },
2321
+ {
2322
+ "epoch": 34.081902245706736,
2323
+ "grad_norm": 1.674501895904541,
2324
+ "learning_rate": 7.52018671522546e-06,
2325
+ "loss": 0.0561,
2326
+ "step": 25800
2327
+ },
2328
+ {
2329
+ "epoch": 34.21400264200793,
2330
+ "grad_norm": 1.386206030845642,
2331
+ "learning_rate": 7.502002924478924e-06,
2332
+ "loss": 0.0509,
2333
+ "step": 25900
2334
+ },
2335
+ {
2336
+ "epoch": 34.34610303830912,
2337
+ "grad_norm": 1.0778214931488037,
2338
+ "learning_rate": 7.48377488042701e-06,
2339
+ "loss": 0.0544,
2340
+ "step": 26000
2341
+ },
2342
+ {
2343
+ "epoch": 34.34610303830912,
2344
+ "eval_bleu": 74.62046528623556,
2345
+ "eval_char_accuracy": 66.02442835992763,
2346
+ "eval_loss": 0.08464069664478302,
2347
+ "eval_runtime": 316.1374,
2348
+ "eval_samples_per_second": 4.789,
2349
+ "eval_steps_per_second": 0.601,
2350
+ "step": 26000
2351
+ },
2352
+ {
2353
+ "epoch": 34.4782034346103,
2354
+ "grad_norm": 0.8076276779174805,
2355
+ "learning_rate": 7.465502905471907e-06,
2356
+ "loss": 0.055,
2357
+ "step": 26100
2358
+ },
2359
+ {
2360
+ "epoch": 34.61030383091149,
2361
+ "grad_norm": 1.1508395671844482,
2362
+ "learning_rate": 7.447187322792806e-06,
2363
+ "loss": 0.057,
2364
+ "step": 26200
2365
+ },
2366
+ {
2367
+ "epoch": 34.74240422721268,
2368
+ "grad_norm": 1.2695698738098145,
2369
+ "learning_rate": 7.4288284563401945e-06,
2370
+ "loss": 0.055,
2371
+ "step": 26300
2372
+ },
2373
+ {
2374
+ "epoch": 34.87450462351387,
2375
+ "grad_norm": 1.166051983833313,
2376
+ "learning_rate": 7.410426630830131e-06,
2377
+ "loss": 0.0552,
2378
+ "step": 26400
2379
+ },
2380
+ {
2381
+ "epoch": 35.00660501981506,
2382
+ "grad_norm": 0.9517413973808289,
2383
+ "learning_rate": 7.391982171738496e-06,
2384
+ "loss": 0.0555,
2385
+ "step": 26500
2386
+ },
2387
+ {
2388
+ "epoch": 35.00660501981506,
2389
+ "eval_bleu": 74.18793581426324,
2390
+ "eval_char_accuracy": 65.63373910182597,
2391
+ "eval_loss": 0.08454510569572449,
2392
+ "eval_runtime": 313.1895,
2393
+ "eval_samples_per_second": 4.834,
2394
+ "eval_steps_per_second": 0.607,
2395
+ "step": 26500
2396
+ },
2397
+ {
2398
+ "epoch": 35.138705416116245,
2399
+ "grad_norm": 1.1265789270401,
2400
+ "learning_rate": 7.373495405295236e-06,
2401
+ "loss": 0.0529,
2402
+ "step": 26600
2403
+ },
2404
+ {
2405
+ "epoch": 35.270805812417436,
2406
+ "grad_norm": 1.0067466497421265,
2407
+ "learning_rate": 7.354966658478594e-06,
2408
+ "loss": 0.0502,
2409
+ "step": 26700
2410
+ },
2411
+ {
2412
+ "epoch": 35.402906208718626,
2413
+ "grad_norm": 0.9871610999107361,
2414
+ "learning_rate": 7.336396259009325e-06,
2415
+ "loss": 0.0508,
2416
+ "step": 26800
2417
+ },
2418
+ {
2419
+ "epoch": 35.53500660501982,
2420
+ "grad_norm": 1.4898390769958496,
2421
+ "learning_rate": 7.317784535344905e-06,
2422
+ "loss": 0.0544,
2423
+ "step": 26900
2424
+ },
2425
+ {
2426
+ "epoch": 35.66710700132101,
2427
+ "grad_norm": 1.1168763637542725,
2428
+ "learning_rate": 7.2991318166737126e-06,
2429
+ "loss": 0.0535,
2430
+ "step": 27000
2431
+ },
2432
+ {
2433
+ "epoch": 35.66710700132101,
2434
+ "eval_bleu": 74.4767569683976,
2435
+ "eval_char_accuracy": 65.44353512090805,
2436
+ "eval_loss": 0.08464961498975754,
2437
+ "eval_runtime": 313.4254,
2438
+ "eval_samples_per_second": 4.83,
2439
+ "eval_steps_per_second": 0.606,
2440
+ "step": 27000
2441
+ },
2442
+ {
2443
+ "epoch": 35.79920739762219,
2444
+ "grad_norm": 1.197704553604126,
2445
+ "learning_rate": 7.280625566954032e-06,
2446
+ "loss": 0.0547,
2447
+ "step": 27100
2448
+ },
2449
+ {
2450
+ "epoch": 35.93130779392338,
2451
+ "grad_norm": 1.6144860982894897,
2452
+ "learning_rate": 7.261892250434568e-06,
2453
+ "loss": 0.0516,
2454
+ "step": 27200
2455
+ },
2456
+ {
2457
+ "epoch": 36.06340819022457,
2458
+ "grad_norm": 1.1599304676055908,
2459
+ "learning_rate": 7.243118927483657e-06,
2460
+ "loss": 0.0502,
2461
+ "step": 27300
2462
+ },
2463
+ {
2464
+ "epoch": 36.19550858652576,
2465
+ "grad_norm": 1.043449878692627,
2466
+ "learning_rate": 7.22430593014791e-06,
2467
+ "loss": 0.0472,
2468
+ "step": 27400
2469
+ },
2470
+ {
2471
+ "epoch": 36.32760898282695,
2472
+ "grad_norm": 1.1489434242248535,
2473
+ "learning_rate": 7.205453591175666e-06,
2474
+ "loss": 0.0558,
2475
+ "step": 27500
2476
+ },
2477
+ {
2478
+ "epoch": 36.32760898282695,
2479
+ "eval_bleu": 74.20946250489693,
2480
+ "eval_char_accuracy": 65.89436996216483,
2481
+ "eval_loss": 0.08468983322381973,
2482
+ "eval_runtime": 314.7938,
2483
+ "eval_samples_per_second": 4.809,
2484
+ "eval_steps_per_second": 0.604,
2485
+ "step": 27500
2486
+ },
2487
+ {
2488
+ "epoch": 36.32760898282695,
2489
+ "step": 27500,
2490
+ "total_flos": 7035660725649408.0,
2491
+ "train_loss": 0.282735899699818,
2492
+ "train_runtime": 21727.9767,
2493
+ "train_samples_per_second": 55.735,
2494
+ "train_steps_per_second": 3.484
2495
+ }
2496
+ ],
2497
+ "logging_steps": 100,
2498
+ "max_steps": 75700,
2499
+ "num_input_tokens_seen": 0,
2500
+ "num_train_epochs": 100,
2501
+ "save_steps": 500,
2502
+ "stateful_callbacks": {
2503
+ "EarlyStoppingCallback": {
2504
+ "args": {
2505
+ "early_stopping_patience": 3,
2506
+ "early_stopping_threshold": 0.0
2507
+ },
2508
+ "attributes": {
2509
+ "early_stopping_patience_counter": 3
2510
+ }
2511
+ },
2512
+ "TrainerControl": {
2513
+ "args": {
2514
+ "should_epoch_stop": false,
2515
+ "should_evaluate": false,
2516
+ "should_log": false,
2517
+ "should_save": true,
2518
+ "should_training_stop": true
2519
+ },
2520
+ "attributes": {}
2521
+ }
2522
+ },
2523
+ "total_flos": 7035660725649408.0,
2524
+ "train_batch_size": 8,
2525
+ "trial_name": null,
2526
+ "trial_params": null
2527
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5952b6a5e7d8317da73453941ab91679dc14918147c96729a68d3d232bb1ebd7
+ size 5841
vocab.json ADDED
The diff for this file is too large to render. See raw diff
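The `trainer_state.json` added in this commit records a `log_history` list mixing per-100-step training entries (`loss`, `learning_rate`, `grad_norm`) with periodic evaluation entries (`eval_loss`, `eval_bleu`, `eval_char_accuracy`). A minimal sketch of how one might pick the best evaluation point from such a file (the helper name `best_eval_entry` is illustrative, not part of `transformers`; the inline sample mirrors two eval entries from the log above):

```python
import json

def best_eval_entry(state: dict) -> dict:
    """Return the log_history entry with the lowest eval_loss."""
    evals = [e for e in state["log_history"] if "eval_loss" in e]
    if not evals:
        raise ValueError("no evaluation entries in log_history")
    return min(evals, key=lambda e: e["eval_loss"])

# In practice: state = json.load(open("trainer_state.json"))
# Inline sample mimicking the structure of the file added here:
state = {
    "log_history": [
        {"step": 25000, "eval_loss": 0.08447689563035965, "eval_bleu": 73.95631775899457},
        {"step": 27500, "eval_loss": 0.08468983322381973, "eval_bleu": 74.20946250489693},
        {"step": 27500, "loss": 0.0558},  # train-only entry, ignored
    ]
}
best = best_eval_entry(state)
print(best["step"], best["eval_loss"])  # the step-25000 checkpoint has the lowest eval_loss
```

Consistent with the `EarlyStoppingCallback` settings recorded above (patience 3 on the monitored metric), eval loss stopped improving after step 25000 and training halted at step 27500 of the scheduled 75700.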