yuriivoievidka committed on
Commit
37e05dd
·
verified ·
1 Parent(s): cbba7df

Upload folder using huggingface_hub

README.md CHANGED
@@ -4,46 +4,45 @@ tags:
4
  - sentence-similarity
5
  - feature-extraction
6
  - generated_from_trainer
7
- - dataset_size:10668
8
  - loss:MultipleNegativesSymmetricRankingLoss
9
  base_model: microsoft/mpnet-base
10
  widget:
11
- - source_sentence: Best Job Ever! Rethink Your Career, Redefine Rich, Revolutionize
12
- Your Life by Dr. CK Bray
13
  sentences:
14
- - Books on Sales
15
  - Books on Self-Help for Women
16
- - Books on the Cold War
17
- - source_sentence: 'Empire of Pain: The Secret History of the Sackler Dynasty by Patrick
18
- Radden Keefe'
19
  sentences:
 
20
  - Books on Personal Development
21
- - Books on Wealth
22
- - Books on Communication
23
- - source_sentence: Seven Kinds of People You Find in Bookshops by Shaun Bythell
24
- sentences:
25
- - Books on Self-Help
26
- - Books on Social Skills
27
- - Books on Emotional Labor
28
- - source_sentence: 'The Law of Attraction: How to Attract Money, Love, and Happiness
29
- by David R. Hooper'
30
  sentences:
31
- - Books on How to Attract Money
32
- - Books on Mental Health
33
- - Books on Civil Rights
34
- - source_sentence: 'Hyperfocus: How to Manage Your Attention in a World of Distraction
35
- by Chris Bailey'
36
  sentences:
37
- - Books on Career Development
38
- - Books on Astronomy
39
- - Books on Self-Care
40
  pipeline_tag: sentence-similarity
41
  library_name: sentence-transformers
42
  ---
43
 
44
  # SentenceTransformer based on microsoft/mpnet-base
45
 
46
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
47
 
48
  ## Model Details
49
 
@@ -54,7 +53,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [m
54
  - **Output Dimensionality:** 768 dimensions
55
  - **Similarity Function:** Cosine Similarity
56
  - **Training Dataset:**
57
- - csv
58
  <!-- - **Language:** Unknown -->
59
  <!-- - **License:** Unknown -->
60
 
@@ -88,12 +87,12 @@ Then you can load this model and run inference.
88
  from sentence_transformers import SentenceTransformer
89
 
90
  # Download from the 🤗 Hub
91
- model = SentenceTransformer("yuriivoievidka/microsoft_mpnet-base-librarian")
92
  # Run inference
93
  sentences = [
94
- 'Hyperfocus: How to Manage Your Attention in a World of Distraction by Chris Bailey',
95
- 'Books on Self-Care',
96
- 'Books on Career Development',
97
  ]
98
  embeddings = model.encode(sentences)
99
  print(embeddings.shape)
@@ -145,22 +144,22 @@ You can finetune this model on your own dataset.
145
 
146
  ### Training Dataset
147
 
148
- #### csv
149
 
150
- * Dataset: csv
151
- * Size: 10,668 training samples
152
  * Columns: <code>anchor</code> and <code>positive</code>
153
  * Approximate statistics based on the first 1000 samples:
154
- | | anchor | positive |
155
- |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
156
- | type | string | string |
157
- | details | <ul><li>min: 6 tokens</li><li>mean: 22.04 tokens</li><li>max: 60 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 5.85 tokens</li><li>max: 10 tokens</li></ul> |
158
  * Samples:
159
- | anchor | positive |
160
- |:--------------------------------------------------------------------------------------------------------------------|:--------------------------------|
161
- | <code>Getting to Yes: Negotiating Agreement Without Giving In by Roger Fisher, William Ury, and Bruce Patton</code> | <code>Books on Success</code> |
162
- | <code>Whistling Vivaldi: How Stereotypes Affect Us and What We Can Do by Claude M. Steele</code> | <code>Books on Diversity</code> |
163
- | <code>Blindspot: Hidden Biases of Good People by Mahzarin R. Banaji and Anthony G. Greenwald</code> | <code>Books on Mindset</code> |
164
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
165
  ```json
166
  {
@@ -171,22 +170,22 @@ You can finetune this model on your own dataset.
171
 
172
  ### Evaluation Dataset
173
 
174
- #### csv
175
 
176
- * Dataset: csv
177
- * Size: 5,333 evaluation samples
178
  * Columns: <code>anchor</code> and <code>positive</code>
179
  * Approximate statistics based on the first 1000 samples:
180
- | | anchor | positive |
181
- |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
182
- | type | string | string |
183
- | details | <ul><li>min: 6 tokens</li><li>mean: 22.26 tokens</li><li>max: 60 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 5.83 tokens</li><li>max: 10 tokens</li></ul> |
184
  * Samples:
185
- | anchor | positive |
186
- |:-------------------------------------------------------------------------------------------------------------------|:------------------------------------------|
187
- | <code>Will It Fly?: How to Test Your Next Business Idea So You Don’t Waste Your Time and Money by Pat Flynn</code> | <code>Books on Advertising</code> |
188
- | <code>The Art of Stillness: Adventures in Going Nowhere by Pico Iyer</code> | <code>Books on Spiritual Awakening</code> |
189
- | <code>Just As I Am: A Memoir by Cicely Tyson, Michelle Burford</code> | <code>Books about Misinformation</code> |
190
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
191
  ```json
192
  {
@@ -198,11 +197,11 @@ You can finetune this model on your own dataset.
198
  ### Training Hyperparameters
199
  #### Non-Default Hyperparameters
200
 
201
- - `eval_strategy`: epoch
202
  - `per_device_train_batch_size`: 16
203
  - `per_device_eval_batch_size`: 16
204
  - `learning_rate`: 2e-05
205
- - `num_train_epochs`: 8
206
  - `warmup_ratio`: 0.1
207
 
208
  #### All Hyperparameters
@@ -210,7 +209,7 @@ You can finetune this model on your own dataset.
210
 
211
  - `overwrite_output_dir`: False
212
  - `do_predict`: False
213
- - `eval_strategy`: epoch
214
  - `prediction_loss_only`: True
215
  - `per_device_train_batch_size`: 16
216
  - `per_device_eval_batch_size`: 16
@@ -225,7 +224,7 @@ You can finetune this model on your own dataset.
225
  - `adam_beta2`: 0.999
226
  - `adam_epsilon`: 1e-08
227
  - `max_grad_norm`: 1.0
228
- - `num_train_epochs`: 8
229
  - `max_steps`: -1
230
  - `lr_scheduler_type`: linear
231
  - `lr_scheduler_kwargs`: {}
@@ -325,76 +324,49 @@ You can finetune this model on your own dataset.
325
  </details>
326
 
327
  ### Training Logs
328
- | Epoch | Step | Training Loss | Validation Loss |
- |:------:|:----:|:-------------:|:---------------:|
- | 0.1499 | 100 | 3.0137 | - |
- | 0.2999 | 200 | 2.3781 | - |
- | 0.4498 | 300 | 2.1067 | - |
- | 0.5997 | 400 | 2.0142 | - |
- | 0.7496 | 500 | 1.9861 | - |
- | 0.8996 | 600 | 1.8463 | - |
- | 1.0 | 667 | - | 1.7604 |
- | 1.0495 | 700 | 1.8115 | - |
- | 1.1994 | 800 | 1.7093 | - |
- | 1.3493 | 900 | 1.6853 | - |
- | 1.4993 | 1000 | 1.702 | - |
- | 1.6492 | 1100 | 1.6664 | - |
- | 1.7991 | 1200 | 1.6824 | - |
- | 1.9490 | 1300 | 1.6174 | - |
- | 2.0 | 1334 | - | 1.6624 |
- | 2.0990 | 1400 | 1.5585 | - |
- | 2.2489 | 1500 | 1.5112 | - |
- | 2.3988 | 1600 | 1.5384 | - |
- | 2.5487 | 1700 | 1.5013 | - |
- | 2.6987 | 1800 | 1.4589 | - |
- | 2.8486 | 1900 | 1.5108 | - |
- | 2.9985 | 2000 | 1.5287 | - |
- | 3.0 | 2001 | - | 1.6140 |
- | 3.1484 | 2100 | 1.3973 | - |
- | 3.2984 | 2200 | 1.3658 | - |
- | 3.4483 | 2300 | 1.4294 | - |
- | 3.5982 | 2400 | 1.3957 | - |
- | 3.7481 | 2500 | 1.3888 | - |
- | 3.8981 | 2600 | 1.4405 | - |
- | 4.0 | 2668 | - | 1.6155 |
- | 4.0480 | 2700 | 1.3603 | - |
- | 4.1979 | 2800 | 1.2872 | - |
- | 4.3478 | 2900 | 1.2514 | - |
- | 4.4978 | 3000 | 1.3011 | - |
- | 4.6477 | 3100 | 1.3175 | - |
- | 4.7976 | 3200 | 1.3553 | - |
- | 4.9475 | 3300 | 1.3157 | - |
- | 5.0 | 3335 | - | 1.6061 |
- | 5.0975 | 3400 | 1.2754 | - |
- | 5.2474 | 3500 | 1.2315 | - |
- | 5.3973 | 3600 | 1.2454 | - |
- | 5.5472 | 3700 | 1.2441 | - |
- | 5.6972 | 3800 | 1.266 | - |
- | 5.8471 | 3900 | 1.2304 | - |
- | 5.9970 | 4000 | 1.2717 | - |
- | 6.0 | 4002 | - | 1.6100 |
- | 6.1469 | 4100 | 1.1706 | - |
- | 6.2969 | 4200 | 1.2203 | - |
- | 6.4468 | 4300 | 1.1441 | - |
- | 6.5967 | 4400 | 1.1895 | - |
- | 6.7466 | 4500 | 1.176 | - |
- | 6.8966 | 4600 | 1.1903 | - |
- | 7.0 | 4669 | - | 1.6341 |
- | 7.0465 | 4700 | 1.2028 | - |
- | 7.1964 | 4800 | 1.1416 | - |
- | 7.3463 | 4900 | 1.1405 | - |
- | 7.4963 | 5000 | 1.1454 | - |
- | 7.6462 | 5100 | 1.1217 | - |
- | 7.7961 | 5200 | 1.1682 | - |
- | 7.9460 | 5300 | 1.1582 | - |
390
 
391
 
392
  ### Framework Versions
393
  - Python: 3.10.12
394
  - Sentence Transformers: 4.1.0
395
- - Transformers: 4.53.0.dev0
396
- - PyTorch: 2.7.1+cu126
397
- - Accelerate: 1.7.0
398
  - Datasets: 3.6.0
399
  - Tokenizers: 0.21.1
400
 
 
4
  - sentence-similarity
5
  - feature-extraction
6
  - generated_from_trainer
7
+ - dataset_size:10635
8
  - loss:MultipleNegativesSymmetricRankingLoss
9
  base_model: microsoft/mpnet-base
10
  widget:
11
+ - source_sentence: '12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson'
 
12
  sentences:
13
+ - Books on Investing
14
+ - Books on Resilience
15
+ - Books on Motivational
16
+ - source_sentence: 'Get the Guy: Learn Secrets of the Male Mind to Find the Man You
17
+ Want and the Love You Deserve by Matthew Hussey'
18
+ sentences:
19
+ - Books on Complexity
20
+ - Books on Decision Making
21
  - Books on Self-Help for Women
22
+ - source_sentence: The Magic of Tiny Business (You Don’t Have to Go Big to Make a
23
+ Great Living) by Sharon Rowe
 
24
  sentences:
25
+ - Books on Vegetarianism
26
  - Books on Personal Development
27
+ - Books on Emotions
28
+ - source_sentence: 'The Dorito Effect: The Surprising New Truth About Food and Flavor
29
+ by Mark Schatzker'
 
 
 
 
 
 
30
  sentences:
31
+ - Books on Skincare
32
+ - Books on Work-Life Balance
33
+ - Books on Problem Solving
34
+ - source_sentence: '12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson'
 
35
  sentences:
36
+ - Books on Psychology
37
+ - Books on Positive Thinking
38
+ - Books on Investing
39
  pipeline_tag: sentence-similarity
40
  library_name: sentence-transformers
41
  ---
42
 
43
  # SentenceTransformer based on microsoft/mpnet-base
44
 
45
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) on the train dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
46
 
47
  ## Model Details
48
 
 
53
  - **Output Dimensionality:** 768 dimensions
54
  - **Similarity Function:** Cosine Similarity
55
  - **Training Dataset:**
56
+ - train
57
  <!-- - **Language:** Unknown -->
58
  <!-- - **License:** Unknown -->
59
 
 
87
  from sentence_transformers import SentenceTransformer
88
 
89
  # Download from the 🤗 Hub
90
+ model = SentenceTransformer("sentence_transformers_model_id")
91
  # Run inference
92
  sentences = [
93
+ '12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson',
94
+ 'Books on Psychology',
95
+ 'Books on Positive Thinking',
96
  ]
97
  embeddings = model.encode(sentences)
98
  print(embeddings.shape)
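
The usage hunk above stops at `print(embeddings.shape)`. As a minimal sketch of a possible next step, and assuming the cosine similarity named in the model card, the category strings could be ranked against the book title as below. The 3-dimensional vectors are made-up stand-ins for the model's 768-dimensional embeddings, and `cosine` and `rank_by_cosine` are hypothetical helper names, not part of sentence-transformers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins (assumption): real vectors would come from model.encode(...)
title_vec = [0.9, 0.1, 0.2]
category_vecs = {
    "Books on Psychology": [0.8, 0.2, 0.1],
    "Books on Positive Thinking": [0.1, 0.9, 0.3],
}

ranking = rank_by_cosine(title_vec, category_vecs)
for label, score in ranking:
    print(f"{score:.3f}  {label}")
```

With real embeddings, the first element of `sentences` would play the role of `title_vec` and the remaining elements would be the candidates.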
 
144
 
145
  ### Training Dataset
146
 
147
+ #### train
148
 
149
+ * Dataset: train
150
+ * Size: 10,635 training samples
151
  * Columns: <code>anchor</code> and <code>positive</code>
152
  * Approximate statistics based on the first 1000 samples:
153
+ | | anchor | positive |
154
+ |:--------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
155
+ | type | string | string |
156
+ | details | <ul><li>min: 11 tokens</li><li>mean: 24.11 tokens</li><li>max: 60 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 5.89 tokens</li><li>max: 10 tokens</li></ul> |
157
  * Samples:
158
+ | anchor | positive |
159
+ |:-------------------------------------------------------------------------------------------------------------------|:-----------------------------------|
160
+ | <code>The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing by Marie Kondō</code> | <code>Books on Organization</code> |
161
+ | <code>The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing by Marie Kondō</code> | <code>Books on Minimalism</code> |
162
+ | <code>The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing by Marie Kondō</code> | <code>Books on Japanese Art</code> |
163
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
164
  ```json
165
  {
 
170
 
171
  ### Evaluation Dataset
172
 
173
+ #### train
174
 
175
+ * Dataset: train
176
+ * Size: 5,359 evaluation samples
177
  * Columns: <code>anchor</code> and <code>positive</code>
178
  * Approximate statistics based on the first 1000 samples:
179
+ | | anchor | positive |
180
+ |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
181
+ | type | string | string |
182
+ | details | <ul><li>min: 8 tokens</li><li>mean: 22.0 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 5.85 tokens</li><li>max: 13 tokens</li></ul> |
183
  * Samples:
184
+ | anchor | positive |
185
+ |:---------------------------------------------------------------------------|:-------------------------------------------|
186
+ | <code>12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson</code> | <code>Books on Psychology</code> |
187
+ | <code>12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson</code> | <code>Books on Self-Help</code> |
188
+ | <code>12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson</code> | <code>Books on Personal Development</code> |
189
  * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
190
  ```json
191
  {
 
197
  ### Training Hyperparameters
198
  #### Non-Default Hyperparameters
199
 
200
+ - `eval_strategy`: steps
201
  - `per_device_train_batch_size`: 16
202
  - `per_device_eval_batch_size`: 16
203
  - `learning_rate`: 2e-05
204
+ - `num_train_epochs`: 10
205
  - `warmup_ratio`: 0.1
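
The hyperparameters above (`learning_rate` 2e-05, `warmup_ratio` 0.1, linear scheduler) together with the `global_step` of 6650 in the `trainer_state.json` hunk are enough to sketch the learning-rate curve. The function below is a hedged reconstruction of a linear warmup followed by linear decay, not the trainer's actual code. `linear_warmup_lr` is a hypothetical name, and the one-step offset between a logged step and the schedule index is an assumption that happens to match the logged values.

```python
import math


def linear_warmup_lr(step, base_lr=2e-05, total_steps=6650, warmup_ratio=0.1):
    """Linear warmup to base_lr, then linear decay to zero (sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 665 for this run
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Logged step N appears to correspond to schedule index N - 1 (assumption).
for logged_step in (200, 400, 600, 800, 1000):
    print(logged_step, linear_warmup_lr(logged_step - 1))
```

Under these assumptions the warmup ends after 665 steps, which is about one epoch (10,635 samples at batch size 16), and the learning rate peaks between logged steps 600 and 800.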
206
 
207
  #### All Hyperparameters
 
209
 
210
  - `overwrite_output_dir`: False
211
  - `do_predict`: False
212
+ - `eval_strategy`: steps
213
  - `prediction_loss_only`: True
214
  - `per_device_train_batch_size`: 16
215
  - `per_device_eval_batch_size`: 16
 
224
  - `adam_beta2`: 0.999
225
  - `adam_epsilon`: 1e-08
226
  - `max_grad_norm`: 1.0
227
+ - `num_train_epochs`: 10
228
  - `max_steps`: -1
229
  - `lr_scheduler_type`: linear
230
  - `lr_scheduler_kwargs`: {}
 
324
  </details>
325
 
326
  ### Training Logs
327
+ | Epoch | Step | Training Loss | train loss |
+ |:------:|:----:|:-------------:|:----------:|
+ | 0.3008 | 200 | 2.8113 | 2.0799 |
+ | 0.6015 | 400 | 2.0877 | 1.9239 |
+ | 0.9023 | 600 | 1.9258 | 1.8882 |
+ | 1.2030 | 800 | 1.7382 | 1.8684 |
+ | 1.5038 | 1000 | 1.7232 | 1.8226 |
+ | 1.8045 | 1200 | 1.6814 | 1.8167 |
+ | 2.1053 | 1400 | 1.5764 | 1.8133 |
+ | 2.4060 | 1600 | 1.5333 | 1.7898 |
+ | 2.7068 | 1800 | 1.5216 | 1.7782 |
+ | 3.0075 | 2000 | 1.4966 | 1.7663 |
+ | 3.3083 | 2200 | 1.4325 | 1.7642 |
+ | 3.6090 | 2400 | 1.4043 | 1.7956 |
+ | 3.9098 | 2600 | 1.4212 | 1.7609 |
+ | 4.2105 | 2800 | 1.3808 | 1.7611 |
+ | 4.5113 | 3000 | 1.35 | 1.7671 |
+ | 4.8120 | 3200 | 1.3644 | 1.7517 |
+ | 5.1128 | 3400 | 1.304 | 1.7712 |
+ | 5.4135 | 3600 | 1.288 | 1.7820 |
+ | 5.7143 | 3800 | 1.3051 | 1.7699 |
+ | 6.0150 | 4000 | 1.2803 | 1.7678 |
+ | 6.3158 | 4200 | 1.2026 | 1.7812 |
+ | 6.6165 | 4400 | 1.2602 | 1.7846 |
+ | 6.9173 | 4600 | 1.2392 | 1.7733 |
+ | 7.2180 | 4800 | 1.2088 | 1.7745 |
+ | 7.5188 | 5000 | 1.1791 | 1.7867 |
+ | 7.8195 | 5200 | 1.1946 | 1.7779 |
+ | 8.1203 | 5400 | 1.1617 | 1.7931 |
+ | 8.4211 | 5600 | 1.1495 | 1.7911 |
+ | 8.7218 | 5800 | 1.1635 | 1.7949 |
+ | 9.0226 | 6000 | 1.1324 | 1.7962 |
+ | 9.3233 | 6200 | 1.1304 | 1.8035 |
+ | 9.6241 | 6400 | 1.1126 | 1.8056 |
+ | 9.9248 | 6600 | 1.0986 | 1.8062 |
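
In the added log table the training loss keeps falling while the last column stops improving partway through the run. That column appears to be the loss on the evaluation split, which is named `train` in this commit. The sketch below, with the values copied from the table, locates the evaluation step with the lowest loss. It is an illustration of reading the log, not something the commit itself does: the `trainer_state.json` hunk shows `best_model_checkpoint` as `null`, which suggests no best-checkpoint selection was configured.

```python
# Evaluation losses from the added table, one per 200 training steps.
eval_losses = [
    2.0799, 1.9239, 1.8882, 1.8684, 1.8226, 1.8167, 1.8133, 1.7898,
    1.7782, 1.7663, 1.7642, 1.7956, 1.7609, 1.7611, 1.7671, 1.7517,
    1.7712, 1.7820, 1.7699, 1.7678, 1.7812, 1.7846, 1.7733, 1.7745,
    1.7867, 1.7779, 1.7931, 1.7911, 1.7949, 1.7962, 1.8035, 1.8056,
    1.8062,
]
steps = [200 * (i + 1) for i in range(len(eval_losses))]

# Pick the (step, loss) pair with the smallest evaluation loss.
best_step, best_loss = min(zip(steps, eval_losses), key=lambda pair: pair[1])
final_loss = eval_losses[-1]

print(f"lowest eval loss {best_loss} at step {best_step}")
print(f"final eval loss {final_loss} at step {steps[-1]}")
```

The gap between the lowest and the final evaluation loss is small, so whether an earlier checkpoint would actually serve better is a judgment call that the log alone does not settle.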
362
 
363
 
364
  ### Framework Versions
365
  - Python: 3.10.12
366
  - Sentence Transformers: 4.1.0
367
+ - Transformers: 4.52.4
368
+ - PyTorch: 2.6.0+cu124
369
+ - Accelerate: 1.8.1
370
  - Datasets: 3.6.0
371
  - Tokenizers: 0.21.1
372
 
config.json CHANGED
@@ -18,6 +18,6 @@
18
  "pad_token_id": 1,
19
  "relative_attention_num_buckets": 32,
20
  "torch_dtype": "float32",
21
- "transformers_version": "4.53.0.dev0",
22
  "vocab_size": 30527
23
  }
 
18
  "pad_token_id": 1,
19
  "relative_attention_num_buckets": 32,
20
  "torch_dtype": "float32",
21
+ "transformers_version": "4.52.4",
22
  "vocab_size": 30527
23
  }
config_sentence_transformers.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
  "__version__": {
3
  "sentence_transformers": "4.1.0",
4
- "transformers": "4.53.0.dev0",
5
- "pytorch": "2.7.1+cu126"
6
  },
7
  "prompts": {},
8
  "default_prompt_name": null,
 
1
  {
2
  "__version__": {
3
  "sentence_transformers": "4.1.0",
4
+ "transformers": "4.52.4",
5
+ "pytorch": "2.6.0+cu124"
6
  },
7
  "prompts": {},
8
  "default_prompt_name": null,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:78b275fa71a5560f7f63d8c94611c8f5c946a149acf2dd3da3ec538d475eeb55
3
  size 437967672
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:070b8e311a59229e3d1911753c8912809c4b6c99f9cf43c46f1c8ac5dfe915e0
3
  size 437967672
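
The `model.safetensors` entry above is a Git LFS pointer, a three-line text file with `version`, `oid` and `size` fields, rather than the weights themselves. As a minimal sketch, the pointer on the added side of the hunk could be parsed as below. `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of git-lfs or huggingface_hub.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:070b8e311a59229e3d1911753c8912809c4b6c99f9cf43c46f1c8ac5dfe915e0
size 437967672
"""

pointer = parse_lfs_pointer(pointer_text)
size_mib = pointer["size_bytes"] / (1024 * 1024)
print(pointer["hash_algo"], pointer["digest"][:12], f"{size_mib:.1f} MiB")
```

In this hunk only the `oid` changes while `size` stays at 437967672 bytes, which is consistent with new weight values stored in an unchanged tensor layout.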
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1a18eff5b664d8f157a3ad9b8a30005d08cf9c585c1ae1891947228afaa84418
3
- size 871332235
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:636d83f75ef3b6379d2de1380140e9e6db0862aa30c2f61df168fa48a5f11f94
3
+ size 871331770
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:7c48dde672ed03e77e045d0c82330c6b3c2192c6cb466bff8ae450344c711c8a
3
- size 14645
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:784b875c2b86372c41eaa4d7d8efaa50c3c0a99edec1ace8f8b943345f97b54f
3
+ size 14244
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ab0d223a393f22ac34c1911f5e5be757f106a1d2c6c8a7bef5fb3ffd7decea9c
3
- size 1465
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:45b9b94e19b7c7a2fcd96ee21ab65fa9d6c05333276c875f55a40f3bff2d6f6f
3
+ size 1064
special_tokens_map.json CHANGED
@@ -9,7 +9,7 @@
9
  "cls_token": {
10
  "content": "<s>",
11
  "lstrip": false,
12
- "normalized": false,
13
  "rstrip": false,
14
  "single_word": false
15
  },
@@ -37,7 +37,7 @@
37
  "sep_token": {
38
  "content": "</s>",
39
  "lstrip": false,
40
- "normalized": false,
41
  "rstrip": false,
42
  "single_word": false
43
  },
 
9
  "cls_token": {
10
  "content": "<s>",
11
  "lstrip": false,
12
+ "normalized": true,
13
  "rstrip": false,
14
  "single_word": false
15
  },
 
37
  "sep_token": {
38
  "content": "</s>",
39
  "lstrip": false,
40
+ "normalized": true,
41
  "rstrip": false,
42
  "single_word": false
43
  },
tokenizer_config.json CHANGED
@@ -56,18 +56,11 @@
56
  "eos_token": "</s>",
57
  "extra_special_tokens": {},
58
  "mask_token": "<mask>",
59
- "max_length": 512,
60
  "model_max_length": 512,
61
- "pad_to_multiple_of": null,
62
  "pad_token": "<pad>",
63
- "pad_token_type_id": 0,
64
- "padding_side": "right",
65
  "sep_token": "</s>",
66
- "stride": 0,
67
  "strip_accents": null,
68
  "tokenize_chinese_chars": true,
69
  "tokenizer_class": "MPNetTokenizer",
70
- "truncation_side": "right",
71
- "truncation_strategy": "longest_first",
72
  "unk_token": "[UNK]"
73
  }
 
56
  "eos_token": "</s>",
57
  "extra_special_tokens": {},
58
  "mask_token": "<mask>",
 
59
  "model_max_length": 512,
 
60
  "pad_token": "<pad>",
 
 
61
  "sep_token": "</s>",
 
62
  "strip_accents": null,
63
  "tokenize_chinese_chars": true,
64
  "tokenizer_class": "MPNetTokenizer",
 
 
65
  "unk_token": "[UNK]"
66
  }
trainer_state.json CHANGED
@@ -2,369 +2,514 @@
2
  "best_global_step": null,
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
- "epoch": 5.0,
6
- "eval_steps": 500,
7
- "global_step": 5005,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
- "epoch": 0.0999000999000999,
14
- "grad_norm": 8.842884063720703,
15
- "learning_rate": 3.952095808383234e-06,
16
- "loss": 3.0908,
17
- "step": 100
18
  },
19
  {
20
- "epoch": 0.1998001998001998,
21
- "grad_norm": 19.580947875976562,
22
- "learning_rate": 7.944111776447106e-06,
23
- "loss": 2.3816,
 
24
  "step": 200
25
  },
26
  {
27
- "epoch": 0.2997002997002997,
28
- "grad_norm": 17.492271423339844,
29
- "learning_rate": 1.193612774451098e-05,
30
- "loss": 2.1439,
31
- "step": 300
32
  },
33
  {
34
- "epoch": 0.3996003996003996,
35
- "grad_norm": 17.586090087890625,
36
- "learning_rate": 1.592814371257485e-05,
37
- "loss": 1.9773,
 
38
  "step": 400
39
  },
40
  {
41
- "epoch": 0.4995004995004995,
42
- "grad_norm": 15.372795104980469,
43
- "learning_rate": 1.9920159680638723e-05,
44
- "loss": 1.9802,
45
- "step": 500
46
  },
47
  {
48
- "epoch": 0.5994005994005994,
49
- "grad_norm": 13.545052528381348,
50
- "learning_rate": 1.9564831261101244e-05,
51
- "loss": 1.9266,
 
52
  "step": 600
53
  },
54
  {
55
- "epoch": 0.6993006993006993,
56
- "grad_norm": 14.97186279296875,
57
- "learning_rate": 1.9120781527531086e-05,
58
- "loss": 1.8154,
59
- "step": 700
60
  },
61
  {
62
- "epoch": 0.7992007992007992,
63
- "grad_norm": 14.916701316833496,
64
- "learning_rate": 1.8676731793960924e-05,
65
- "loss": 1.792,
 
66
  "step": 800
67
  },
68
  {
69
- "epoch": 0.8991008991008991,
70
- "grad_norm": 12.8872709274292,
71
- "learning_rate": 1.8232682060390766e-05,
72
- "loss": 1.7716,
73
- "step": 900
74
  },
75
  {
76
- "epoch": 0.999000999000999,
77
- "grad_norm": 10.461772918701172,
78
- "learning_rate": 1.7788632326820604e-05,
79
- "loss": 1.7447,
 
80
  "step": 1000
81
  },
82
  {
83
- "epoch": 1.098901098901099,
84
- "grad_norm": 11.06071662902832,
85
- "learning_rate": 1.7344582593250445e-05,
86
- "loss": 1.6836,
87
- "step": 1100
88
  },
89
  {
90
- "epoch": 1.1988011988011988,
91
- "grad_norm": 13.601765632629395,
92
- "learning_rate": 1.6900532859680287e-05,
93
- "loss": 1.635,
 
94
  "step": 1200
95
  },
96
  {
97
- "epoch": 1.2987012987012987,
98
- "grad_norm": 13.174976348876953,
99
- "learning_rate": 1.6456483126110125e-05,
100
- "loss": 1.555,
101
- "step": 1300
102
  },
103
  {
104
- "epoch": 1.3986013986013985,
105
- "grad_norm": 16.08052635192871,
106
- "learning_rate": 1.6012433392539967e-05,
107
- "loss": 1.6318,
 
108
  "step": 1400
109
  },
110
  {
111
- "epoch": 1.4985014985014984,
112
- "grad_norm": 15.495978355407715,
113
- "learning_rate": 1.5568383658969805e-05,
114
- "loss": 1.6539,
115
- "step": 1500
116
  },
117
  {
118
- "epoch": 1.5984015984015985,
119
- "grad_norm": 11.14354133605957,
120
- "learning_rate": 1.5124333925399647e-05,
121
- "loss": 1.5797,
 
122
  "step": 1600
123
  },
124
  {
125
- "epoch": 1.6983016983016983,
126
- "grad_norm": 10.451489448547363,
127
- "learning_rate": 1.4680284191829486e-05,
128
- "loss": 1.5831,
129
- "step": 1700
130
  },
131
  {
132
- "epoch": 1.7982017982017982,
133
- "grad_norm": 12.40042781829834,
134
- "learning_rate": 1.4236234458259326e-05,
135
- "loss": 1.5727,
 
136
  "step": 1800
137
  },
138
  {
139
- "epoch": 1.8981018981018982,
140
- "grad_norm": 13.592798233032227,
141
- "learning_rate": 1.3792184724689166e-05,
142
- "loss": 1.5969,
143
- "step": 1900
144
  },
145
  {
146
- "epoch": 1.9980019980019978,
147
- "grad_norm": 12.129600524902344,
148
- "learning_rate": 1.3348134991119006e-05,
149
- "loss": 1.5202,
 
150
  "step": 2000
151
  },
152
  {
153
- "epoch": 2.097902097902098,
154
- "grad_norm": 13.278411865234375,
155
- "learning_rate": 1.2904085257548846e-05,
156
- "loss": 1.4787,
157
- "step": 2100
158
  },
159
  {
160
- "epoch": 2.197802197802198,
161
- "grad_norm": 11.896201133728027,
162
- "learning_rate": 1.2460035523978686e-05,
163
- "loss": 1.4702,
 
164
  "step": 2200
165
  },
166
  {
167
- "epoch": 2.2977022977022976,
168
- "grad_norm": 12.858651161193848,
169
- "learning_rate": 1.2015985790408526e-05,
170
- "loss": 1.4389,
171
- "step": 2300
172
  },
173
  {
174
- "epoch": 2.3976023976023977,
175
- "grad_norm": 11.218062400817871,
176
- "learning_rate": 1.1571936056838366e-05,
177
- "loss": 1.4196,
 
178
  "step": 2400
179
  },
180
  {
181
- "epoch": 2.4975024975024973,
182
- "grad_norm": 12.738713264465332,
183
- "learning_rate": 1.1127886323268207e-05,
184
- "loss": 1.4352,
185
- "step": 2500
186
  },
187
  {
188
- "epoch": 2.5974025974025974,
189
- "grad_norm": 11.351848602294922,
190
- "learning_rate": 1.0683836589698047e-05,
191
- "loss": 1.395,
 
192
  "step": 2600
193
  },
194
  {
195
- "epoch": 2.6973026973026974,
196
- "grad_norm": 15.531521797180176,
197
- "learning_rate": 1.0239786856127887e-05,
198
- "loss": 1.462,
199
- "step": 2700
200
  },
201
  {
202
- "epoch": 2.797202797202797,
203
- "grad_norm": 13.97511100769043,
204
- "learning_rate": 9.795737122557727e-06,
205
- "loss": 1.4439,
 
206
  "step": 2800
207
  },
208
  {
209
- "epoch": 2.897102897102897,
210
- "grad_norm": 11.752087593078613,
211
- "learning_rate": 9.351687388987567e-06,
212
- "loss": 1.4478,
213
- "step": 2900
214
  },
215
  {
216
- "epoch": 2.9970029970029968,
217
- "grad_norm": 10.799909591674805,
218
- "learning_rate": 8.907637655417407e-06,
219
- "loss": 1.4617,
 
220
  "step": 3000
221
  },
222
  {
223
- "epoch": 3.096903096903097,
224
- "grad_norm": 13.04091739654541,
225
- "learning_rate": 8.463587921847247e-06,
226
- "loss": 1.3563,
227
- "step": 3100
228
  },
229
  {
230
- "epoch": 3.196803196803197,
231
- "grad_norm": 8.328659057617188,
232
- "learning_rate": 8.019538188277087e-06,
233
- "loss": 1.4004,
 
234
  "step": 3200
235
  },
236
  {
237
- "epoch": 3.2967032967032965,
238
- "grad_norm": 14.43315601348877,
239
- "learning_rate": 7.575488454706927e-06,
240
- "loss": 1.3557,
241
- "step": 3300
242
  },
243
  {
244
- "epoch": 3.3966033966033966,
245
- "grad_norm": 11.241643905639648,
246
- "learning_rate": 7.131438721136767e-06,
247
- "loss": 1.3226,
 
248
  "step": 3400
249
  },
250
  {
251
- "epoch": 3.4965034965034967,
252
- "grad_norm": 13.721736907958984,
253
- "learning_rate": 6.687388987566608e-06,
254
- "loss": 1.3516,
255
- "step": 3500
256
  },
257
  {
258
- "epoch": 3.5964035964035963,
259
- "grad_norm": 9.1060791015625,
260
- "learning_rate": 6.243339253996448e-06,
261
- "loss": 1.3219,
 
262
  "step": 3600
263
  },
264
  {
265
- "epoch": 3.6963036963036964,
266
- "grad_norm": 15.87879753112793,
267
- "learning_rate": 5.799289520426288e-06,
268
- "loss": 1.4065,
269
- "step": 3700
270
  },
271
  {
272
- "epoch": 3.7962037962037964,
273
- "grad_norm": 15.852932929992676,
274
- "learning_rate": 5.355239786856128e-06,
275
- "loss": 1.3987,
 
276
  "step": 3800
277
  },
278
  {
279
- "epoch": 3.896103896103896,
280
- "grad_norm": 14.717906951904297,
281
- "learning_rate": 4.911190053285968e-06,
282
- "loss": 1.3364,
283
- "step": 3900
284
  },
285
  {
286
- "epoch": 3.996003996003996,
287
- "grad_norm": 12.673909187316895,
288
- "learning_rate": 4.467140319715808e-06,
289
- "loss": 1.3198,
 
290
  "step": 4000
291
  },
292
  {
293
- "epoch": 4.095904095904096,
294
- "grad_norm": 14.972280502319336,
295
- "learning_rate": 4.023090586145649e-06,
296
- "loss": 1.2749,
297
- "step": 4100
298
  },
299
  {
300
- "epoch": 4.195804195804196,
301
- "grad_norm": 14.906390190124512,
302
- "learning_rate": 3.579040852575489e-06,
303
- "loss": 1.2973,
 
304
  "step": 4200
305
  },
306
  {
307
- "epoch": 4.2957042957042955,
308
- "grad_norm": 13.691140174865723,
309
- "learning_rate": 3.134991119005329e-06,
310
- "loss": 1.2335,
311
- "step": 4300
312
  },
313
  {
314
- "epoch": 4.395604395604396,
315
- "grad_norm": 10.398210525512695,
316
- "learning_rate": 2.690941385435169e-06,
317
- "loss": 1.2816,
 
318
  "step": 4400
319
  },
320
  {
321
- "epoch": 4.495504495504496,
322
- "grad_norm": 10.417113304138184,
323
- "learning_rate": 2.246891651865009e-06,
324
- "loss": 1.2963,
325
- "step": 4500
326
  },
327
  {
328
- "epoch": 4.595404595404595,
329
- "grad_norm": 9.694279670715332,
330
- "learning_rate": 1.8028419182948491e-06,
331
- "loss": 1.2932,
 
332
  "step": 4600
333
  },
334
  {
335
- "epoch": 4.695304695304696,
336
- "grad_norm": 9.184102058410645,
337
- "learning_rate": 1.3587921847246892e-06,
338
- "loss": 1.2579,
339
- "step": 4700
340
  },
341
  {
342
- "epoch": 4.795204795204795,
343
- "grad_norm": 14.067358016967773,
344
- "learning_rate": 9.147424511545295e-07,
345
- "loss": 1.3312,
 
346
  "step": 4800
347
  },
348
  {
349
- "epoch": 4.895104895104895,
350
- "grad_norm": 12.279874801635742,
351
- "learning_rate": 4.706927175843695e-07,
352
- "loss": 1.262,
353
- "step": 4900
354
  },
355
  {
356
- "epoch": 4.995004995004995,
357
- "grad_norm": 16.635345458984375,
358
- "learning_rate": 2.6642984014209594e-08,
359
- "loss": 1.3559,
 
360
  "step": 5000
361
  }
362
  ],
363
- "logging_steps": 100,
364
- "max_steps": 5005,
365
  "num_input_tokens_seen": 0,
366
- "num_train_epochs": 5,
367
- "save_steps": 600,
368
  "stateful_callbacks": {
369
  "TrainerControl": {
370
  "args": {
 
2
  "best_global_step": null,
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
+ "epoch": 10.0,
6
+ "eval_steps": 200,
7
+ "global_step": 6650,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
+ "epoch": 0.3007518796992481,
14
+ "grad_norm": 23.491586685180664,
15
+ "learning_rate": 5.984962406015038e-06,
16
+ "loss": 2.8113,
17
+ "step": 200
18
  },
19
  {
20
+ "epoch": 0.3007518796992481,
21
+ "eval_train_loss": 2.0799365043640137,
22
+ "eval_train_runtime": 5.0075,
23
+ "eval_train_samples_per_second": 1070.204,
24
+ "eval_train_steps_per_second": 66.9,
25
  "step": 200
26
  },
27
  {
28
+ "epoch": 0.6015037593984962,
29
+ "grad_norm": 34.29732131958008,
30
+ "learning_rate": 1.2e-05,
31
+ "loss": 2.0877,
32
+ "step": 400
33
  },
34
  {
35
+ "epoch": 0.6015037593984962,
36
+ "eval_train_loss": 1.923947811126709,
37
+ "eval_train_runtime": 5.0362,
38
+ "eval_train_samples_per_second": 1064.098,
39
+ "eval_train_steps_per_second": 66.519,
40
  "step": 400
41
  },
42
  {
43
+ "epoch": 0.9022556390977443,
44
+ "grad_norm": 38.32964324951172,
45
+ "learning_rate": 1.8015037593984962e-05,
46
+ "loss": 1.9258,
47
+ "step": 600
48
  },
49
  {
50
+ "epoch": 0.9022556390977443,
51
+ "eval_train_loss": 1.888200283050537,
52
+ "eval_train_runtime": 4.9999,
53
+ "eval_train_samples_per_second": 1071.817,
54
+ "eval_train_steps_per_second": 67.001,
55
  "step": 600
56
  },
57
  {
58
+ "epoch": 1.2030075187969924,
59
+ "grad_norm": 15.528878211975098,
60
+ "learning_rate": 1.9552213868003343e-05,
61
+ "loss": 1.7382,
62
+ "step": 800
63
  },
64
  {
65
+ "epoch": 1.2030075187969924,
66
+ "eval_train_loss": 1.8683608770370483,
67
+ "eval_train_runtime": 5.0138,
68
+ "eval_train_samples_per_second": 1068.861,
69
+ "eval_train_steps_per_second": 66.816,
70
  "step": 800
71
  },
72
  {
73
+ "epoch": 1.5037593984962405,
74
+ "grad_norm": 13.27901554107666,
75
+ "learning_rate": 1.8883876357560568e-05,
76
+ "loss": 1.7232,
77
+ "step": 1000
78
  },
79
  {
80
+ "epoch": 1.5037593984962405,
81
+ "eval_train_loss": 1.8225561380386353,
82
+ "eval_train_runtime": 5.0119,
83
+ "eval_train_samples_per_second": 1069.261,
84
+ "eval_train_steps_per_second": 66.841,
85
  "step": 1000
86
  },
87
  {
88
+ "epoch": 1.8045112781954886,
89
+ "grad_norm": 11.53130054473877,
90
+ "learning_rate": 1.8215538847117796e-05,
91
+ "loss": 1.6814,
92
+ "step": 1200
93
  },
94
  {
95
+ "epoch": 1.8045112781954886,
96
+ "eval_train_loss": 1.8166730403900146,
97
+ "eval_train_runtime": 5.0134,
98
+ "eval_train_samples_per_second": 1068.945,
99
+ "eval_train_steps_per_second": 66.822,
100
  "step": 1200
101
  },
102
  {
103
+ "epoch": 2.1052631578947367,
104
+ "grad_norm": 14.417011260986328,
105
+ "learning_rate": 1.754720133667502e-05,
106
+ "loss": 1.5764,
107
+ "step": 1400
108
  },
109
  {
110
+ "epoch": 2.1052631578947367,
111
+ "eval_train_loss": 1.8132838010787964,
112
+ "eval_train_runtime": 5.0144,
113
+ "eval_train_samples_per_second": 1068.73,
114
+ "eval_train_steps_per_second": 66.808,
115
  "step": 1400
116
  },
117
  {
118
+ "epoch": 2.406015037593985,
119
+ "grad_norm": 11.700883865356445,
120
+ "learning_rate": 1.6878863826232248e-05,
121
+ "loss": 1.5333,
122
+ "step": 1600
123
  },
124
  {
125
+ "epoch": 2.406015037593985,
126
+ "eval_train_loss": 1.7898207902908325,
127
+ "eval_train_runtime": 5.0228,
128
+ "eval_train_samples_per_second": 1066.927,
129
+ "eval_train_steps_per_second": 66.695,
130
  "step": 1600
131
  },
132
  {
133
+ "epoch": 2.706766917293233,
134
+ "grad_norm": 13.112250328063965,
135
+ "learning_rate": 1.6210526315789473e-05,
136
+ "loss": 1.5216,
137
+ "step": 1800
138
  },
139
  {
140
+ "epoch": 2.706766917293233,
141
+ "eval_train_loss": 1.7781648635864258,
142
+ "eval_train_runtime": 5.0052,
143
+ "eval_train_samples_per_second": 1070.687,
144
+ "eval_train_steps_per_second": 66.93,
145
  "step": 1800
146
  },
147
  {
148
+ "epoch": 3.007518796992481,
149
+ "grad_norm": 11.557127952575684,
150
+ "learning_rate": 1.55421888053467e-05,
151
+ "loss": 1.4966,
152
+ "step": 2000
153
  },
154
  {
155
+ "epoch": 3.007518796992481,
156
+ "eval_train_loss": 1.7662715911865234,
157
+ "eval_train_runtime": 5.0354,
158
+ "eval_train_samples_per_second": 1064.268,
159
+ "eval_train_steps_per_second": 66.529,
160
  "step": 2000
161
  },
162
  {
163
+ "epoch": 3.308270676691729,
164
+ "grad_norm": 10.65110969543457,
165
+ "learning_rate": 1.4873851294903927e-05,
166
+ "loss": 1.4325,
167
+ "step": 2200
168
  },
169
  {
170
+ "epoch": 3.308270676691729,
171
+ "eval_train_loss": 1.764186143875122,
172
+ "eval_train_runtime": 5.0269,
173
+ "eval_train_samples_per_second": 1066.066,
174
+ "eval_train_steps_per_second": 66.642,
175
  "step": 2200
176
  },
177
  {
178
+ "epoch": 3.6090225563909772,
179
+ "grad_norm": 11.466296195983887,
180
+ "learning_rate": 1.4205513784461153e-05,
181
+ "loss": 1.4043,
182
+ "step": 2400
183
  },
184
  {
185
+ "epoch": 3.6090225563909772,
186
+ "eval_train_loss": 1.7955785989761353,
187
+ "eval_train_runtime": 5.06,
188
+ "eval_train_samples_per_second": 1059.097,
189
+ "eval_train_steps_per_second": 66.206,
190
  "step": 2400
191
  },
192
  {
193
+ "epoch": 3.909774436090226,
194
+ "grad_norm": 9.564383506774902,
195
+ "learning_rate": 1.353717627401838e-05,
196
+ "loss": 1.4212,
197
+ "step": 2600
198
  },
199
  {
200
+ "epoch": 3.909774436090226,
201
+ "eval_train_loss": 1.7609018087387085,
202
+ "eval_train_runtime": 5.0402,
203
+ "eval_train_samples_per_second": 1063.247,
204
+ "eval_train_steps_per_second": 66.465,
205
  "step": 2600
206
  },
207
  {
208
+ "epoch": 4.2105263157894735,
209
+ "grad_norm": 12.078660011291504,
210
+ "learning_rate": 1.2868838763575606e-05,
211
+ "loss": 1.3808,
212
+ "step": 2800
213
  },
214
  {
215
+ "epoch": 4.2105263157894735,
216
+ "eval_train_loss": 1.7610782384872437,
217
+ "eval_train_runtime": 5.0859,
218
+ "eval_train_samples_per_second": 1053.692,
219
+ "eval_train_steps_per_second": 65.868,
220
  "step": 2800
221
  },
222
  {
223
+ "epoch": 4.511278195488722,
224
+ "grad_norm": 10.561222076416016,
225
+ "learning_rate": 1.2200501253132832e-05,
226
+ "loss": 1.35,
227
+ "step": 3000
228
  },
229
  {
230
+ "epoch": 4.511278195488722,
231
+ "eval_train_loss": 1.7670680284500122,
232
+ "eval_train_runtime": 5.0558,
233
+ "eval_train_samples_per_second": 1059.976,
234
+ "eval_train_steps_per_second": 66.261,
235
  "step": 3000
236
  },
237
  {
238
+ "epoch": 4.81203007518797,
239
+ "grad_norm": 14.785975456237793,
240
+ "learning_rate": 1.1532163742690059e-05,
241
+ "loss": 1.3644,
242
+ "step": 3200
243
  },
244
  {
245
+ "epoch": 4.81203007518797,
246
+ "eval_train_loss": 1.751652479171753,
247
+ "eval_train_runtime": 5.0835,
248
+ "eval_train_samples_per_second": 1054.196,
249
+ "eval_train_steps_per_second": 65.9,
250
  "step": 3200
251
  },
252
  {
253
+ "epoch": 5.112781954887218,
254
+ "grad_norm": 10.927189826965332,
255
+ "learning_rate": 1.0863826232247285e-05,
256
+ "loss": 1.304,
257
+ "step": 3400
258
  },
259
  {
260
+ "epoch": 5.112781954887218,
261
+ "eval_train_loss": 1.7712498903274536,
262
+ "eval_train_runtime": 5.0673,
263
+ "eval_train_samples_per_second": 1057.559,
264
+ "eval_train_steps_per_second": 66.11,
265
  "step": 3400
266
  },
267
  {
268
+ "epoch": 5.413533834586466,
269
+ "grad_norm": 14.33267879486084,
270
+ "learning_rate": 1.0195488721804511e-05,
271
+ "loss": 1.288,
272
+ "step": 3600
273
  },
274
  {
275
+ "epoch": 5.413533834586466,
276
+ "eval_train_loss": 1.7820113897323608,
277
+ "eval_train_runtime": 5.086,
278
+ "eval_train_samples_per_second": 1053.672,
279
+ "eval_train_steps_per_second": 65.867,
280
  "step": 3600
281
  },
282
  {
283
+ "epoch": 5.714285714285714,
284
+ "grad_norm": 11.89034366607666,
285
+ "learning_rate": 9.527151211361737e-06,
286
+ "loss": 1.3051,
287
+ "step": 3800
288
  },
289
  {
290
+ "epoch": 5.714285714285714,
291
+ "eval_train_loss": 1.7699248790740967,
292
+ "eval_train_runtime": 5.1253,
293
+ "eval_train_samples_per_second": 1045.605,
294
+ "eval_train_steps_per_second": 65.363,
295
  "step": 3800
296
  },
297
  {
298
+ "epoch": 6.015037593984962,
299
+ "grad_norm": 10.595609664916992,
300
+ "learning_rate": 8.858813700918964e-06,
301
+ "loss": 1.2803,
302
+ "step": 4000
303
  },
304
  {
305
+ "epoch": 6.015037593984962,
306
+ "eval_train_loss": 1.7678076028823853,
307
+ "eval_train_runtime": 5.1035,
308
+ "eval_train_samples_per_second": 1050.07,
309
+ "eval_train_steps_per_second": 65.642,
310
  "step": 4000
311
  },
312
  {
313
+ "epoch": 6.315789473684211,
314
+ "grad_norm": 14.781892776489258,
315
+ "learning_rate": 8.190476190476192e-06,
316
+ "loss": 1.2026,
317
+ "step": 4200
318
  },
319
  {
320
+ "epoch": 6.315789473684211,
321
+ "eval_train_loss": 1.7812011241912842,
322
+ "eval_train_runtime": 5.1217,
323
+ "eval_train_samples_per_second": 1046.331,
324
+ "eval_train_steps_per_second": 65.408,
325
  "step": 4200
326
  },
327
  {
328
+ "epoch": 6.616541353383458,
329
+ "grad_norm": 11.254812240600586,
330
+ "learning_rate": 7.522138680033417e-06,
331
+ "loss": 1.2602,
332
+ "step": 4400
333
  },
334
  {
335
+ "epoch": 6.616541353383458,
336
+ "eval_train_loss": 1.7846208810806274,
337
+ "eval_train_runtime": 5.1259,
338
+ "eval_train_samples_per_second": 1045.481,
339
+ "eval_train_steps_per_second": 65.355,
340
  "step": 4400
341
  },
342
  {
343
+ "epoch": 6.917293233082707,
344
+ "grad_norm": 9.643959999084473,
345
+ "learning_rate": 6.8538011695906435e-06,
346
+ "loss": 1.2392,
347
+ "step": 4600
348
  },
349
  {
350
+ "epoch": 6.917293233082707,
351
+ "eval_train_loss": 1.7733409404754639,
352
+ "eval_train_runtime": 5.1326,
353
+ "eval_train_samples_per_second": 1044.114,
354
+ "eval_train_steps_per_second": 65.269,
355
  "step": 4600
356
  },
357
  {
358
+ "epoch": 7.2180451127819545,
359
+ "grad_norm": 12.258922576904297,
360
+ "learning_rate": 6.18546365914787e-06,
361
+ "loss": 1.2088,
362
+ "step": 4800
363
  },
364
  {
365
+ "epoch": 7.2180451127819545,
366
+ "eval_train_loss": 1.7745392322540283,
367
+ "eval_train_runtime": 5.1493,
368
+ "eval_train_samples_per_second": 1040.714,
369
+ "eval_train_steps_per_second": 65.057,
370
  "step": 4800
371
  },
372
  {
373
+ "epoch": 7.518796992481203,
374
+ "grad_norm": 12.351716041564941,
375
+ "learning_rate": 5.517126148705096e-06,
376
+ "loss": 1.1791,
377
+ "step": 5000
378
  },
379
  {
380
+ "epoch": 7.518796992481203,
381
+ "eval_train_loss": 1.7866636514663696,
382
+ "eval_train_runtime": 5.144,
383
+ "eval_train_samples_per_second": 1041.787,
384
+ "eval_train_steps_per_second": 65.124,
385
  "step": 5000
386
+ },
387
+ {
388
+ "epoch": 7.819548872180452,
389
+ "grad_norm": 15.052789688110352,
390
+ "learning_rate": 4.8487886382623224e-06,
391
+ "loss": 1.1946,
392
+ "step": 5200
393
+ },
394
+ {
395
+ "epoch": 7.819548872180452,
396
+ "eval_train_loss": 1.7778518199920654,
397
+ "eval_train_runtime": 5.1357,
398
+ "eval_train_samples_per_second": 1043.481,
399
+ "eval_train_steps_per_second": 65.23,
400
+ "step": 5200
401
+ },
402
+ {
403
+ "epoch": 8.1203007518797,
404
+ "grad_norm": 8.957300186157227,
405
+ "learning_rate": 4.18045112781955e-06,
406
+ "loss": 1.1617,
407
+ "step": 5400
408
+ },
409
+ {
410
+ "epoch": 8.1203007518797,
411
+ "eval_train_loss": 1.7931042909622192,
412
+ "eval_train_runtime": 5.1877,
413
+ "eval_train_samples_per_second": 1033.016,
414
+ "eval_train_steps_per_second": 64.576,
415
+ "step": 5400
416
+ },
417
+ {
418
+ "epoch": 8.421052631578947,
419
+ "grad_norm": 13.89137077331543,
420
+ "learning_rate": 3.5121136173767755e-06,
421
+ "loss": 1.1495,
422
+ "step": 5600
423
+ },
424
+ {
425
+ "epoch": 8.421052631578947,
426
+ "eval_train_loss": 1.791070818901062,
427
+ "eval_train_runtime": 5.1363,
428
+ "eval_train_samples_per_second": 1043.352,
429
+ "eval_train_steps_per_second": 65.222,
430
+ "step": 5600
431
+ },
432
+ {
433
+ "epoch": 8.721804511278195,
434
+ "grad_norm": 11.32971477508545,
435
+ "learning_rate": 2.8437761069340018e-06,
436
+ "loss": 1.1635,
437
+ "step": 5800
438
+ },
439
+ {
440
+ "epoch": 8.721804511278195,
441
+ "eval_train_loss": 1.794918417930603,
442
+ "eval_train_runtime": 5.1728,
443
+ "eval_train_samples_per_second": 1035.991,
444
+ "eval_train_steps_per_second": 64.762,
445
+ "step": 5800
446
+ },
447
+ {
448
+ "epoch": 9.022556390977444,
449
+ "grad_norm": 13.075417518615723,
450
+ "learning_rate": 2.1754385964912285e-06,
451
+ "loss": 1.1324,
452
+ "step": 6000
453
+ },
454
+ {
455
+ "epoch": 9.022556390977444,
456
+ "eval_train_loss": 1.7962439060211182,
457
+ "eval_train_runtime": 5.1942,
458
+ "eval_train_samples_per_second": 1031.737,
459
+ "eval_train_steps_per_second": 64.496,
460
+ "step": 6000
461
+ },
462
+ {
463
+ "epoch": 9.323308270676693,
464
+ "grad_norm": 12.90481948852539,
465
+ "learning_rate": 1.5071010860484548e-06,
466
+ "loss": 1.1304,
467
+ "step": 6200
468
+ },
469
+ {
470
+ "epoch": 9.323308270676693,
471
+ "eval_train_loss": 1.8035305738449097,
472
+ "eval_train_runtime": 5.1397,
473
+ "eval_train_samples_per_second": 1042.671,
474
+ "eval_train_steps_per_second": 65.179,
475
+ "step": 6200
476
+ },
477
+ {
478
+ "epoch": 9.62406015037594,
479
+ "grad_norm": 12.501527786254883,
480
+ "learning_rate": 8.38763575605681e-07,
481
+ "loss": 1.1126,
482
+ "step": 6400
483
+ },
484
+ {
485
+ "epoch": 9.62406015037594,
486
+ "eval_train_loss": 1.8056447505950928,
487
+ "eval_train_runtime": 5.1771,
488
+ "eval_train_samples_per_second": 1035.144,
489
+ "eval_train_steps_per_second": 64.709,
490
+ "step": 6400
491
+ },
492
+ {
493
+ "epoch": 9.924812030075188,
494
+ "grad_norm": 10.318084716796875,
495
+ "learning_rate": 1.704260651629073e-07,
496
+ "loss": 1.0986,
497
+ "step": 6600
498
+ },
499
+ {
500
+ "epoch": 9.924812030075188,
501
+ "eval_train_loss": 1.806175947189331,
502
+ "eval_train_runtime": 5.1696,
503
+ "eval_train_samples_per_second": 1036.634,
504
+ "eval_train_steps_per_second": 64.802,
505
+ "step": 6600
506
  }
507
  ],
508
+ "logging_steps": 200,
509
+ "max_steps": 6650,
510
  "num_input_tokens_seen": 0,
511
+ "num_train_epochs": 10,
512
+ "save_steps": 3000,
513
  "stateful_callbacks": {
514
  "TrainerControl": {
515
  "args": {
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a8c6dda4ac3c61c8dee69d41b2109cadfdc72b31128ff1e478c7ceb7f8f8c760
3
- size 5969
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5426350813d7892767af2be085b90ee8f4228e448896c2b7304612735ddb7b6
3
+ size 5496