Upload folder using huggingface_hub

Files changed (16)

checkpoint-14500/1_Pooling/config.json +10 -0
checkpoint-14500/README.md +480 -0
checkpoint-14500/config.json +26 -0
checkpoint-14500/config_sentence_transformers.json +10 -0
checkpoint-14500/model.safetensors +3 -0
checkpoint-14500/modules.json +20 -0
checkpoint-14500/optimizer.pt +3 -0
checkpoint-14500/rng_state.pth +3 -0
checkpoint-14500/scheduler.pt +3 -0
checkpoint-14500/sentence_bert_config.json +4 -0
checkpoint-14500/special_tokens_map.json +37 -0
checkpoint-14500/tokenizer.json +0 -0
checkpoint-14500/tokenizer_config.json +64 -0
checkpoint-14500/trainer_state.json +1048 -0
checkpoint-14500/training_args.bin +3 -0
checkpoint-14500/vocab.txt +0 -0

checkpoint-14500/1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 384,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
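
The pooling config above enables only `pooling_mode_mean_tokens`, and `modules.json` follows the pooling module with a `Normalize` module. As a rough illustration of what that combination computes, here is a minimal pure-Python sketch. The function name and the toy 3-dimensional vectors are hypothetical (the real model uses 384 dimensions and batched tensors), and this is not the library's implementation.

```python
import math

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the vectors of unmasked tokens, then scale the result to unit length.

    token_embeddings: list of per-token vectors (lists of floats)
    attention_mask:   list of 0/1 flags, 0 marking padding tokens
    """
    dim = len(token_embeddings[0])
    # Keep only tokens whose mask is 1, so padding does not affect the mean.
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    count = max(len(kept), 1)  # guard against an all-padding input
    pooled = [sum(vec[d] for vec in kept) / count for d in range(dim)]
    # L2 normalization: with unit-length vectors, dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]

# Toy input: the third token is padding and must be ignored.
tokens = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [100.0, 100.0, 100.0]]
mask = [1, 1, 0]
embedding = mean_pool_and_normalize(tokens, mask)
print(embedding)
```

Because the output is unit length, cosine similarity between two such embeddings reduces to a plain dot product.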

checkpoint-14500/README.md ADDED

	@@ -0,0 +1,480 @@

+---
+base_model: sentence-transformers/all-MiniLM-L6-v2
+language:
+- en
+library_name: sentence-transformers
+license: apache-2.0
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:2400000
+- loss:CoSENTLoss
+widget:
+- source_sentence: poolside pants
+  sentences:
+  - safe materials toy
+  - plated necklace
+  - washed cargo pants
+- source_sentence: breathable pants
+  sentences:
+  - extra definition mascara
+  - christmas trees hair clip
+  - milton shorts
+- source_sentence: mozzarella cheese burger
+  sentences:
+  - ankle length leggings
+  - nail polish
+  - olive shacket
+- source_sentence: cookie brownie
+  sentences:
+  - lime top
+  - mdf coffee corner stand
+  - learning flashcards
+- source_sentence: no artificial flavouring food
+  sentences:
+  - eye pencil
+  - tourmaline ceramic brush
+  - rubber dog toy
+---
+# all-MiniLM-L6-v9-pair_score
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
+- **Maximum Sequence Length:** 256 tokens
+- **Output Dimensionality:** 384 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+- **Language:** en
+- **License:** apache-2.0
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub
+model = SentenceTransformer("sentence_transformers_model_id")
+# Run inference
+sentences = [
+    'no artificial flavouring food',
+    'rubber dog toy',
+    'tourmaline ceramic brush',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 384)
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# torch.Size([3, 3])
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 1
+- `warmup_ratio`: 0.1
+- `fp16`: True
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: steps
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 1
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: False
+- `fp16`: True
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: False
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `dispatch_batches`: None
+- `split_batches`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `eval_use_gather_object`: False
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: proportional
+</details>
+### Training Logs
+<details><summary>Click to expand</summary>
+| Epoch  | Step  | Training Loss |
+|:------:|:-----:|:-------------:|
+| 0.0053 | 100   | 13.2077       |
+| 0.0107 | 200   | 12.3835       |
+| 0.016  | 300   | 10.7699       |
+| 0.0213 | 400   | 9.2679        |
+| 0.0267 | 500   | 8.2638        |
+| 0.032  | 600   | 7.69          |
+| 0.0373 | 700   | 7.2751        |
+| 0.0427 | 800   | 6.8786        |
+| 0.048  | 900   | 6.7811        |
+| 0.0533 | 1000  | 6.5834        |
+| 0.0587 | 1100  | 6.3517        |
+| 0.064  | 1200  | 6.2272        |
+| 0.0693 | 1300  | 6.1943        |
+| 0.0747 | 1400  | 6.1038        |
+| 0.08   | 1500  | 6.1216        |
+| 0.0853 | 1600  | 6.1429        |
+| 0.0907 | 1700  | 5.8876        |
+| 0.096  | 1800  | 5.8074        |
+| 0.1013 | 1900  | 5.6261        |
+| 0.1067 | 2000  | 5.838         |
+| 0.112  | 2100  | 5.7161        |
+| 0.1173 | 2200  | 5.5388        |
+| 0.1227 | 2300  | 5.5654        |
+| 0.128  | 2400  | 5.5196        |
+| 0.1333 | 2500  | 5.3665        |
+| 0.1387 | 2600  | 5.2952        |
+| 0.144  | 2700  | 5.4131        |
+| 0.1493 | 2800  | 5.2104        |
+| 0.1547 | 2900  | 5.2176        |
+| 0.16   | 3000  | 4.9406        |
+| 0.1653 | 3100  | 4.8781        |
+| 0.1707 | 3200  | 5.08          |
+| 0.176  | 3300  | 5.1495        |
+| 0.1813 | 3400  | 4.8717        |
+| 0.1867 | 3500  | 4.8196        |
+| 0.192  | 3600  | 4.8065        |
+| 0.1973 | 3700  | 4.718         |
+| 0.2027 | 3800  | 4.7111        |
+| 0.208  | 3900  | 4.6759        |
+| 0.2133 | 4000  | 4.7733        |
+| 0.2187 | 4100  | 4.7041        |
+| 0.224  | 4200  | 4.7898        |
+| 0.2293 | 4300  | 4.8974        |
+| 0.2347 | 4400  | 4.4939        |
+| 0.24   | 4500  | 4.4107        |
+| 0.2453 | 4600  | 4.4831        |
+| 0.2507 | 4700  | 4.4571        |
+| 0.256  | 4800  | 4.1461        |
+| 0.2613 | 4900  | 4.5198        |
+| 0.2667 | 5000  | 4.4998        |
+| 0.272  | 5100  | 4.2135        |
+| 0.2773 | 5200  | 4.441         |
+| 0.2827 | 5300  | 4.2669        |
+| 0.288  | 5400  | 4.0964        |
+| 0.2933 | 5500  | 4.2048        |
+| 0.2987 | 5600  | 4.2123        |
+| 0.304  | 5700  | 4.3391        |
+| 0.3093 | 5800  | 4.3366        |
+| 0.3147 | 5900  | 4.1775        |
+| 0.32   | 6000  | 3.9954        |
+| 0.3253 | 6100  | 4.141         |
+| 0.3307 | 6200  | 4.09          |
+| 0.336  | 6300  | 3.9517        |
+| 0.3413 | 6400  | 3.9844        |
+| 0.3467 | 6500  | 3.8902        |
+| 0.352  | 6600  | 3.571         |
+| 0.3573 | 6700  | 3.7686        |
+| 0.3627 | 6800  | 3.7766        |
+| 0.368  | 6900  | 4.0305        |
+| 0.3733 | 7000  | 4.2835        |
+| 0.3787 | 7100  | 3.8102        |
+| 0.384  | 7200  | 3.5178        |
+| 0.3893 | 7300  | 3.8828        |
+| 0.3947 | 7400  | 3.9125        |
+| 0.4    | 7500  | 3.8578        |
+| 0.4053 | 7600  | 3.7391        |
+| 0.4107 | 7700  | 3.7178        |
+| 0.416  | 7800  | 3.6572        |
+| 0.4213 | 7900  | 3.835         |
+| 0.4267 | 8000  | 3.4354        |
+| 0.432  | 8100  | 3.6725        |
+| 0.4373 | 8200  | 3.2932        |
+| 0.4427 | 8300  | 3.7056        |
+| 0.448  | 8400  | 3.9801        |
+| 0.4533 | 8500  | 3.7294        |
+| 0.4587 | 8600  | 3.6412        |
+| 0.464  | 8700  | 3.4301        |
+| 0.4693 | 8800  | 3.4932        |
+| 0.4747 | 8900  | 3.1855        |
+| 0.48   | 9000  | 3.4505        |
+| 0.4853 | 9100  | 3.4431        |
+| 0.4907 | 9200  | 3.0782        |
+| 0.496  | 9300  | 3.3604        |
+| 0.5013 | 9400  | 3.3833        |
+| 0.5067 | 9500  | 3.2887        |
+| 0.512  | 9600  | 3.1361        |
+| 0.5173 | 9700  | 3.7856        |
+| 0.5227 | 9800  | 3.4907        |
+| 0.528  | 9900  | 3.4553        |
+| 0.5333 | 10000 | 3.2604        |
+| 0.5387 | 10100 | 3.4325        |
+| 0.544  | 10200 | 3.319         |
+| 0.5493 | 10300 | 3.3623        |
+| 0.5547 | 10400 | 3.4278        |
+| 0.56   | 10500 | 3.0365        |
+| 0.5653 | 10600 | 3.1647        |
+| 0.5707 | 10700 | 3.387         |
+| 0.576  | 10800 | 3.0888        |
+| 0.5813 | 10900 | 3.2073        |
+| 0.5867 | 11000 | 3.0386        |
+| 0.592  | 11100 | 3.222         |
+| 0.5973 | 11200 | 3.1902        |
+| 0.6027 | 11300 | 3.2242        |
+| 0.608  | 11400 | 2.9589        |
+| 0.6133 | 11500 | 2.831         |
+| 0.6187 | 11600 | 3.0551        |
+| 0.624  | 11700 | 2.8091        |
+| 0.6293 | 11800 | 3.2146        |
+| 0.6347 | 11900 | 3.1964        |
+| 0.64   | 12000 | 2.9525        |
+| 0.6453 | 12100 | 3.2989        |
+| 0.6507 | 12200 | 2.9683        |
+| 0.656  | 12300 | 2.9026        |
+| 0.6613 | 12400 | 3.1533        |
+| 0.6667 | 12500 | 2.7657        |
+| 0.672  | 12600 | 3.09          |
+| 0.6773 | 12700 | 3.1612        |
+| 0.6827 | 12800 | 2.9614        |
+| 0.688  | 12900 | 3.0533        |
+| 0.6933 | 13000 | 2.7601        |
+| 0.6987 | 13100 | 2.9242        |
+| 0.704  | 13200 | 2.5517        |
+| 0.7093 | 13300 | 2.9859        |
+| 0.7147 | 13400 | 2.7317        |
+| 0.72   | 13500 | 2.7578        |
+| 0.7253 | 13600 | 3.1413        |
+| 0.7307 | 13700 | 3.0612        |
+| 0.736  | 13800 | 2.8295        |
+| 0.7413 | 13900 | 2.6263        |
+| 0.7467 | 14000 | 2.7181        |
+| 0.752  | 14100 | 2.8643        |
+| 0.7573 | 14200 | 2.903         |
+| 0.7627 | 14300 | 2.7787        |
+| 0.768  | 14400 | 2.991         |
+| 0.7733 | 14500 | 2.8306        |
+</details>
+### Framework Versions
+- Python: 3.8.10
+- Sentence Transformers: 3.1.1
+- Transformers: 4.45.2
+- PyTorch: 2.4.1+cu118
+- Accelerate: 1.0.1
+- Datasets: 3.0.1
+- Tokenizers: 0.20.3
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### CoSENTLoss
+```bibtex
+@online{kexuefm-8847,
+    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
+    author={Su Jianlin},
+    year={2022},
+    month={Jan},
+    url={https://kexue.fm/archives/8847},
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
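
The model card tags the training objective as `loss:CoSENTLoss` and cites the CoSENT write-up. As a rough sketch of that ranking objective: for every two training pairs where one has a higher gold score than the other, the loss penalizes the lower-scored pair having the larger cosine similarity. The function name below is hypothetical, and the scale of 20 is an assumption, since the card does not state the value used in this run.

```python
import math

def cosent_loss(cosines, labels, scale=20.0):
    """CoSENT-style loss over one batch of sentence pairs.

    cosines: predicted cosine similarity of each pair
    labels:  gold similarity score of each pair
    scale:   temperature-like multiplier (assumed value, not taken from the card)
    """
    # The leading 0.0 contributes exp(0) = 1, i.e. the "1 +" inside the log.
    terms = [0.0]
    for cos_low, label_low in zip(cosines, labels):
        for cos_high, label_high in zip(cosines, labels):
            if label_low < label_high:
                # Positive when the lower-labelled pair is scored as more similar.
                terms.append(scale * (cos_low - cos_high))
    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Correctly ordered batch: the higher-labelled pair has the higher cosine.
good = cosent_loss([0.9, 0.1], [1.0, 0.0])
# Mis-ordered batch: same cosines, labels swapped.
bad = cosent_loss([0.9, 0.1], [0.0, 1.0])
print(good, bad)
```

Only the ordering of the labels matters here, not their absolute values, which is why this kind of loss suits pair-score data.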

checkpoint-14500/config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 6,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.45.2",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

checkpoint-14500/config_sentence_transformers.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "3.1.1",
+    "transformers": "4.45.2",
+    "pytorch": "2.4.1+cu118"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": null
+}

checkpoint-14500/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d8872edc88bf361942476b2d209c38c7287f80f43f959a197efa9111b545d9fb
+size 90864192
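
As a sanity check, the `size` in this LFS pointer can be compared against a parameter count derived from `config.json`. The sketch below assumes the standard BERT layout (word, position and token-type embeddings, six encoder layers, and a pooler head) stored as float32; whether every one of those tensors is present in this particular file is an assumption.

```python
# Values copied from config.json in this checkpoint.
hidden_size = 384
num_hidden_layers = 6
intermediate_size = 1536
vocab_size = 30522
max_position_embeddings = 512
type_vocab_size = 2
num_attention_heads = 12

def linear(n_in, n_out):
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out

layer_norm = 2 * hidden_size  # scale and shift vectors

embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * hidden_size + layer_norm
attention = 4 * linear(hidden_size, hidden_size) + layer_norm  # Q, K, V, output projection
feed_forward = (linear(hidden_size, intermediate_size)
                + linear(intermediate_size, hidden_size) + layer_norm)
pooler = linear(hidden_size, hidden_size)

total_params = embeddings + num_hidden_layers * (attention + feed_forward) + pooler
float32_bytes = total_params * 4
head_dim = hidden_size // num_attention_heads

print(total_params)    # 22713216
print(float32_bytes)   # 90852864
print(head_dim)        # 32
```

Under these assumptions the weights account for 90,852,864 of the 90,864,192 bytes. The remaining 11,328 bytes are plausibly the safetensors header and metadata.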

checkpoint-14500/modules.json ADDED

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]

checkpoint-14500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e26f0cbf806d067a289be04966fc4e461d7d19d51cf15a9bd225e4ba320b73be
+size 180607738

checkpoint-14500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ccbc7a91ec63d1ef34399d98fe71f72ed4462e0d0f0a33724deb951f31c28097
+size 14244

checkpoint-14500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9cf4f74cf2b91782bdd7e3ab73b8e49add2073a397ba80f6730bbc7cf0b69d2e
+size 1064

checkpoint-14500/sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 256,
+  "do_lower_case": false
+}

checkpoint-14500/special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-14500/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-14500/tokenizer_config.json ADDED

	@@ -0,0 +1,64 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "mask_token": "[MASK]",
+  "max_length": 128,
+  "model_max_length": 256,
+  "never_split": null,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}
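
The `epoch` and `learning_rate` values logged in `trainer_state.json` can be reproduced from the hyperparameters in the README: 2,400,000 samples, batch size 128, one epoch, a linear scheduler, and `warmup_ratio` 0.1. The sketch below assumes a single training process and no gradient accumulation. The `linear_lr` helper is a hypothetical re-derivation of a linear warmup and decay schedule, not the Trainer's own code.

```python
DATASET_SIZE = 2_400_000  # from the `dataset_size:2400000` tag
BATCH_SIZE = 128          # per_device_train_batch_size, assuming one device
PEAK_LR = 2e-5            # learning_rate
WARMUP_RATIO = 0.1        # warmup_ratio

steps_per_epoch = DATASET_SIZE // BATCH_SIZE            # 18750
warmup_steps = round(steps_per_epoch * WARMUP_RATIO)    # 1875

def linear_lr(scheduler_step):
    """Linear warmup to PEAK_LR, then linear decay to zero at the end of the epoch."""
    if scheduler_step < warmup_steps:
        return PEAK_LR * scheduler_step / warmup_steps
    remaining = steps_per_epoch - scheduler_step
    return PEAK_LR * max(0.0, remaining / (steps_per_epoch - warmup_steps))

print(steps_per_epoch, warmup_steps)
print(14500 / steps_per_epoch)   # fraction of the epoch completed at this checkpoint
print(linear_lr(100))            # ideal value at step 100
print(linear_lr(97))             # value matching the log entry for step 100
```

The checkpoint's `epoch` of 0.7733 is 14500 / 18750. The logged learning rate at step 100 (about 1.0347e-06) matches 97 scheduler steps rather than 100, and the value logged at step 1900 matches 1896. One possible explanation, not confirmed by anything in this commit, is that `fp16` loss scaling skipped a few optimizer steps early in training.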

checkpoint-14500/trainer_state.json ADDED

	@@ -0,0 +1,1048 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.7733333333333333,
+  "eval_steps": 200000,
+  "global_step": 14500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.005333333333333333,
+      "grad_norm": 87.1787109375,
+      "learning_rate": 1.0346666666666668e-06,
+      "loss": 13.2077,
+      "step": 100
+    },
+    {
+      "epoch": 0.010666666666666666,
+      "grad_norm": 93.43941497802734,
+      "learning_rate": 2.1013333333333335e-06,
+      "loss": 12.3835,
+      "step": 200
+    },
+    {
+      "epoch": 0.016,
+      "grad_norm": 48.778316497802734,
+      "learning_rate": 3.1680000000000004e-06,
+      "loss": 10.7699,
+      "step": 300
+    },
+    {
+      "epoch": 0.021333333333333333,
+      "grad_norm": 58.05220413208008,
+      "learning_rate": 4.234666666666667e-06,
+      "loss": 9.2679,
+      "step": 400
+    },
+    {
+      "epoch": 0.02666666666666667,
+      "grad_norm": 23.75503158569336,
+      "learning_rate": 5.301333333333334e-06,
+      "loss": 8.2638,
+      "step": 500
+    },
+    {
+      "epoch": 0.032,
+      "grad_norm": 15.003210067749023,
+      "learning_rate": 6.368000000000001e-06,
+      "loss": 7.69,
+      "step": 600
+    },
+    {
+      "epoch": 0.037333333333333336,
+      "grad_norm": 17.561220169067383,
+      "learning_rate": 7.434666666666668e-06,
+      "loss": 7.2751,
+      "step": 700
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 18.077470779418945,
+      "learning_rate": 8.501333333333334e-06,
+      "loss": 6.8786,
+      "step": 800
+    },
+    {
+      "epoch": 0.048,
+      "grad_norm": 23.71312141418457,
+      "learning_rate": 9.568e-06,
+      "loss": 6.7811,
+      "step": 900
+    },
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 23.78058624267578,
+      "learning_rate": 1.0634666666666667e-05,
+      "loss": 6.5834,
+      "step": 1000
+    },
+    {
+      "epoch": 0.058666666666666666,
+      "grad_norm": 21.788728713989258,
+      "learning_rate": 1.1701333333333333e-05,
+      "loss": 6.3517,
+      "step": 1100
+    },
+    {
+      "epoch": 0.064,
+      "grad_norm": 31.29090690612793,
+      "learning_rate": 1.2768e-05,
+      "loss": 6.2272,
+      "step": 1200
+    },
+    {
+      "epoch": 0.06933333333333333,
+      "grad_norm": 40.10956573486328,
+      "learning_rate": 1.3834666666666668e-05,
+      "loss": 6.1943,
+      "step": 1300
+    },
+    {
+      "epoch": 0.07466666666666667,
+      "grad_norm": 64.6901626586914,
+      "learning_rate": 1.4901333333333334e-05,
+      "loss": 6.1038,
+      "step": 1400
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 37.49641036987305,
+      "learning_rate": 1.5968e-05,
+      "loss": 6.1216,
+      "step": 1500
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 104.68683624267578,
+      "learning_rate": 1.7034666666666668e-05,
+      "loss": 6.1429,
+      "step": 1600
+    },
+    {
+      "epoch": 0.09066666666666667,
+      "grad_norm": 26.25409698486328,
+      "learning_rate": 1.8101333333333336e-05,
+      "loss": 5.8876,
+      "step": 1700
+    },
+    {
+      "epoch": 0.096,
+      "grad_norm": 46.158363342285156,
+      "learning_rate": 1.9168000000000004e-05,
+      "loss": 5.8074,
+      "step": 1800
+    },
+    {
+      "epoch": 0.10133333333333333,
+      "grad_norm": 148.9814910888672,
+      "learning_rate": 1.997511111111111e-05,
+      "loss": 5.6261,
+      "step": 1900
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 32.8914909362793,
+      "learning_rate": 1.9856592592592595e-05,
+      "loss": 5.838,
+      "step": 2000
+    },
+    {
+      "epoch": 0.112,
+      "grad_norm": 38.54491424560547,
+      "learning_rate": 1.9738074074074077e-05,
+      "loss": 5.7161,
+      "step": 2100
+    },
+    {
+      "epoch": 0.11733333333333333,
+      "grad_norm": 38.57450485229492,
+      "learning_rate": 1.9619555555555555e-05,
+      "loss": 5.5388,
+      "step": 2200
+    },
+    {
+      "epoch": 0.12266666666666666,
+      "grad_norm": 48.14970779418945,
+      "learning_rate": 1.9501037037037037e-05,
+      "loss": 5.5654,
+      "step": 2300
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 51.39189147949219,
+      "learning_rate": 1.9382518518518522e-05,
+      "loss": 5.5196,
+      "step": 2400
+    },
+    {
+      "epoch": 0.13333333333333333,
+      "grad_norm": 29.93403434753418,
+      "learning_rate": 1.9264e-05,
+      "loss": 5.3665,
+      "step": 2500
+    },
+    {
+      "epoch": 0.13866666666666666,
+      "grad_norm": 29.333316802978516,
+      "learning_rate": 1.9145481481481482e-05,
+      "loss": 5.2952,
+      "step": 2600
+    },
+    {
+      "epoch": 0.144,
+      "grad_norm": 119.10240936279297,
+      "learning_rate": 1.9026962962962964e-05,
+      "loss": 5.4131,
+      "step": 2700
+    },
+    {
+      "epoch": 0.14933333333333335,
+      "grad_norm": 60.21550750732422,
+      "learning_rate": 1.8908444444444446e-05,
+      "loss": 5.2104,
+      "step": 2800
+    },
+    {
+      "epoch": 0.15466666666666667,
+      "grad_norm": 49.59823989868164,
+      "learning_rate": 1.8789925925925928e-05,
+      "loss": 5.2176,
+      "step": 2900
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 79.62650299072266,
+      "learning_rate": 1.867140740740741e-05,
+      "loss": 4.9406,
+      "step": 3000
+    },
+    {
+      "epoch": 0.16533333333333333,
+      "grad_norm": 124.49871063232422,
+      "learning_rate": 1.855288888888889e-05,
+      "loss": 4.8781,
+      "step": 3100
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 52.037776947021484,
+      "learning_rate": 1.8435555555555558e-05,
+      "loss": 5.08,
+      "step": 3200
+    },
+    {
+      "epoch": 0.176,
+      "grad_norm": 69.88233184814453,
+      "learning_rate": 1.8317037037037036e-05,
+      "loss": 5.1495,
+      "step": 3300
+    },
+    {
+      "epoch": 0.18133333333333335,
+      "grad_norm": 71.26051330566406,
+      "learning_rate": 1.819851851851852e-05,
+      "loss": 4.8717,
+      "step": 3400
+    },
+    {
+      "epoch": 0.18666666666666668,
+      "grad_norm": 41.996055603027344,
+      "learning_rate": 1.8080000000000003e-05,
+      "loss": 4.8196,
+      "step": 3500
+    },
+    {
+      "epoch": 0.192,
+      "grad_norm": 39.33742141723633,
+      "learning_rate": 1.796148148148148e-05,
+      "loss": 4.8065,
+      "step": 3600
+    },
+    {
+      "epoch": 0.19733333333333333,
+      "grad_norm": 63.21049118041992,
+      "learning_rate": 1.7842962962962963e-05,
+      "loss": 4.718,
+      "step": 3700
+    },
+    {
+      "epoch": 0.20266666666666666,
+      "grad_norm": 54.66056823730469,
+      "learning_rate": 1.7724444444444445e-05,
+      "loss": 4.7111,
+      "step": 3800
+    },
+    {
+      "epoch": 0.208,
+      "grad_norm": 55.63251495361328,
+      "learning_rate": 1.7607111111111112e-05,
+      "loss": 4.6759,
+      "step": 3900
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 78.46200561523438,
+      "learning_rate": 1.7488592592592594e-05,
+      "loss": 4.7733,
+      "step": 4000
+    },
+    {
+      "epoch": 0.21866666666666668,
+      "grad_norm": 67.47509765625,
+      "learning_rate": 1.7370074074074075e-05,
+      "loss": 4.7041,
+      "step": 4100
+    },
+    {
+      "epoch": 0.224,
+      "grad_norm": 69.53897094726562,
+      "learning_rate": 1.7251555555555557e-05,
+      "loss": 4.7898,
+      "step": 4200
+    },
+    {
+      "epoch": 0.22933333333333333,
+      "grad_norm": 103.5851821899414,
+      "learning_rate": 1.713303703703704e-05,
+      "loss": 4.8974,
+      "step": 4300
+    },
+    {
+      "epoch": 0.23466666666666666,
+      "grad_norm": 214.16416931152344,
+      "learning_rate": 1.701451851851852e-05,
+      "loss": 4.4939,
+      "step": 4400
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 53.11494445800781,
+      "learning_rate": 1.6896000000000002e-05,
+      "loss": 4.4107,
+      "step": 4500
+    },
+    {
+      "epoch": 0.24533333333333332,
+      "grad_norm": 69.9258041381836,
+      "learning_rate": 1.6777481481481484e-05,
+      "loss": 4.4831,
+      "step": 4600
+    },
+    {
+      "epoch": 0.25066666666666665,
+      "grad_norm": 98.29943084716797,
+      "learning_rate": 1.6658962962962962e-05,
+      "loss": 4.4571,
+      "step": 4700
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 141.04574584960938,
+      "learning_rate": 1.6540444444444444e-05,
+      "loss": 4.1461,
+      "step": 4800
+    },
+    {
+      "epoch": 0.2613333333333333,
+      "grad_norm": 81.44438171386719,
+      "learning_rate": 1.642192592592593e-05,
+      "loss": 4.5198,
+      "step": 4900
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 64.71192169189453,
+      "learning_rate": 1.6303407407407408e-05,
+      "loss": 4.4998,
+      "step": 5000
+    },
+    {
+      "epoch": 0.272,
+      "grad_norm": 56.899715423583984,
+      "learning_rate": 1.618488888888889e-05,
+      "loss": 4.2135,
+      "step": 5100
+    },
+    {
+      "epoch": 0.2773333333333333,
+      "grad_norm": 62.43917465209961,
+      "learning_rate": 1.606637037037037e-05,
+      "loss": 4.441,
+      "step": 5200
+    },
+    {
+      "epoch": 0.2826666666666667,
+      "grad_norm": 66.38310241699219,
+      "learning_rate": 1.5947851851851853e-05,
+      "loss": 4.2669,
+      "step": 5300
+    },
+    {
+      "epoch": 0.288,
+      "grad_norm": 269.8346252441406,
+      "learning_rate": 1.5829333333333334e-05,
+      "loss": 4.0964,
+      "step": 5400
+    },
+    {
+      "epoch": 0.29333333333333333,
+      "grad_norm": 65.3775405883789,
+      "learning_rate": 1.5710814814814816e-05,
+      "loss": 4.2048,
+      "step": 5500
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 72.42823791503906,
+      "learning_rate": 1.5592296296296298e-05,
+      "loss": 4.2123,
+      "step": 5600
+    },
+    {
+      "epoch": 0.304,
+      "grad_norm": 110.38677215576172,
+      "learning_rate": 1.547377777777778e-05,
+      "loss": 4.3391,
+      "step": 5700
+    },
+    {
+      "epoch": 0.30933333333333335,
+      "grad_norm": 74.9642105102539,
+      "learning_rate": 1.535525925925926e-05,
+      "loss": 4.3366,
+      "step": 5800
+    },
+    {
+      "epoch": 0.31466666666666665,
+      "grad_norm": 157.83856201171875,
+      "learning_rate": 1.5236740740740743e-05,
+      "loss": 4.1775,
+      "step": 5900
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 86.05279541015625,
+      "learning_rate": 1.5118222222222223e-05,
+      "loss": 3.9954,
+      "step": 6000
+    },
+    {
+      "epoch": 0.3253333333333333,
+      "grad_norm": 191.12464904785156,
+      "learning_rate": 1.4999703703703705e-05,
+      "loss": 4.141,
+      "step": 6100
+    },
+    {
+      "epoch": 0.33066666666666666,
+      "grad_norm": 77.75936889648438,
+      "learning_rate": 1.4881185185185187e-05,
+      "loss": 4.09,
+      "step": 6200
+    },
+    {
+      "epoch": 0.336,
+      "grad_norm": 78.64347839355469,
+      "learning_rate": 1.4762666666666667e-05,
+      "loss": 3.9517,
+      "step": 6300
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 90.5103988647461,
+      "learning_rate": 1.4645333333333334e-05,
+      "loss": 3.9844,
+      "step": 6400
+    },
+    {
+      "epoch": 0.3466666666666667,
+      "grad_norm": 79.83901977539062,
+      "learning_rate": 1.4526814814814815e-05,
+      "loss": 3.8902,
+      "step": 6500
+    },
+    {
+      "epoch": 0.352,
+      "grad_norm": 87.23860931396484,
+      "learning_rate": 1.4408296296296299e-05,
+      "loss": 3.571,
+      "step": 6600
+    },
+    {
+      "epoch": 0.35733333333333334,
+      "grad_norm": 64.76517486572266,
+      "learning_rate": 1.4289777777777777e-05,
+      "loss": 3.7686,
+      "step": 6700
+    },
+    {
+      "epoch": 0.3626666666666667,
+      "grad_norm": 86.63563537597656,
+      "learning_rate": 1.417125925925926e-05,
+      "loss": 3.7766,
+      "step": 6800
+    },
+    {
+      "epoch": 0.368,
+      "grad_norm": 147.21160888671875,
+      "learning_rate": 1.4052740740740742e-05,
+      "loss": 4.0305,
+      "step": 6900
+    },
+    {
+      "epoch": 0.37333333333333335,
+      "grad_norm": 86.2465591430664,
+      "learning_rate": 1.3934222222222222e-05,
+      "loss": 4.2835,
+      "step": 7000
+    },
+    {
+      "epoch": 0.37866666666666665,
+      "grad_norm": 112.91142272949219,
+      "learning_rate": 1.3815703703703704e-05,
+      "loss": 3.8102,
+      "step": 7100
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 193.4211883544922,
+      "learning_rate": 1.3698370370370371e-05,
+      "loss": 3.5178,
+      "step": 7200
+    },
+    {
+      "epoch": 0.3893333333333333,
+      "grad_norm": 87.25949096679688,
+      "learning_rate": 1.3579851851851853e-05,
+      "loss": 3.8828,
+      "step": 7300
+    },
+    {
+      "epoch": 0.39466666666666667,
+      "grad_norm": 95.35265350341797,
+      "learning_rate": 1.3461333333333334e-05,
+      "loss": 3.9125,
+      "step": 7400
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 92.84791564941406,
+      "learning_rate": 1.3342814814814814e-05,
+      "loss": 3.8578,
+      "step": 7500
+    },
+    {
+      "epoch": 0.4053333333333333,
+      "grad_norm": 53.305694580078125,
+      "learning_rate": 1.3224296296296298e-05,
+      "loss": 3.7391,
+      "step": 7600
+    },
+    {
+      "epoch": 0.4106666666666667,
+      "grad_norm": 83.85381317138672,
+      "learning_rate": 1.310577777777778e-05,
+      "loss": 3.7178,
+      "step": 7700
+    },
+    {
+      "epoch": 0.416,
+      "grad_norm": 191.92666625976562,
+      "learning_rate": 1.298725925925926e-05,
+      "loss": 3.6572,
+      "step": 7800
+    },
+    {
+      "epoch": 0.42133333333333334,
+      "grad_norm": 152.61099243164062,
+      "learning_rate": 1.2868740740740741e-05,
+      "loss": 3.835,
+      "step": 7900
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 111.54241180419922,
+      "learning_rate": 1.2750222222222223e-05,
+      "loss": 3.4354,
+      "step": 8000
+    },
+    {
+      "epoch": 0.432,
+      "grad_norm": 54.150936126708984,
+      "learning_rate": 1.2631703703703703e-05,
+      "loss": 3.6725,
+      "step": 8100
+    },
+    {
+      "epoch": 0.43733333333333335,
+      "grad_norm": 118.56819915771484,
+      "learning_rate": 1.2513185185185187e-05,
+      "loss": 3.2932,
+      "step": 8200
+    },
+    {
+      "epoch": 0.44266666666666665,
+      "grad_norm": 94.42810821533203,
+      "learning_rate": 1.2394666666666668e-05,
+      "loss": 3.7056,
+      "step": 8300
+    },
+    {
+      "epoch": 0.448,
+      "grad_norm": 98.53581237792969,
+      "learning_rate": 1.2276148148148148e-05,
+      "loss": 3.9801,
+      "step": 8400
+    },
+    {
+      "epoch": 0.4533333333333333,
+      "grad_norm": 80.71143341064453,
+      "learning_rate": 1.215762962962963e-05,
+      "loss": 3.7294,
+      "step": 8500
+    },
+    {
+      "epoch": 0.45866666666666667,
+      "grad_norm": 113.76362609863281,
+      "learning_rate": 1.2039111111111112e-05,
+      "loss": 3.6412,
+      "step": 8600
+    },
+    {
+      "epoch": 0.464,
+      "grad_norm": 56.0068473815918,
+      "learning_rate": 1.1920592592592592e-05,
+      "loss": 3.4301,
+      "step": 8700
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 118.73760223388672,
+      "learning_rate": 1.1802074074074075e-05,
+      "loss": 3.4932,
+      "step": 8800
+    },
+    {
+      "epoch": 0.4746666666666667,
+      "grad_norm": 110.51318359375,
+      "learning_rate": 1.1683555555555557e-05,
+      "loss": 3.1855,
+      "step": 8900
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 165.21546936035156,
+      "learning_rate": 1.1565037037037039e-05,
+      "loss": 3.4505,
+      "step": 9000
+    },
+    {
+      "epoch": 0.48533333333333334,
+      "grad_norm": 72.65802764892578,
+      "learning_rate": 1.1446518518518519e-05,
+      "loss": 3.4431,
+      "step": 9100
+    },
+    {
+      "epoch": 0.49066666666666664,
+      "grad_norm": 99.20442199707031,
+      "learning_rate": 1.1328e-05,
+      "loss": 3.0782,
+      "step": 9200
+    },
+    {
+      "epoch": 0.496,
+      "grad_norm": 60.40226364135742,
+      "learning_rate": 1.1209481481481484e-05,
+      "loss": 3.3604,
+      "step": 9300
+    },
+    {
+      "epoch": 0.5013333333333333,
+      "grad_norm": 61.19272232055664,
+      "learning_rate": 1.1090962962962964e-05,
+      "loss": 3.3833,
+      "step": 9400
+    },
+    {
+      "epoch": 0.5066666666666667,
+      "grad_norm": 35.161251068115234,
+      "learning_rate": 1.0972444444444446e-05,
+      "loss": 3.2887,
+      "step": 9500
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 114.50215148925781,
+      "learning_rate": 1.0853925925925928e-05,
+      "loss": 3.1361,
+      "step": 9600
+    },
+    {
+      "epoch": 0.5173333333333333,
+      "grad_norm": 97.2239761352539,
+      "learning_rate": 1.0735407407407408e-05,
+      "loss": 3.7856,
+      "step": 9700
+    },
+    {
+      "epoch": 0.5226666666666666,
+      "grad_norm": 129.9647979736328,
+      "learning_rate": 1.061688888888889e-05,
+      "loss": 3.4907,
+      "step": 9800
+    },
+    {
+      "epoch": 0.528,
+      "grad_norm": 97.21747589111328,
+      "learning_rate": 1.0498370370370373e-05,
+      "loss": 3.4553,
+      "step": 9900
+    },
+    {
+      "epoch": 0.5333333333333333,
+      "grad_norm": 66.7669448852539,
+      "learning_rate": 1.0379851851851853e-05,
+      "loss": 3.2604,
+      "step": 10000
+    },
+    {
+      "epoch": 0.5386666666666666,
+      "grad_norm": 118.46977996826172,
+      "learning_rate": 1.0261333333333335e-05,
+      "loss": 3.4325,
+      "step": 10100
+    },
+    {
+      "epoch": 0.544,
+      "grad_norm": 151.7686767578125,
+      "learning_rate": 1.0142814814814816e-05,
+      "loss": 3.319,
+      "step": 10200
+    },
+    {
+      "epoch": 0.5493333333333333,
+      "grad_norm": 277.033447265625,
+      "learning_rate": 1.0024296296296296e-05,
+      "loss": 3.3623,
+      "step": 10300
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 116.09725952148438,
+      "learning_rate": 9.905777777777778e-06,
+      "loss": 3.4278,
+      "step": 10400
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 124.16327667236328,
+      "learning_rate": 9.78725925925926e-06,
+      "loss": 3.0365,
+      "step": 10500
+    },
+    {
+      "epoch": 0.5653333333333334,
+      "grad_norm": 32.879947662353516,
+      "learning_rate": 9.668740740740742e-06,
+      "loss": 3.1647,
+      "step": 10600
+    },
+    {
+      "epoch": 0.5706666666666667,
+      "grad_norm": 106.03438568115234,
+      "learning_rate": 9.550222222222223e-06,
+      "loss": 3.387,
+      "step": 10700
+    },
+    {
+      "epoch": 0.576,
+      "grad_norm": 81.7701187133789,
+      "learning_rate": 9.431703703703703e-06,
+      "loss": 3.0888,
+      "step": 10800
+    },
+    {
+      "epoch": 0.5813333333333334,
+      "grad_norm": 92.54151153564453,
+      "learning_rate": 9.313185185185187e-06,
+      "loss": 3.2073,
+      "step": 10900
+    },
+    {
+      "epoch": 0.5866666666666667,
+      "grad_norm": 89.57464599609375,
+      "learning_rate": 9.194666666666667e-06,
+      "loss": 3.0386,
+      "step": 11000
+    },
+    {
+      "epoch": 0.592,
+      "grad_norm": 170.9632568359375,
+      "learning_rate": 9.076148148148149e-06,
+      "loss": 3.222,
+      "step": 11100
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 38.04743194580078,
+      "learning_rate": 8.95762962962963e-06,
+      "loss": 3.1902,
+      "step": 11200
+    },
+    {
+      "epoch": 0.6026666666666667,
+      "grad_norm": 150.6612548828125,
+      "learning_rate": 8.839111111111112e-06,
+      "loss": 3.2242,
+      "step": 11300
+    },
+    {
+      "epoch": 0.608,
+      "grad_norm": 80.37187957763672,
+      "learning_rate": 8.720592592592594e-06,
+      "loss": 2.9589,
+      "step": 11400
+    },
+    {
+      "epoch": 0.6133333333333333,
+      "grad_norm": 97.09183502197266,
+      "learning_rate": 8.60325925925926e-06,
+      "loss": 2.831,
+      "step": 11500
+    },
+    {
+      "epoch": 0.6186666666666667,
+      "grad_norm": 161.1728973388672,
+      "learning_rate": 8.48474074074074e-06,
+      "loss": 3.0551,
+      "step": 11600
+    },
+    {
+      "epoch": 0.624,
+      "grad_norm": 123.0923080444336,
+      "learning_rate": 8.366222222222224e-06,
+      "loss": 2.8091,
+      "step": 11700
+    },
+    {
+      "epoch": 0.6293333333333333,
+      "grad_norm": 122.80669403076172,
+      "learning_rate": 8.247703703703704e-06,
+      "loss": 3.2146,
+      "step": 11800
+    },
+    {
+      "epoch": 0.6346666666666667,
+      "grad_norm": 84.94342041015625,
+      "learning_rate": 8.129185185185186e-06,
+      "loss": 3.1964,
+      "step": 11900
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 142.91336059570312,
+      "learning_rate": 8.010666666666668e-06,
+      "loss": 2.9525,
+      "step": 12000
+    },
+    {
+      "epoch": 0.6453333333333333,
+      "grad_norm": 92.86615753173828,
+      "learning_rate": 7.89214814814815e-06,
+      "loss": 3.2989,
+      "step": 12100
+    },
+    {
+      "epoch": 0.6506666666666666,
+      "grad_norm": 105.99359893798828,
+      "learning_rate": 7.77362962962963e-06,
+      "loss": 2.9683,
+      "step": 12200
+    },
+    {
+      "epoch": 0.656,
+      "grad_norm": 116.56002044677734,
+      "learning_rate": 7.655111111111113e-06,
+      "loss": 2.9026,
+      "step": 12300
+    },
+    {
+      "epoch": 0.6613333333333333,
+      "grad_norm": 102.49303436279297,
+      "learning_rate": 7.536592592592593e-06,
+      "loss": 3.1533,
+      "step": 12400
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 105.65465545654297,
+      "learning_rate": 7.418074074074074e-06,
+      "loss": 2.7657,
+      "step": 12500
+    },
+    {
+      "epoch": 0.672,
+      "grad_norm": 75.12621307373047,
+      "learning_rate": 7.299555555555556e-06,
+      "loss": 3.09,
+      "step": 12600
+    },
+    {
+      "epoch": 0.6773333333333333,
+      "grad_norm": 79.20515441894531,
+      "learning_rate": 7.181037037037037e-06,
+      "loss": 3.1612,
+      "step": 12700
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 42.84831237792969,
+      "learning_rate": 7.062518518518518e-06,
+      "loss": 2.9614,
+      "step": 12800
+    },
+    {
+      "epoch": 0.688,
+      "grad_norm": 109.40416717529297,
+      "learning_rate": 6.944000000000001e-06,
+      "loss": 3.0533,
+      "step": 12900
+    },
+    {
+      "epoch": 0.6933333333333334,
+      "grad_norm": 119.65579986572266,
+      "learning_rate": 6.825481481481482e-06,
+      "loss": 2.7601,
+      "step": 13000
+    },
+    {
+      "epoch": 0.6986666666666667,
+      "grad_norm": 55.7857551574707,
+      "learning_rate": 6.706962962962964e-06,
+      "loss": 2.9242,
+      "step": 13100
+    },
+    {
+      "epoch": 0.704,
+      "grad_norm": 416.87347412109375,
+      "learning_rate": 6.588444444444445e-06,
+      "loss": 2.5517,
+      "step": 13200
+    },
+    {
+      "epoch": 0.7093333333333334,
+      "grad_norm": 104.21924591064453,
+      "learning_rate": 6.469925925925926e-06,
+      "loss": 2.9859,
+      "step": 13300
+    },
+    {
+      "epoch": 0.7146666666666667,
+      "grad_norm": 95.74305725097656,
+      "learning_rate": 6.351407407407409e-06,
+      "loss": 2.7317,
+      "step": 13400
+    },
+    {
+      "epoch": 0.72,
+      "grad_norm": 97.16332244873047,
+      "learning_rate": 6.2328888888888895e-06,
+      "loss": 2.7578,
+      "step": 13500
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 85.52286529541016,
+      "learning_rate": 6.11437037037037e-06,
+      "loss": 3.1413,
+      "step": 13600
+    },
+    {
+      "epoch": 0.7306666666666667,
+      "grad_norm": 159.7246551513672,
+      "learning_rate": 5.995851851851853e-06,
+      "loss": 3.0612,
+      "step": 13700
+    },
+    {
+      "epoch": 0.736,
+      "grad_norm": 141.78903198242188,
+      "learning_rate": 5.877333333333334e-06,
+      "loss": 2.8295,
+      "step": 13800
+    },
+    {
+      "epoch": 0.7413333333333333,
+      "grad_norm": 116.21475219726562,
+      "learning_rate": 5.758814814814815e-06,
+      "loss": 2.6263,
+      "step": 13900
+    },
+    {
+      "epoch": 0.7466666666666667,
+      "grad_norm": 91.57818603515625,
+      "learning_rate": 5.640296296296297e-06,
+      "loss": 2.7181,
+      "step": 14000
+    },
+    {
+      "epoch": 0.752,
+      "grad_norm": 68.21562194824219,
+      "learning_rate": 5.521777777777778e-06,
+      "loss": 2.8643,
+      "step": 14100
+    },
+    {
+      "epoch": 0.7573333333333333,
+      "grad_norm": 169.53868103027344,
+      "learning_rate": 5.403259259259259e-06,
+      "loss": 2.903,
+      "step": 14200
+    },
+    {
+      "epoch": 0.7626666666666667,
+      "grad_norm": 218.90374755859375,
+      "learning_rate": 5.284740740740742e-06,
+      "loss": 2.7787,
+      "step": 14300
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 90.267822265625,
+      "learning_rate": 5.166222222222223e-06,
+      "loss": 2.991,
+      "step": 14400
+    },
+    {
+      "epoch": 0.7733333333333333,
+      "grad_norm": 140.58152770996094,
+      "learning_rate": 5.047703703703704e-06,
+      "loss": 2.8306,
+      "step": 14500
+    }
+  ],
+  "logging_steps": 100,
+  "max_steps": 18750,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 128,
+  "trial_name": null,
+  "trial_params": null
+}
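
The per-step entries above are easier to review in aggregate than line by line. The following is a minimal sketch, not part of this commit, of how the log could be summarized. It assumes the entries live under the `log_history` key of `trainer_state.json`, as the Hugging Face `Trainer` writes them. The helper names `summarize_log_history` and `load_trainer_state` are hypothetical, and the sample entries are copied from the diff above.

```python
import json
from pathlib import Path


def summarize_log_history(state: dict) -> dict:
    """Summarize the per-step training logs stored in a trainer state dict."""
    # Keep only entries that carry a training loss (other entry types may lack it).
    logs = [e for e in state.get("log_history", []) if "loss" in e]
    if not logs:
        return {}
    # Entry with the largest gradient norm, useful for spotting spikes.
    worst = max(logs, key=lambda e: e.get("grad_norm", 0.0))
    last = logs[-1]
    return {
        "entries": len(logs),
        "first_loss": logs[0]["loss"],
        "last_loss": last["loss"],
        "max_grad_norm": worst.get("grad_norm"),
        "max_grad_norm_step": worst["step"],
        # Fraction of the planned run completed at the last logged step.
        "progress": last["step"] / state["max_steps"],
    }


def load_trainer_state(path: str) -> dict:
    """Hypothetical helper: read a trainer_state.json file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Three entries copied from the log above; a real run would instead use
# load_trainer_state("checkpoint-14500/trainer_state.json").
sample_state = {
    "max_steps": 18750,
    "log_history": [
        {"epoch": 0.288, "grad_norm": 269.8346252441406,
         "learning_rate": 1.5829333333333334e-05, "loss": 4.0964, "step": 5400},
        {"epoch": 0.704, "grad_norm": 416.87347412109375,
         "learning_rate": 6.588444444444445e-06, "loss": 2.5517, "step": 13200},
        {"epoch": 0.7733333333333333, "grad_norm": 140.58152770996094,
         "learning_rate": 5.047703703703704e-06, "loss": 2.8306, "step": 14500},
    ],
}

summary = summarize_log_history(sample_state)
print(summary)
```

On the sampled entries, the largest gradient norm falls at step 13200, and the last logged step is about 77% of `max_steps`, which agrees with the logged `epoch` value of roughly 0.773.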

checkpoint-14500/training_args.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:559c2e10ae7e3eb92c5fe0ec0855e1823bed2527232b2a6421c1e7e5dcf4dd39
+size 5496

checkpoint-14500/vocab.txt ADDED Viewed

The diff for this file is too large to render.