Upload folder using huggingface_hub

Files changed (16)

checkpoint-7500/1_Pooling/config.json +10 -0
checkpoint-7500/README.md +407 -0
checkpoint-7500/config.json +26 -0
checkpoint-7500/config_sentence_transformers.json +10 -0
checkpoint-7500/model.safetensors +3 -0
checkpoint-7500/modules.json +20 -0
checkpoint-7500/optimizer.pt +3 -0
checkpoint-7500/rng_state.pth +3 -0
checkpoint-7500/scheduler.pt +3 -0
checkpoint-7500/sentence_bert_config.json +4 -0
checkpoint-7500/special_tokens_map.json +37 -0
checkpoint-7500/tokenizer.json +0 -0
checkpoint-7500/tokenizer_config.json +64 -0
checkpoint-7500/trainer_state.json +558 -0
checkpoint-7500/training_args.bin +3 -0
checkpoint-7500/vocab.txt +0 -0

checkpoint-7500/1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "word_embedding_dimension": 384,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
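
Reviewer note: the pooling config above enables only `pooling_mode_mean_tokens`, so the sentence vector is the average of the token vectors. The sketch below illustrates that idea in plain Python, assuming padded positions are excluded via an attention mask. The function name `mean_pool` and the toy inputs are illustrative assumptions, not code from this checkpoint.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                summed[k] += vector[k]
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in summed]


# Toy example: two real tokens and one padded token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model each token vector has 384 entries (`word_embedding_dimension`), and the padded token in the toy input does not affect the result.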

checkpoint-7500/README.md ADDED

	@@ -0,0 +1,407 @@

+---
+base_model: sentence-transformers/all-MiniLM-L6-v2
+language:
+- en
+library_name: sentence-transformers
+license: apache-2.0
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:2400000
+- loss:CoSENTLoss
+widget:
+- source_sentence: poolside pants
+  sentences:
+  - safe materials toy
+  - plated necklace
+  - washed cargo pants
+- source_sentence: breathable pants
+  sentences:
+  - extra definition mascara
+  - christmas trees hair clip
+  - milton shorts
+- source_sentence: mozzarella cheese burger
+  sentences:
+  - ankle length leggings
+  - nail polish
+  - olive shacket
+- source_sentence: cookie brownie
+  sentences:
+  - lime top
+  - mdf coffee corner stand
+  - learning flashcards
+- source_sentence: no artificial flavouring food
+  sentences:
+  - eye pencil
+  - tourmaline ceramic brush
+  - rubber dog toy
+---
+# all-MiniLM-L6-v9-pair_score
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
+- **Maximum Sequence Length:** 256 tokens
+- **Output Dimensionality:** 384 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+- **Language:** en
+- **License:** apache-2.0
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub
+model = SentenceTransformer("sentence_transformers_model_id")
+# Run inference
+sentences = [
+    'no artificial flavouring food',
+    'rubber dog toy',
+    'tourmaline ceramic brush',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 384]
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# [3, 3]
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `eval_strategy`: steps
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 1
+- `warmup_ratio`: 0.1
+- `fp16`: True
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: steps
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 128
+- `per_device_eval_batch_size`: 128
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 1
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: False
+- `fp16`: True
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: False
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `dispatch_batches`: None
+- `split_batches`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `eval_use_gather_object`: False
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: proportional
+</details>
+### Training Logs
+| Epoch  | Step | Training Loss |
+|:------:|:----:|:-------------:|
+| 0.0053 | 100  | 13.2077       |
+| 0.0107 | 200  | 12.3835       |
+| 0.016  | 300  | 10.7699       |
+| 0.0213 | 400  | 9.2679        |
+| 0.0267 | 500  | 8.2638        |
+| 0.032  | 600  | 7.69          |
+| 0.0373 | 700  | 7.2751        |
+| 0.0427 | 800  | 6.8786        |
+| 0.048  | 900  | 6.7811        |
+| 0.0533 | 1000 | 6.5834        |
+| 0.0587 | 1100 | 6.3517        |
+| 0.064  | 1200 | 6.2272        |
+| 0.0693 | 1300 | 6.1943        |
+| 0.0747 | 1400 | 6.1038        |
+| 0.08   | 1500 | 6.1216        |
+| 0.0853 | 1600 | 6.1429        |
+| 0.0907 | 1700 | 5.8876        |
+| 0.096  | 1800 | 5.8074        |
+| 0.1013 | 1900 | 5.6261        |
+| 0.1067 | 2000 | 5.838         |
+| 0.112  | 2100 | 5.7161        |
+| 0.1173 | 2200 | 5.5388        |
+| 0.1227 | 2300 | 5.5654        |
+| 0.128  | 2400 | 5.5196        |
+| 0.1333 | 2500 | 5.3665        |
+| 0.1387 | 2600 | 5.2952        |
+| 0.144  | 2700 | 5.4131        |
+| 0.1493 | 2800 | 5.2104        |
+| 0.1547 | 2900 | 5.2176        |
+| 0.16   | 3000 | 4.9406        |
+| 0.1653 | 3100 | 4.8781        |
+| 0.1707 | 3200 | 5.08          |
+| 0.176  | 3300 | 5.1495        |
+| 0.1813 | 3400 | 4.8717        |
+| 0.1867 | 3500 | 4.8196        |
+| 0.192  | 3600 | 4.8065        |
+| 0.1973 | 3700 | 4.718         |
+| 0.2027 | 3800 | 4.7111        |
+| 0.208  | 3900 | 4.6759        |
+| 0.2133 | 4000 | 4.7733        |
+| 0.2187 | 4100 | 4.7041        |
+| 0.224  | 4200 | 4.7898        |
+| 0.2293 | 4300 | 4.8974        |
+| 0.2347 | 4400 | 4.4939        |
+| 0.24   | 4500 | 4.4107        |
+| 0.2453 | 4600 | 4.4831        |
+| 0.2507 | 4700 | 4.4571        |
+| 0.256  | 4800 | 4.1461        |
+| 0.2613 | 4900 | 4.5198        |
+| 0.2667 | 5000 | 4.4998        |
+| 0.272  | 5100 | 4.2135        |
+| 0.2773 | 5200 | 4.441         |
+| 0.2827 | 5300 | 4.2669        |
+| 0.288  | 5400 | 4.0964        |
+| 0.2933 | 5500 | 4.2048        |
+| 0.2987 | 5600 | 4.2123        |
+| 0.304  | 5700 | 4.3391        |
+| 0.3093 | 5800 | 4.3366        |
+| 0.3147 | 5900 | 4.1775        |
+| 0.32   | 6000 | 3.9954        |
+| 0.3253 | 6100 | 4.141         |
+| 0.3307 | 6200 | 4.09          |
+| 0.336  | 6300 | 3.9517        |
+| 0.3413 | 6400 | 3.9844        |
+| 0.3467 | 6500 | 3.8902        |
+| 0.352  | 6600 | 3.571         |
+| 0.3573 | 6700 | 3.7686        |
+| 0.3627 | 6800 | 3.7766        |
+| 0.368  | 6900 | 4.0305        |
+| 0.3733 | 7000 | 4.2835        |
+| 0.3787 | 7100 | 3.8102        |
+| 0.384  | 7200 | 3.5178        |
+| 0.3893 | 7300 | 3.8828        |
+| 0.3947 | 7400 | 3.9125        |
+| 0.4    | 7500 | 3.8578        |
+### Framework Versions
+- Python: 3.8.10
+- Sentence Transformers: 3.1.1
+- Transformers: 4.45.2
+- PyTorch: 2.4.1+cu118
+- Accelerate: 1.0.1
+- Datasets: 3.0.1
+- Tokenizers: 0.20.3
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### CoSENTLoss
+```bibtex
+@online{kexuefm-8847,
+    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
+    author={Su Jianlin},
+    year={2022},
+    month={Jan},
+    url={https://kexue.fm/archives/8847},
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
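
Reviewer note: the README tags the training objective as `loss:CoSENTLoss`. As a rough illustration of what that objective rewards, the sketch below computes a CoSENT-style ranking loss in plain Python: for every two pairs where the first has the higher gold score, it penalizes the second pair having the higher cosine similarity. The function name, the `scale=20.0` default, and the toy numbers are assumptions made for this example. It is not the library's implementation.

```python
import math
from typing import List


def cosent_style_loss(cos_scores: List[float], labels: List[float], scale: float = 20.0) -> float:
    """log(1 + sum(exp(scale * (cos_j - cos_i)))) over all (i, j) with labels[i] > labels[j]."""
    terms = [0.0]  # the leading 0.0 contributes the "1 +" inside the log
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                terms.append(scale * (cos_scores[j] - cos_scores[i]))
    # Numerically stable log-sum-exp.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical batch: pair 0 is labeled more similar than pair 1.
well_ordered = cosent_style_loss([0.9, 0.1], [1.0, 0.0])
mis_ordered = cosent_style_loss([0.1, 0.9], [1.0, 0.0])
print(well_ordered, mis_ordered)
```

The loss is near zero when the cosine scores are ordered like the labels and grows when they are inverted, which is consistent with the falling values in the training log above.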

checkpoint-7500/config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 6,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.45.2",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

checkpoint-7500/config_sentence_transformers.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "__version__": {
+    "sentence_transformers": "3.1.1",
+    "transformers": "4.45.2",
+    "pytorch": "2.4.1+cu118"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": null
+}

checkpoint-7500/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0efce3755d656d5e800e38d14eb024a56a6ea0c825712695946edf101425b73c
+size 90864192
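
Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. The repository stores only the spec version, the SHA-256 object ID, and the byte size. A minimal sketch of reading such a pointer follows. The helper name `parse_lfs_pointer` and the returned dictionary keys are hypothetical and chosen for illustration.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer text copied from the diff above, without the leading "+" markers.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0efce3755d656d5e800e38d14eb024a56a6ea0c825712695946edf101425b73c
size 90864192
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size_bytes"])
```

The same parsing applies to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `training_args.bin`).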

checkpoint-7500/modules.json ADDED

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
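
Reviewer note: `modules.json` chains Transformer, Pooling, and Normalize in that order. Because the last module scales each embedding to unit length, the cosine similarity of two outputs reduces to a plain dot product. The sketch below shows this with toy 2-dimensional vectors. The helper names and values are assumptions for illustration only.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


raw_a, raw_b = [3.0, 4.0], [4.0, 3.0]
unit_a, unit_b = l2_normalize(raw_a), l2_normalize(raw_b)

# After normalization the dot product matches the cosine of the raw vectors.
print(dot(unit_a, unit_b), cosine(raw_a, raw_b))
```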

checkpoint-7500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:56d98f95baecf0b6d49f4dec1e286beb8fb323f99da8bbc0e9e2d7fb9e754ae1
+size 180607738

checkpoint-7500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6abe707bdb762d0494cb302076059165a3d785051e7f90b554a91930e5a96613
+size 14244

checkpoint-7500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8072d7e5656a47eb7c6a6ba191b4467030037358ed926b72bd8ac3c74a5ca459
+size 1064

checkpoint-7500/sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 256,
+  "do_lower_case": false
+}

checkpoint-7500/special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-7500/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-7500/tokenizer_config.json ADDED

	@@ -0,0 +1,64 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "mask_token": "[MASK]",
+  "max_length": 128,
+  "model_max_length": 256,
+  "never_split": null,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}

checkpoint-7500/trainer_state.json ADDED

	@@ -0,0 +1,558 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.4,
+  "eval_steps": 200000,
+  "global_step": 7500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.005333333333333333,
+      "grad_norm": 87.1787109375,
+      "learning_rate": 1.0346666666666668e-06,
+      "loss": 13.2077,
+      "step": 100
+    },
+    {
+      "epoch": 0.010666666666666666,
+      "grad_norm": 93.43941497802734,
+      "learning_rate": 2.1013333333333335e-06,
+      "loss": 12.3835,
+      "step": 200
+    },
+    {
+      "epoch": 0.016,
+      "grad_norm": 48.778316497802734,
+      "learning_rate": 3.1680000000000004e-06,
+      "loss": 10.7699,
+      "step": 300
+    },
+    {
+      "epoch": 0.021333333333333333,
+      "grad_norm": 58.05220413208008,
+      "learning_rate": 4.234666666666667e-06,
+      "loss": 9.2679,
+      "step": 400
+    },
+    {
+      "epoch": 0.02666666666666667,
+      "grad_norm": 23.75503158569336,
+      "learning_rate": 5.301333333333334e-06,
+      "loss": 8.2638,
+      "step": 500
+    },
+    {
+      "epoch": 0.032,
+      "grad_norm": 15.003210067749023,
+      "learning_rate": 6.368000000000001e-06,
+      "loss": 7.69,
+      "step": 600
+    },
+    {
+      "epoch": 0.037333333333333336,
+      "grad_norm": 17.561220169067383,
+      "learning_rate": 7.434666666666668e-06,
+      "loss": 7.2751,
+      "step": 700
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 18.077470779418945,
+      "learning_rate": 8.501333333333334e-06,
+      "loss": 6.8786,
+      "step": 800
+    },
+    {
+      "epoch": 0.048,
+      "grad_norm": 23.71312141418457,
+      "learning_rate": 9.568e-06,
+      "loss": 6.7811,
+      "step": 900
+    },
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 23.78058624267578,
+      "learning_rate": 1.0634666666666667e-05,
+      "loss": 6.5834,
+      "step": 1000
+    },
+    {
+      "epoch": 0.058666666666666666,
+      "grad_norm": 21.788728713989258,
+      "learning_rate": 1.1701333333333333e-05,
+      "loss": 6.3517,
+      "step": 1100
+    },
+    {
+      "epoch": 0.064,
+      "grad_norm": 31.29090690612793,
+      "learning_rate": 1.2768e-05,
+      "loss": 6.2272,
+      "step": 1200
+    },
+    {
+      "epoch": 0.06933333333333333,
+      "grad_norm": 40.10956573486328,
+      "learning_rate": 1.3834666666666668e-05,
+      "loss": 6.1943,
+      "step": 1300
+    },
+    {
+      "epoch": 0.07466666666666667,
+      "grad_norm": 64.6901626586914,
+      "learning_rate": 1.4901333333333334e-05,
+      "loss": 6.1038,
+      "step": 1400
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 37.49641036987305,
+      "learning_rate": 1.5968e-05,
+      "loss": 6.1216,
+      "step": 1500
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 104.68683624267578,
+      "learning_rate": 1.7034666666666668e-05,
+      "loss": 6.1429,
+      "step": 1600
+    },
+    {
+      "epoch": 0.09066666666666667,
+      "grad_norm": 26.25409698486328,
+      "learning_rate": 1.8101333333333336e-05,
+      "loss": 5.8876,
+      "step": 1700
+    },
+    {
+      "epoch": 0.096,
+      "grad_norm": 46.158363342285156,
+      "learning_rate": 1.9168000000000004e-05,
+      "loss": 5.8074,
+      "step": 1800
+    },
+    {
+      "epoch": 0.10133333333333333,
+      "grad_norm": 148.9814910888672,
+      "learning_rate": 1.997511111111111e-05,
+      "loss": 5.6261,
+      "step": 1900
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 32.8914909362793,
+      "learning_rate": 1.9856592592592595e-05,
+      "loss": 5.838,
+      "step": 2000
+    },
+    {
+      "epoch": 0.112,
+      "grad_norm": 38.54491424560547,
+      "learning_rate": 1.9738074074074077e-05,
+      "loss": 5.7161,
+      "step": 2100
+    },
+    {
+      "epoch": 0.11733333333333333,
+      "grad_norm": 38.57450485229492,
+      "learning_rate": 1.9619555555555555e-05,
+      "loss": 5.5388,
+      "step": 2200
+    },
+    {
+      "epoch": 0.12266666666666666,
+      "grad_norm": 48.14970779418945,
+      "learning_rate": 1.9501037037037037e-05,
+      "loss": 5.5654,
+      "step": 2300
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 51.39189147949219,
+      "learning_rate": 1.9382518518518522e-05,
+      "loss": 5.5196,
+      "step": 2400
+    },
+    {
+      "epoch": 0.13333333333333333,
+      "grad_norm": 29.93403434753418,
+      "learning_rate": 1.9264e-05,
+      "loss": 5.3665,
+      "step": 2500
+    },
+    {
+      "epoch": 0.13866666666666666,
+      "grad_norm": 29.333316802978516,
+      "learning_rate": 1.9145481481481482e-05,
+      "loss": 5.2952,
+      "step": 2600
+    },
+    {
+      "epoch": 0.144,
+      "grad_norm": 119.10240936279297,
+      "learning_rate": 1.9026962962962964e-05,
+      "loss": 5.4131,
+      "step": 2700
+    },
+    {
+      "epoch": 0.14933333333333335,
+      "grad_norm": 60.21550750732422,
+      "learning_rate": 1.8908444444444446e-05,
+      "loss": 5.2104,
+      "step": 2800
+    },
+    {
+      "epoch": 0.15466666666666667,
+      "grad_norm": 49.59823989868164,
+      "learning_rate": 1.8789925925925928e-05,
+      "loss": 5.2176,
+      "step": 2900
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 79.62650299072266,
+      "learning_rate": 1.867140740740741e-05,
+      "loss": 4.9406,
+      "step": 3000
+    },
+    {
+      "epoch": 0.16533333333333333,
+      "grad_norm": 124.49871063232422,
+      "learning_rate": 1.855288888888889e-05,
+      "loss": 4.8781,
+      "step": 3100
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 52.037776947021484,
+      "learning_rate": 1.8435555555555558e-05,
+      "loss": 5.08,
+      "step": 3200
+    },
+    {
+      "epoch": 0.176,
+      "grad_norm": 69.88233184814453,
+      "learning_rate": 1.8317037037037036e-05,
+      "loss": 5.1495,
+      "step": 3300
+    },
+    {
+      "epoch": 0.18133333333333335,
+      "grad_norm": 71.26051330566406,
+      "learning_rate": 1.819851851851852e-05,
+      "loss": 4.8717,
+      "step": 3400
+    },
+    {
+      "epoch": 0.18666666666666668,
+      "grad_norm": 41.996055603027344,
+      "learning_rate": 1.8080000000000003e-05,
+      "loss": 4.8196,
+      "step": 3500
+    },
+    {
+      "epoch": 0.192,
+      "grad_norm": 39.33742141723633,
+      "learning_rate": 1.796148148148148e-05,
+      "loss": 4.8065,
+      "step": 3600
+    },
+    {
+      "epoch": 0.19733333333333333,
+      "grad_norm": 63.21049118041992,
+      "learning_rate": 1.7842962962962963e-05,
+      "loss": 4.718,
+      "step": 3700
+    },
+    {
+      "epoch": 0.20266666666666666,
+      "grad_norm": 54.66056823730469,
+      "learning_rate": 1.7724444444444445e-05,
+      "loss": 4.7111,
+      "step": 3800
+    },
+    {
+      "epoch": 0.208,
+      "grad_norm": 55.63251495361328,
+      "learning_rate": 1.7607111111111112e-05,
+      "loss": 4.6759,
+      "step": 3900
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 78.46200561523438,
+      "learning_rate": 1.7488592592592594e-05,
+      "loss": 4.7733,
+      "step": 4000
+    },
+    {
+      "epoch": 0.21866666666666668,
+      "grad_norm": 67.47509765625,
+      "learning_rate": 1.7370074074074075e-05,
+      "loss": 4.7041,
+      "step": 4100
+    },
+    {
+      "epoch": 0.224,
+      "grad_norm": 69.53897094726562,
+      "learning_rate": 1.7251555555555557e-05,
+      "loss": 4.7898,
+      "step": 4200
+    },
+    {
+      "epoch": 0.22933333333333333,
+      "grad_norm": 103.5851821899414,
+      "learning_rate": 1.713303703703704e-05,
+      "loss": 4.8974,
+      "step": 4300
+    },
+    {
+      "epoch": 0.23466666666666666,
+      "grad_norm": 214.16416931152344,
+      "learning_rate": 1.701451851851852e-05,
+      "loss": 4.4939,
+      "step": 4400
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 53.11494445800781,
+      "learning_rate": 1.6896000000000002e-05,
+      "loss": 4.4107,
+      "step": 4500
+    },
+    {
+      "epoch": 0.24533333333333332,
+      "grad_norm": 69.9258041381836,
+      "learning_rate": 1.6777481481481484e-05,
+      "loss": 4.4831,
+      "step": 4600
+    },
+    {
+      "epoch": 0.25066666666666665,
+      "grad_norm": 98.29943084716797,
+      "learning_rate": 1.6658962962962962e-05,
+      "loss": 4.4571,
+      "step": 4700
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 141.04574584960938,
+      "learning_rate": 1.6540444444444444e-05,
+      "loss": 4.1461,
+      "step": 4800
+    },
+    {
+      "epoch": 0.2613333333333333,
+      "grad_norm": 81.44438171386719,
+      "learning_rate": 1.642192592592593e-05,
+      "loss": 4.5198,
+      "step": 4900
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 64.71192169189453,
+      "learning_rate": 1.6303407407407408e-05,
+      "loss": 4.4998,
+      "step": 5000
+    },
+    {
+      "epoch": 0.272,
+      "grad_norm": 56.899715423583984,
+      "learning_rate": 1.618488888888889e-05,
+      "loss": 4.2135,
+      "step": 5100
+    },
+    {
+      "epoch": 0.2773333333333333,
+      "grad_norm": 62.43917465209961,
+      "learning_rate": 1.606637037037037e-05,
+      "loss": 4.441,
+      "step": 5200
+    },
+    {
+      "epoch": 0.2826666666666667,
+      "grad_norm": 66.38310241699219,
+      "learning_rate": 1.5947851851851853e-05,
+      "loss": 4.2669,
+      "step": 5300
+    },
+    {
+      "epoch": 0.288,
+      "grad_norm": 269.8346252441406,
+      "learning_rate": 1.5829333333333334e-05,
+      "loss": 4.0964,
+      "step": 5400
+    },
+    {
+      "epoch": 0.29333333333333333,
+      "grad_norm": 65.3775405883789,
+      "learning_rate": 1.5710814814814816e-05,
+      "loss": 4.2048,
+      "step": 5500
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 72.42823791503906,
+      "learning_rate": 1.5592296296296298e-05,
+      "loss": 4.2123,
+      "step": 5600
+    },
+    {
+      "epoch": 0.304,
+      "grad_norm": 110.38677215576172,
+      "learning_rate": 1.547377777777778e-05,
+      "loss": 4.3391,
+      "step": 5700
+    },
+    {
+      "epoch": 0.30933333333333335,
+      "grad_norm": 74.9642105102539,
+      "learning_rate": 1.535525925925926e-05,
+      "loss": 4.3366,
+      "step": 5800
+    },
+    {
+      "epoch": 0.31466666666666665,
+      "grad_norm": 157.83856201171875,
+      "learning_rate": 1.5236740740740743e-05,
+      "loss": 4.1775,
+      "step": 5900
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 86.05279541015625,
+      "learning_rate": 1.5118222222222223e-05,
+      "loss": 3.9954,
+      "step": 6000
+    },
+    {
+      "epoch": 0.3253333333333333,
+      "grad_norm": 191.12464904785156,
+      "learning_rate": 1.4999703703703705e-05,
+      "loss": 4.141,
+      "step": 6100
+    },
+    {
+      "epoch": 0.33066666666666666,
+      "grad_norm": 77.75936889648438,
+      "learning_rate": 1.4881185185185187e-05,
+      "loss": 4.09,
+      "step": 6200
+    },
+    {
+      "epoch": 0.336,
+      "grad_norm": 78.64347839355469,
+      "learning_rate": 1.4762666666666667e-05,
+      "loss": 3.9517,
+      "step": 6300
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 90.5103988647461,
+      "learning_rate": 1.4645333333333334e-05,
+      "loss": 3.9844,
+      "step": 6400
+    },
+    {
+      "epoch": 0.3466666666666667,
+      "grad_norm": 79.83901977539062,
+      "learning_rate": 1.4526814814814815e-05,
+      "loss": 3.8902,
+      "step": 6500
+    },
+    {
+      "epoch": 0.352,
+      "grad_norm": 87.23860931396484,
+      "learning_rate": 1.4408296296296299e-05,
+      "loss": 3.571,
+      "step": 6600
+    },
+    {
+      "epoch": 0.35733333333333334,
+      "grad_norm": 64.76517486572266,
+      "learning_rate": 1.4289777777777777e-05,
+      "loss": 3.7686,
+      "step": 6700
+    },
+    {
+      "epoch": 0.3626666666666667,
+      "grad_norm": 86.63563537597656,
+      "learning_rate": 1.417125925925926e-05,
+      "loss": 3.7766,
+      "step": 6800
+    },
+    {
+      "epoch": 0.368,
+      "grad_norm": 147.21160888671875,
+      "learning_rate": 1.4052740740740742e-05,
+      "loss": 4.0305,
+      "step": 6900
+    },
+    {
+      "epoch": 0.37333333333333335,
+      "grad_norm": 86.2465591430664,
+      "learning_rate": 1.3934222222222222e-05,
+      "loss": 4.2835,
+      "step": 7000
+    },
+    {
+      "epoch": 0.37866666666666665,
+      "grad_norm": 112.91142272949219,
+      "learning_rate": 1.3815703703703704e-05,
+      "loss": 3.8102,
+      "step": 7100
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 193.4211883544922,
+      "learning_rate": 1.3698370370370371e-05,
+      "loss": 3.5178,
+      "step": 7200
+    },
+    {
+      "epoch": 0.3893333333333333,
+      "grad_norm": 87.25949096679688,
+      "learning_rate": 1.3579851851851853e-05,
+      "loss": 3.8828,
+      "step": 7300
+    },
+    {
+      "epoch": 0.39466666666666667,
+      "grad_norm": 95.35265350341797,
+      "learning_rate": 1.3461333333333334e-05,
+      "loss": 3.9125,
+      "step": 7400
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 92.84791564941406,
+      "learning_rate": 1.3342814814814814e-05,
+      "loss": 3.8578,
+      "step": 7500
+    }
+  ],
+  "logging_steps": 100,
+  "max_steps": 18750,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 128,
+  "trial_name": null,
+  "trial_params": null
+}
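
Reviewer note: the numbers in `trainer_state.json` can be cross-checked with a little arithmetic. With the 2,400,000 training samples from the README tags and a batch size of 128, one epoch is 18,750 steps, so step 7,500 is epoch 0.4. The sketch below also models the `linear` scheduler with `warmup_ratio` 0.1 as an idealized function. The function name is an assumption, and the warmup length of 1,875 steps is derived here as 0.1 × 18,750 rather than read from the checkpoint.

```python
def linear_schedule_lr(step: int, base_lr: float = 2e-5,
                       max_steps: int = 18750, warmup_steps: int = 1875) -> float:
    """Idealized linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))


steps_per_epoch = 2_400_000 // 128        # dataset size / train batch size
epoch_at_checkpoint = 7500 / steps_per_epoch

print(steps_per_epoch, epoch_at_checkpoint)
for step in (100, 1875, 7500):
    print(step, linear_schedule_lr(step))
```

The logged learning rates sit slightly below this ideal curve (for example 1.0347e-06 at step 100 instead of about 1.0667e-06). That gap is consistent with the scheduler having advanced a few steps fewer than the global step, which can plausibly happen when `fp16` loss scaling skips some early optimizer updates, although the checkpoint itself does not confirm the cause.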

checkpoint-7500/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:559c2e10ae7e3eb92c5fe0ec0855e1823bed2527232b2a6421c1e7e5dcf4dd39
+size 5496

checkpoint-7500/vocab.txt ADDED

The diff for this file is too large to render. See raw diff