Model save

- README.md +84 -0
- generation_config.json +7 -0
- model.safetensors +1 -1
README.md
ADDED
@@ -0,0 +1,84 @@
+---
+license: apache-2.0
+base_model: princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT
+tags:
+- trl
+- sft
+- generated_from_trainer
+model-index:
+- name: sheared_llama_1.3b-reazon_v2-ja_en_trans-T2T
+  results: []
+---
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# sheared_llama_1.3b-reazon_v2-ja_en_trans-T2T
+
+This model is a fine-tuned version of [princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B-ShareGPT) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.3844
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 32
+- eval_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 1024
+- total_eval_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.01
+- training_steps: 200
+
+### Training results
+
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 2.0409        | 0.0114 | 10   | 1.7202          |
+| 1.6901        | 0.0229 | 20   | 1.6101          |
+| 1.5859        | 0.0343 | 30   | 1.5446          |
+| 1.5533        | 0.0458 | 40   | 1.5029          |
+| 1.4937        | 0.0572 | 50   | 1.4722          |
+| 1.4802        | 0.0687 | 60   | 1.4492          |
+| 1.4484        | 0.0801 | 70   | 1.4302          |
+| 1.4292        | 0.0916 | 80   | 1.4183          |
+| 1.4203        | 0.1030 | 90   | 1.4078          |
+| 1.4184        | 0.1145 | 100  | 1.3985          |
+| 1.4045        | 0.1259 | 110  | 1.3923          |
+| 1.4125        | 0.1374 | 120  | 1.3886          |
+| 1.4098        | 0.1488 | 130  | 1.3877          |
+| 1.3921        | 0.1603 | 140  | 1.3859          |
+| 1.3984        | 0.1717 | 150  | 1.3851          |
+| 1.3858        | 0.1832 | 160  | 1.3845          |
+| 1.3995        | 0.1946 | 170  | 1.3842          |
+| 1.3943        | 0.2061 | 180  | 1.3847          |
+| 1.3988        | 0.2175 | 190  | 1.3844          |
+| 1.3969        | 0.2290 | 200  | 1.3844          |
+
+
+### Framework versions
+
+- Transformers 4.41.2
+- Pytorch 2.3.0+cu121
+- Datasets 2.20.0
+- Tokenizers 0.19.1
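A minimal inference sketch for the checkpoint this commit saves. The repo id, the plain-text prompt format, and the translation direction (Japanese to English, inferred from the model name "ja_en_trans") are all assumptions; the card above does not document them.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, taken from the model-index name in the README above.
repo_id = "sheared_llama_1.3b-reazon_v2-ja_en_trans-T2T"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# Assumed prompt; the actual template used during SFT is not stated in the card.
prompt = "次の日本語を英語に翻訳してください: 今日はいい天気です。\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```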
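As a cross-check of the hyperparameter list, here is a sketch of how those values map onto `transformers.TrainingArguments`. Note that 32 per-device examples × 8 GPUs × 4 accumulation steps gives the stated effective batch of 1024. Only the configuration is reconstructed; the dataset and the TRL SFT trainer wiring are not shown in the card.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="sheared_llama_1.3b-reazon_v2-ja_en_trans-T2T",
    learning_rate=2e-5,
    per_device_train_batch_size=32,  # x 8 devices x 4 accumulation = 1024 effective
    per_device_eval_batch_size=8,    # x 8 devices = 64 effective
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    max_steps=200,                   # "training_steps: 200" in the card
)
```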
generation_config.json
ADDED
@@ -0,0 +1,7 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 0,
+  "transformers_version": "4.41.2"
+}
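These values are the defaults that `model.generate()` falls back on when no overrides are passed. A quick sketch of inspecting them with `transformers.GenerationConfig` (the repo id is the same assumption as above):

```python
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("sheared_llama_1.3b-reazon_v2-ja_en_trans-T2T")
print(gen_config.bos_token_id, gen_config.eos_token_id, gen_config.pad_token_id)  # 1 2 0
```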
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:fa7967fc997ccdd2a8f69a8290817293b6f315d2ab17dc04b8a77448978a0630
 size 2690880168
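The weights change only swaps the sha256 recorded in the git-LFS pointer; the file size is unchanged. A standard-library sketch for verifying a downloaded copy against the new digest (the local path is an assumption):

```python
import hashlib

EXPECTED = "fa7967fc997ccdd2a8f69a8290817293b6f315d2ab17dc04b8a77448978a0630"

h = hashlib.sha256()
with open("model.safetensors", "rb") as f:            # assumed local path
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)

assert h.hexdigest() == EXPECTED, "checksum mismatch"
print("ok: digest matches the LFS pointer")
```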