willtensora/4ada8092-cc1e-445c-9260-a580ef2586ae

@@ -1,12 +1,12 @@
 ---
 library_name: transformers
-license: mit
-base_model: fxmarty/tiny-random-GemmaForCausalLM
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
-- name: fd1980a0-7e71-4e52-addb-318dca5991d5
   results: []
 ---
@@ -18,21 +18,20 @@ should probably proofread and complete it, then remove this comment. -->
 axolotl version: `0.4.1`
 ```yaml
-base_model: fxmarty/tiny-random-GemmaForCausalLM
 batch_size: 32
 bf16: true
 chat_template: tokenizer_default_fallback_alpaca
 datasets:
 - data_files:
-  - b7c2a4a781c93416_train_data.json
   ds_type: json
   format: custom
-  path: /workspace/input_data/b7c2a4a781c93416_train_data.json
   type:
-    field_input: context
-    field_instruction: question
-    field_output: answer
-    format: '{instruction} {input}'
     no_input_format: '{instruction}'
     system_format: '{system}'
     system_prompt: ''
@@ -41,7 +40,7 @@ flash_attention: true
 gpu_memory_limit: 80GiB
 gradient_checkpointing: true
 group_by_length: true
-hub_model_id: willtensora/fd1980a0-7e71-4e52-addb-318dca5991d5
 hub_strategy: checkpoint
 learning_rate: 0.0002
 logging_steps: 10
@@ -57,13 +56,15 @@ sample_packing: false
 save_steps: 40
 save_total_limit: 1
 sequence_len: 2048
-tokenizer_type: GemmaTokenizerFast
 train_on_inputs: false
 trust_remote_code: true
 val_set_size: 0.1
 wandb_entity: ''
 wandb_mode: online
-wandb_name: fxmarty/tiny-random-GemmaForCausalLM-/workspace/input_data/b7c2a4a781c93416_train_data.json
 wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: default
@@ -74,11 +75,11 @@ xformers_attention: true
 </details><br>
-# fd1980a0-7e71-4e52-addb-318dca5991d5
-This model is a fine-tuned version of [fxmarty/tiny-random-GemmaForCausalLM](https://huggingface.co/fxmarty/tiny-random-GemmaForCausalLM) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 11.7971
 ## Model description
@@ -107,21 +108,24 @@ The following hyperparameters were used during training:
 - total_eval_batch_size: 32
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 7
-- training_steps: 156
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| No log        | 0.0008 | 1    | 12.4537         |
-| 12.4357       | 0.0161 | 20   | 12.4267         |
-| 12.392        | 0.0322 | 40   | 12.3762         |
-| 12.3026       | 0.0483 | 60   | 12.2651         |
-| 12.1177       | 0.0645 | 80   | 12.0658         |
-| 11.9286       | 0.0806 | 100  | 11.8860         |
-| 11.8324       | 0.0967 | 120  | 11.8100         |
-| 11.798        | 0.1128 | 140  | 11.7971         |
 ### Framework versions

 ---
 library_name: transformers
+license: apache-2.0
+base_model: JackFram/llama-68m
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
+- name: 4ada8092-cc1e-445c-9260-a580ef2586ae
   results: []
 ---
 axolotl version: `0.4.1`
 ```yaml
+base_model: JackFram/llama-68m
 batch_size: 32
 bf16: true
 chat_template: tokenizer_default_fallback_alpaca
 datasets:
 - data_files:
+  - ff3a521d02fa72b2_train_data.json
   ds_type: json
   format: custom
+  path: /workspace/input_data/ff3a521d02fa72b2_train_data.json
   type:
+    field_instruction: context
+    field_output: question
+    format: '{instruction}'
     no_input_format: '{instruction}'
     system_format: '{system}'
     system_prompt: ''
 gpu_memory_limit: 80GiB
 gradient_checkpointing: true
 group_by_length: true
+hub_model_id: willtensora/4ada8092-cc1e-445c-9260-a580ef2586ae
 hub_strategy: checkpoint
 learning_rate: 0.0002
 logging_steps: 10
 save_steps: 40
 save_total_limit: 1
 sequence_len: 2048
+special_tokens:
+  pad_token: </s>
+tokenizer_type: LlamaTokenizerFast
 train_on_inputs: false
 trust_remote_code: true
 val_set_size: 0.1
 wandb_entity: ''
 wandb_mode: online
+wandb_name: JackFram/llama-68m-/workspace/input_data/ff3a521d02fa72b2_train_data.json
 wandb_project: Gradients-On-Demand
 wandb_run: your_name
 wandb_runid: default
 </details><br>
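The `type` block in the new config maps each JSON record onto a prompt and a target: `context` supplies the instruction and `question` supplies the output. The sketch below illustrates that mapping under stated assumptions. The function `render_example`, the dictionary `NEW_TYPE_CFG`, and the sample record are hypothetical, and the branching on `field_input` is an assumption about how a custom format is applied, not axolotl's actual code.

```python
# Values copied from the `type:` block of the new config.
NEW_TYPE_CFG = {
    "field_instruction": "context",
    "field_output": "question",
    "format": "{instruction}",
    "no_input_format": "{instruction}",
}


def render_example(record, cfg):
    """Return (prompt, target) for one JSON record (illustrative only)."""
    instruction = record[cfg["field_instruction"]]
    input_field = cfg.get("field_input")  # absent in the new config
    if input_field and record.get(input_field):
        prompt = cfg["format"].format(
            instruction=instruction, input=record[input_field]
        )
    else:
        # Assumed fallback when no input field is configured.
        prompt = cfg["no_input_format"].format(instruction=instruction)
    return prompt, record[cfg["field_output"]]


# Hypothetical record; the real contents of the training file are not shown.
sample = {
    "context": "Paris is the capital of France.",
    "question": "What is the capital of France?",
}
prompt, target = render_example(sample, NEW_TYPE_CFG)
print(repr(prompt))
print(repr(target))
```

With `train_on_inputs: false`, the loss would presumably be computed on the target tokens only, so the model learns to produce a question from a context.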
+# 4ada8092-cc1e-445c-9260-a580ef2586ae
+This model is a fine-tuned version of [JackFram/llama-68m](https://huggingface.co/JackFram/llama-68m) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.2208
 ## Model description
 - total_eval_batch_size: 32
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 10
+- training_steps: 205
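For reference, the new schedule (cosine decay, 10 warmup steps, 205 training steps, peak learning rate 0.0002) can be sketched as follows. This is a minimal sketch that assumes linear warmup followed by a single half-cosine decay to zero; the function name `cosine_lr_with_warmup` is hypothetical and is not part of axolotl or transformers.

```python
import math


def cosine_lr_with_warmup(step, base_lr=2e-4, warmup_steps=10, total_steps=205):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the rate at the start, end of warmup, midpoint, and final step.
for step in (0, 10, 100, 205):
    print(step, cosine_lr_with_warmup(step))
```

Under these assumptions the rate peaks at step 10 and approaches zero by step 205, which is consistent with the validation loss flattening over the last logged steps.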
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| No log        | 0.0006 | 1    | 6.7193          |
+| 1.5212        | 0.0122 | 20   | 1.0774          |
+| 0.7826        | 0.0244 | 40   | 0.6352          |
+| 0.5492        | 0.0366 | 60   | 0.4713          |
+| 0.3663        | 0.0488 | 80   | 0.3924          |
+| 0.3533        | 0.0610 | 100  | 0.3112          |
+| 0.2434        | 0.0732 | 120  | 0.2761          |
+| 0.2989        | 0.0854 | 140  | 0.2445          |
+| 0.2464        | 0.0976 | 160  | 0.2251          |
+| 0.2233        | 0.1098 | 180  | 0.2203          |
+| 0.2213        | 0.1220 | 200  | 0.2208          |
 ### Framework versions

generation_config.json

@@ -1,8 +1,8 @@
 {
   "_from_model_config": true,
-  "bos_token_id": 2,
   "do_sample": true,
-  "eos_token_id": 1,
-  "pad_token_id": 0,
   "transformers_version": "4.46.0"
 }

 {
   "_from_model_config": true,
+  "bos_token_id": 0,
   "do_sample": true,
+  "eos_token_id": 2,
+  "pad_token_id": 1,
   "transformers_version": "4.46.0"
 }

pytorch_model.bin

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4bc76ae4e72c9fc13bfe9567ae655234c8d3f2fcf4460d169dedaebd1865dcc9
-size 16392015

 version https://git-lfs.github.com/spec/v1
+oid sha256:432d9ed4d450961d63ceeda6070006f3b7eae9f4bfd1ec6ba4cd115f7bdb6b5a
+size 136067757
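
The diff for `pytorch_model.bin` only changes a Git LFS pointer, which records the SHA-256 digest and byte size of the real weights file. A minimal sketch of how such a pointer could be parsed and used to check a downloaded file is given below; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the new side of the diff.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


new_pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:432d9ed4d450961d63ceeda6070006f3b7eae9f4bfd1ec6ba4cd115f7bdb6b5a
size 136067757
"""
)
old_size = 16392015  # size recorded in the removed pointer
print(new_pointer["size"] / old_size)  # growth factor of the weights file
```

In practice the file would be hashed in chunks rather than read into memory at once; the in-memory check here keeps the sketch short.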