kweinmeister committed (verified)
Commit aeaa941 · 1 Parent(s): fb940a7

End of training

Files changed (2)
  1. README.md +41 -48
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -5,8 +5,6 @@ base_model: google/gemma-2-27b-it
  tags:
  - axolotl
  - generated_from_trainer
- datasets:
- - databricks/databricks-dolly-15k
  model-index:
  - name: gemma-2-27b-it-dolly-15k
    results: []
@@ -18,17 +16,11 @@ should probably proofread and complete it, then remove this comment. -->
  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>

- axolotl version: `0.6.0`
+ axolotl version: `0.5.2`
  ```yaml
  base_model: google/gemma-2-27b-it
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
  hub_model_id: kweinmeister/gemma-2-27b-it-dolly-15k

- # https://github.com/vllm-project/vllm/issues/10590
- bnb_config_kwargs:
-   bnb_4bit_quant_storage: uint8
-
  load_in_8bit: false
  load_in_4bit: true
  strict: false
@@ -39,27 +31,33 @@ datasets:
    field_instruction: instruction
    field_input: context
    field_output: response
+ val_set_size: 0.05

- val_set_size: 0.1
- output_dir: "/mnt/disks/gcs/axolotl/outputs/dolly-15k-out"
+ sequence_len: 2048
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true

  adapter: qlora
+ lora_model_dir:
  lora_r: 32
- lora_alpha: 16
+ lora_alpha: 64
  lora_dropout: 0.05
  lora_target_linear: true
+ lora_fan_in_fan_out:

- sequence_len: 2048
- sample_packing: true
- eval_sample_packing: false
- pad_to_sequence_len: true
+ wandb_project: gemma-2-27b-it-dolly-15k
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:

  gradient_accumulation_steps: 4
- micro_batch_size: 1
- num_epochs: 3
+ micro_batch_size: 4
+ num_epochs: 1
  optimizer: adamw_torch
  lr_scheduler: cosine
- learning_rate: 2e-5
+ learning_rate: 0.0001

  train_on_inputs: false
  group_by_length: false
@@ -68,6 +66,8 @@ fp16:
  tf32: true

  gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: true
  early_stopping_patience:
  resume_from_checkpoint:
  local_rank:
@@ -75,16 +75,17 @@ logging_steps: 1
  xformers_attention:
  flash_attention: false

- warmup_ratio: 0.1
+ warmup_steps: 10
  evals_per_epoch: 4
- eval_max_new_tokens: 128
  saves_per_epoch: 1
  debug:
- deepspeed: deepspeed_configs/zero1.json
+ deepspeed: deepspeed_configs/zero2.json
  weight_decay: 0.0
-
  fsdp:
  fsdp_config:
+ special_tokens:
+ output_dir: "/mnt/disks/gcs/training/runs/google--gemma-2-27b-it-20250101-192231/out/"
+ dataset_prepared_path: "/mnt/disks/gcs/training/datasets"

  ```

@@ -92,9 +93,9 @@ fsdp_config:

  # gemma-2-27b-it-dolly-15k

- This model is a fine-tuned version of [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) on the databricks/databricks-dolly-15k dataset.
+ This model is a fine-tuned version of [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.4649
+ - Loss: 1.5560

  ## Model description

@@ -113,42 +114,34 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 1
- - eval_batch_size: 1
+ - learning_rate: 0.0001
+ - train_batch_size: 4
+ - eval_batch_size: 4
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 2
  - gradient_accumulation_steps: 4
- - total_train_batch_size: 8
- - total_eval_batch_size: 2
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 8
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 46
- - num_epochs: 3
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 4.0853 | 0.0065 | 1 | 2.5485 |
- | 3.4071 | 0.2524 | 39 | 2.1938 |
- | 1.9159 | 0.5049 | 78 | 1.6474 |
- | 1.6968 | 0.7573 | 117 | 1.5546 |
- | 1.7757 | 1.0129 | 156 | 1.5193 |
- | 1.7768 | 1.2654 | 195 | 1.4965 |
- | 1.3735 | 1.5178 | 234 | 1.4835 |
- | 1.7285 | 1.7702 | 273 | 1.4744 |
- | 1.6601 | 2.0259 | 312 | 1.4701 |
- | 1.6477 | 2.2783 | 351 | 1.4657 |
- | 1.3795 | 2.5307 | 390 | 1.4645 |
- | 1.6575 | 2.7832 | 429 | 1.4649 |
+ | 4.2291 | 0.0244 | 1 | 2.1246 |
+ | 2.1928 | 0.2683 | 11 | 1.6858 |
+ | 1.742 | 0.5366 | 22 | 1.5769 |
+ | 1.7213 | 0.8049 | 33 | 1.5560 |


  ### Framework versions

- - PEFT 0.14.0
- - Transformers 4.47.1
- - Pytorch 2.3.1+cu121
+ - PEFT 0.13.2
+ - Transformers 4.46.3
+ - Pytorch 2.4.1+cu124
  - Datasets 3.1.0
- - Tokenizers 0.21.0
+ - Tokenizers 0.20.3
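The updated card describes a QLoRA adapter (rank 32, alpha 64) trained on top of `google/gemma-2-27b-it` with `load_in_4bit: true`. A minimal inference sketch under those assumptions follows; the compute dtype, prompt, and generation settings are illustrative placeholders rather than values taken from this commit.

```python
# Sketch: load the 4-bit base model and attach the LoRA adapter from this repo.
# Assumes transformers, peft, and bitsandbytes are installed (versions as under "Framework versions").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "google/gemma-2-27b-it"
ADAPTER_ID = "kweinmeister/gemma-2-27b-it-dolly-15k"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # mirrors load_in_4bit: true in the axolotl config
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption; the card does not pin a compute dtype
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # loads the adapter weights updated in this commit

prompt = "Summarize what instruction tuning does."   # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```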
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:661b80aaae193a2bc65f5ebb67429f6c202da3bca1f700c37e0d8c4737584c7c
+ oid sha256:34bb1599f6a859b0c63f13428fd8d11df5781227d292a053bffadb108b5fa623
  size 456822394
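The `adapter_model.bin` change only swaps the Git LFS pointer, so a fresh download can be checked against the recorded digest. A small sketch using `huggingface_hub` (the repo id comes from `hub_model_id` in the config; the hash literal is copied from the pointer above):

```python
# Sketch: fetch adapter_model.bin and verify it against the sha256 recorded in the LFS pointer.
import hashlib
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="kweinmeister/gemma-2-27b-it-dolly-15k",
    filename="adapter_model.bin",
)

sha256 = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        sha256.update(chunk)

expected = "34bb1599f6a859b0c63f13428fd8d11df5781227d292a053bffadb108b5fa623"
print("match:", sha256.hexdigest() == expected)
```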