kweinmeister committed (verified)
Commit aeaa941 · 1 Parent(s): fb940a7

End of training

Files changed (2)
  1. README.md +41 -48
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -5,8 +5,6 @@ base_model: google/gemma-2-27b-it
  tags:
  - axolotl
  - generated_from_trainer
- datasets:
- - databricks/databricks-dolly-15k
  model-index:
  - name: gemma-2-27b-it-dolly-15k
    results: []
@@ -18,17 +16,11 @@ should probably proofread and complete it, then remove this comment. -->
  [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
  <details><summary>See axolotl config</summary>

- axolotl version: `0.6.0`
+ axolotl version: `0.5.2`
  ```yaml
  base_model: google/gemma-2-27b-it
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
  hub_model_id: kweinmeister/gemma-2-27b-it-dolly-15k

- # https://github.com/vllm-project/vllm/issues/10590
- bnb_config_kwargs:
-   bnb_4bit_quant_storage: uint8
-
  load_in_8bit: false
  load_in_4bit: true
  strict: false
@@ -39,27 +31,33 @@ datasets:
    field_instruction: instruction
    field_input: context
    field_output: response
+ val_set_size: 0.05

- val_set_size: 0.1
- output_dir: "/mnt/disks/gcs/axolotl/outputs/dolly-15k-out"
+ sequence_len: 2048
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true

  adapter: qlora
+ lora_model_dir:
  lora_r: 32
- lora_alpha: 16
+ lora_alpha: 64
  lora_dropout: 0.05
  lora_target_linear: true
+ lora_fan_in_fan_out:

- sequence_len: 2048
- sample_packing: true
- eval_sample_packing: false
- pad_to_sequence_len: true
+ wandb_project: gemma-2-27b-it-dolly-15k
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:

  gradient_accumulation_steps: 4
- micro_batch_size: 1
- num_epochs: 3
+ micro_batch_size: 4
+ num_epochs: 1
  optimizer: adamw_torch
  lr_scheduler: cosine
- learning_rate: 2e-5
+ learning_rate: 0.0001

  train_on_inputs: false
  group_by_length: false
@@ -68,6 +66,8 @@ fp16:
  tf32: true

  gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: true
  early_stopping_patience:
  resume_from_checkpoint:
  local_rank:
@@ -75,16 +75,17 @@ logging_steps: 1
  xformers_attention:
  flash_attention: false

- warmup_ratio: 0.1
+ warmup_steps: 10
  evals_per_epoch: 4
- eval_max_new_tokens: 128
  saves_per_epoch: 1
  debug:
- deepspeed: deepspeed_configs/zero1.json
+ deepspeed: deepspeed_configs/zero2.json
  weight_decay: 0.0
-
  fsdp:
  fsdp_config:
+ special_tokens:
+ output_dir: "/mnt/disks/gcs/training/runs/google--gemma-2-27b-it-20250101-192231/out/"
+ dataset_prepared_path: "/mnt/disks/gcs/training/datasets"

  ```

@@ -92,9 +93,9 @@ fsdp_config:

  # gemma-2-27b-it-dolly-15k

- This model is a fine-tuned version of [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) on the databricks/databricks-dolly-15k dataset.
+ This model is a fine-tuned version of [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.4649
+ - Loss: 1.5560

  ## Model description

@@ -113,42 +114,34 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 1
- - eval_batch_size: 1
+ - learning_rate: 0.0001
+ - train_batch_size: 4
+ - eval_batch_size: 4
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 2
  - gradient_accumulation_steps: 4
- - total_train_batch_size: 8
- - total_eval_batch_size: 2
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 8
  - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 46
- - num_epochs: 3
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 4.0853 | 0.0065 | 1 | 2.5485 |
- | 3.4071 | 0.2524 | 39 | 2.1938 |
- | 1.9159 | 0.5049 | 78 | 1.6474 |
- | 1.6968 | 0.7573 | 117 | 1.5546 |
- | 1.7757 | 1.0129 | 156 | 1.5193 |
- | 1.7768 | 1.2654 | 195 | 1.4965 |
- | 1.3735 | 1.5178 | 234 | 1.4835 |
- | 1.7285 | 1.7702 | 273 | 1.4744 |
- | 1.6601 | 2.0259 | 312 | 1.4701 |
- | 1.6477 | 2.2783 | 351 | 1.4657 |
- | 1.3795 | 2.5307 | 390 | 1.4645 |
- | 1.6575 | 2.7832 | 429 | 1.4649 |
+ | 4.2291 | 0.0244 | 1 | 2.1246 |
+ | 2.1928 | 0.2683 | 11 | 1.6858 |
+ | 1.742 | 0.5366 | 22 | 1.5769 |
+ | 1.7213 | 0.8049 | 33 | 1.5560 |


  ### Framework versions

- - PEFT 0.14.0
- - Transformers 4.47.1
- - Pytorch 2.3.1+cu121
+ - PEFT 0.13.2
+ - Transformers 4.46.3
+ - Pytorch 2.4.1+cu124
  - Datasets 3.1.0
- - Tokenizers 0.21.0
+ - Tokenizers 0.20.3
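The updated card describes a QLoRA adapter (rank 32, alpha 64) trained on top of `google/gemma-2-27b-it` with `load_in_4bit: true`. A minimal inference sketch under those assumptions follows; the compute dtype, prompt, and generation settings are illustrative placeholders rather than values taken from this commit.

```python
# Sketch: load the 4-bit base model and attach the LoRA adapter from this repo.
# Assumes transformers, peft, and bitsandbytes are installed (versions as under "Framework versions").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "google/gemma-2-27b-it"
ADAPTER_ID = "kweinmeister/gemma-2-27b-it-dolly-15k"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # mirrors load_in_4bit: true in the axolotl config
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption; the card does not pin a compute dtype
)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # loads the adapter weights updated in this commit

prompt = "Summarize what instruction tuning does."   # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```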
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:661b80aaae193a2bc65f5ebb67429f6c202da3bca1f700c37e0d8c4737584c7c
+ oid sha256:34bb1599f6a859b0c63f13428fd8d11df5781227d292a053bffadb108b5fa623
  size 456822394
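The `adapter_model.bin` change only swaps the Git LFS pointer, so a fresh download can be checked against the recorded digest. A small sketch using `huggingface_hub` (the repo id comes from `hub_model_id` in the config; the hash literal is copied from the pointer above):

```python
# Sketch: fetch adapter_model.bin and verify it against the sha256 recorded in the LFS pointer.
import hashlib
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="kweinmeister/gemma-2-27b-it-dolly-15k",
    filename="adapter_model.bin",
)

sha256 = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        sha256.update(chunk)

expected = "34bb1599f6a859b0c63f13428fd8d11df5781227d292a053bffadb108b5fa623"
print("match:", sha256.hexdigest() == expected)
```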