souging committed
Commit 8cc7baf · verified · 1 Parent(s): 29d42cd

End of training

Files changed (2)
  1. README.md +35 -38
  2. adapter_model.bin +2 -2
README.md CHANGED
@@ -1,12 +1,12 @@
  ---
  library_name: peft
- license: gemma
- base_model: zake7749/gemma-2-2b-it-chinese-kyara-dpo
+ license: cc-by-sa-4.0
+ base_model: defog/sqlcoder-7b-2
  tags:
  - axolotl
  - generated_from_trainer
  model-index:
- - name: 4aefa0ce-d88c-4eb6-92b3-06407b441084
+ - name: eae8f117-788f-4851-8ae0-704076ca26e8
    results: []
  ---

@@ -19,40 +19,39 @@ should probably proofread and complete it, then remove this comment. -->
  axolotl version: `0.4.1`
  ```yaml
  adapter: lora
- base_model: zake7749/gemma-2-2b-it-chinese-kyara-dpo
+ base_model: defog/sqlcoder-7b-2
  bf16: auto
  dataset_prepared_path: null
  datasets:
  - data_files:
- - 8c2dd3c11c63229d_train_data.json
+ - 520602e56e559160_train_data.json
  ds_type: json
  format: custom
- path: /root/G.O.D-test/core/data/8c2dd3c11c63229d_train_data.json
+ path: /root/G.O.D-test/core/data/520602e56e559160_train_data.json
  type:
- field_instruction: premise
- field_output: hypothesis
+ field_instruction: prompt
+ field_output: gold_standard_solution
  format: '{instruction}'
  no_input_format: '{instruction}'
  system_format: '{system}'
  system_prompt: ''
  debug: null
  deepspeed: null
- early_stopping_patience: null
  eval_max_new_tokens: 128
- eval_table_size: null
- evals_per_epoch: 0
+ eval_steps: 0
+ evals_per_epoch: null
  flash_attention: true
  fp16: null
  fsdp: null
  fsdp_config: null
  gradient_accumulation_steps: 4
  gradient_checkpointing: false
- group_by_length: false
- hub_model_id: souging/4aefa0ce-d88c-4eb6-92b3-06407b441084
+ group_by_length: true
+ hub_model_id: souging/eae8f117-788f-4851-8ae0-704076ca26e8
  hub_repo: null
  hub_strategy: checkpoint
  hub_token: null
- learning_rate: 0.0002
+ learning_rate: 0.000202
  load_in_4bit: false
  load_in_8bit: false
  local_rank: null
@@ -64,44 +63,45 @@ lora_model_dir: null
  lora_r: 32
  lora_target_linear: true
  lr_scheduler: cosine
- max_steps: 400
- micro_batch_size: 3
- mlflow_experiment_name: /tmp/8c2dd3c11c63229d_train_data.json
+ max_steps: 500
+ micro_batch_size: 2
+ mlflow_experiment_name: /tmp/520602e56e559160_train_data.json
  model_type: AutoModelForCausalLM
- num_epochs: 4
+ num_epochs: 10
  optimizer: adamw_bnb_8bit
  output_dir: miner_id_24
  pad_to_sequence_len: true
  resume_from_checkpoint: null
  s2_attention: null
  sample_packing: false
- saves_per_epoch: 0
- sequence_len: 2048
+ save_steps: 0
+ saves_per_epoch: null
+ seed: 20
+ sequence_len: 1536
+ special_tokens:
+   pad_token: </s>
  strict: false
  tf32: false
  tokenizer_type: AutoTokenizer
  train_on_inputs: false
  trust_remote_code: true
- val_set_size: 0.05
  wandb_entity: null
  wandb_mode: online
- wandb_name: 60492357-69eb-4e2c-a118-9f38faabd1d2
+ wandb_name: ff2787c9-8dbd-4e47-864f-073c4a88122b
  wandb_project: Gradients-On-Demand
  wandb_run: your_name
- wandb_runid: 60492357-69eb-4e2c-a118-9f38faabd1d2
+ wandb_runid: ff2787c9-8dbd-4e47-864f-073c4a88122b
  warmup_steps: 100
- weight_decay: 0.01
+ weight_decay: 0.0
  xformers_attention: null

  ```

  </details><br>

- # 4aefa0ce-d88c-4eb6-92b3-06407b441084
+ # eae8f117-788f-4851-8ae0-704076ca26e8

- This model is a fine-tuned version of [zake7749/gemma-2-2b-it-chinese-kyara-dpo](https://huggingface.co/zake7749/gemma-2-2b-it-chinese-kyara-dpo) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.7370
+ This model is a fine-tuned version of [defog/sqlcoder-7b-2](https://huggingface.co/defog/sqlcoder-7b-2) on the None dataset.

  ## Model description

@@ -120,25 +120,22 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 3
- - eval_batch_size: 3
- - seed: 42
+ - learning_rate: 0.000202
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 20
  - distributed_type: multi-GPU
  - num_devices: 8
  - gradient_accumulation_steps: 4
- - total_train_batch_size: 96
- - total_eval_batch_size: 24
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 16
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 100
- - training_steps: 400
+ - training_steps: 500

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.6473 | 0.2399 | 400 | 1.7370 |

  ### Framework versions
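
For reference, the `total_train_batch_size: 64` and `total_eval_batch_size: 16` reported above are derived values rather than settings in the YAML: they follow from `micro_batch_size`, `gradient_accumulation_steps`, and the number of devices. A minimal sketch of that arithmetic (the variable names are illustrative, not axolotl internals):

```python
# Effective batch sizes implied by the updated config (illustrative arithmetic only).
micro_batch_size = 2             # per-device batch size from the YAML
gradient_accumulation_steps = 4  # from the YAML
num_devices = 8                  # multi-GPU setup listed in the hyperparameters

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no gradient accumulation at eval time

print(total_train_batch_size)  # 64
print(total_eval_batch_size)   # 16
```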
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:de1a57dda83020dbf728c45d840ceb687d74825f912baee85ac399e0ad3ecad1
- size 166265274
+ oid sha256:3e72e709009678d1415294eb00e7d1c6278129ed98d62dd3870b1e2bd7b17b37
+ size 319977674