PEFT
Safetensors
qwen2
axolotl
Generated from Trainer
4-bit precision
bitsandbytes
Files changed (1)
  1. README.md +179 -165
README.md CHANGED
---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-7B
tags:
- axolotl
- generated_from_trainer
datasets:
- Aratako/Magpie-Tanuki-8B-annotated-96k
- Aratako/Synthetic-JP-EN-Coding-Dataset-Magpie-69k
- DataPilot/Zero_SFT_Ja_v2_b3t4
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
model-index:
- name: Qwen2.5-7B-axolotl-sft-v0.2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0`
```yaml
base_model: Qwen/Qwen2.5-7B
hub_model_id: OsakanaTeishoku/Qwen2.5-7B-axolotl-sft-v0.2

load_in_8bit: false
load_in_4bit: true
strict: false

chat_template: qwen_25

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: Aratako/Magpie-Tanuki-8B-annotated-96k
    split: train
    type: chat_template
    field_messages: messages
  - path: Aratako/Synthetic-JP-EN-Coding-Dataset-Magpie-69k
    split: train
    type: chat_template
    field_messages: messages
  - path: DataPilot/Zero_SFT_Ja_v2_b3t4
    split: train
    type: chat_template
    field_messages: conversation
    message_property_mappings:
      role: from
      content: value

shuffle_merged_datasets: true

dataset_prepared_path: last_run_prepared
#val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

adapter: qlora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head

wandb_project: modal-axolotl
wandb_name: 20250419-qwen7b-modal

gradient_accumulation_steps: 4
micro_batch_size: 16
#auto_find_batch_size: true
#num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0001

bf16: true
fp16: false
tf32: false
train_on_inputs: false
group_by_length: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:

warmup_ratio: 0.05
save_steps: 50
max_steps: 200
debug:
#deepspeed: /workspace/axolotl/deepspeed_configs/zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

eval_strategy: "no"
save_strategy: "steps"
```

</details><br>

# Qwen2.5-7B-axolotl-sft-v0.2

This model is a fine-tuned version of [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) on the Aratako/Magpie-Tanuki-8B-annotated-96k, Aratako/Synthetic-JP-EN-Coding-Dataset-Magpie-69k, and DataPilot/Zero_SFT_Ja_v2_b3t4 datasets.

## Model description

More information needed

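Per the axolotl config above, this repository holds a QLoRA adapter for Qwen2.5-7B: rank 16, alpha 32, dropout 0.05, targeting all linear layers while the base model is quantized to 4-bit, with `embed_tokens` and `lm_head` additionally saved alongside the adapter. A rough PEFT-side sketch of those adapter settings (illustrative; axolotl expands `lora_target_linear: true` to the concrete module list itself):

```python
from peft import LoraConfig

# Rough PEFT equivalent of the adapter settings in the config above
# (a sketch; axolotl resolves the actual target modules at runtime).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",                  # stand-in for lora_target_linear: true
    modules_to_save=["embed_tokens", "lm_head"],  # saved in full alongside the adapter
    task_type="CAUSAL_LM",
)
```
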
## Intended uses & limitations

More information needed

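No usage snippet is provided, so the following is a minimal inference sketch. It assumes this repo hosts the PEFT adapter plus a tokenizer carrying the qwen_25 chat template, and loads the base model in 4-bit as during training; the prompt and generation settings are illustrative.

```python
# Minimal inference sketch (assumptions noted in the text above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B"
adapter_id = "OsakanaTeishoku/Qwen2.5-7B-axolotl-sft-v0.2"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # matches load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # training ran in bf16
)

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
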
## Training and evaluation data

More information needed

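The config above reads all three datasets through axolotl's `chat_template` loader: the first two from their `messages` field with the default `role`/`content` keys, and DataPilot/Zero_SFT_Ja_v2_b3t4 from its `conversation` field, where `message_property_mappings` renames `from`/`value` to `role`/`content` before the qwen_25 (ChatML-style `<|im_start|>`/`<|im_end|>`) template is applied. A rough illustration of that remapping with placeholder values, not axolotl's actual preprocessing code:

```python
# Illustration of the message_property_mappings remap for the DataPilot set:
# rename from/value to the role/content keys the chat template expects.
# Field values are placeholders; the dataset's actual role strings may differ.
record = {
    "conversation": [
        {"from": "user", "value": "..."},
        {"from": "assistant", "value": "..."},
    ]
}
messages = [
    {"role": turn["from"], "content": turn["value"]}
    for turn in record["conversation"]
]
```
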
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough `TrainingArguments` equivalent follows the list):
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: ADAMW_BNB (8-bit AdamW from bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 200

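Here `total_train_batch_size` is `micro_batch_size` 16 × `gradient_accumulation_steps` 4 = 64 on one device, and the 10 warmup steps correspond to `warmup_ratio: 0.05` over 200 training steps. A roughly equivalent `TrainingArguments` setup, as a sketch only (axolotl builds its actual arguments from the YAML config):

```python
from transformers import TrainingArguments

# Rough equivalent of the hyperparameters listed above (sketch only).
args = TrainingArguments(
    output_dir="./lora-out",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,   # effective batch size 16 * 4 = 64
    max_steps=200,
    lr_scheduler_type="cosine",
    warmup_steps=10,                 # warmup_ratio 0.05 * 200 steps
    optim="adamw_bnb_8bit",          # 8-bit AdamW from bitsandbytes
    weight_decay=0.0,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    save_strategy="steps",
    save_steps=50,
    seed=42,
)
```
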
### Training results



### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1