lbourdois committed (verified)
Commit 62c3e01 · 1 Parent(s): 9df44f7

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.

Files changed (1):
1. README.md +171 -157
README.md CHANGED
@@ -1,158 +1,172 @@
---
library_name: transformers
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-14B/blob/main/LICENSE
base_model: Qwen/Qwen2.5-14B
tags:
- generated_from_trainer
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
model-index:
- name: 14B-Qwen2.5-Freya-x1
  results: []
---

![Freya](https://huggingface.co/Sao10K/14B-Qwen2.5-Freya-x1/resolve/main/sad.png)
*Me during failed runs*

# 14B-Qwen2.5-Freya-v1

I decided to mess around with training methods again, considering the re-emergence of methods like multi-step training. Some people have started doing it again, so why not? Inspired by AshhLimaRP's methodology, but done my way.

Freya-S1
- LoRA trained on ~1.1GB of literature and raw text over Qwen 2.5's base model.
- Cleaned the text and literature as best I could; there may still be issues here and there.

Freya-S2
- The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that (see the sketch below).
- Reduced the LoRA rank because it's mainly instruct, plus other details I won't get into.

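The bullets above leave the mechanics implicit. One way to read them: train a completion LoRA on the base model (S1), apply and merge it into the Instruct model, then train a second, lower-rank LoRA on top (S2). Here is a minimal `peft` sketch of that reading; the adapter paths `freya-s1-lora` and `freya-s1-merged` are hypothetical, not published artifacts:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the Instruct model that Stage 2 builds on.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B-Instruct", torch_dtype="auto", device_map="auto"
)

# Apply the Stage-1 LoRA (trained against the *base* model) over Instruct and
# bake it into the weights, so Stage-2 training starts from a plain checkpoint.
# "freya-s1-lora" is a hypothetical local path, not a released adapter.
model = PeftModel.from_pretrained(base, "freya-s1-lora").merge_and_unload()
model.save_pretrained("freya-s1-merged")  # Stage-2 base checkpoint
```
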
Recommended Model Settings | *Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
```
Prompt Format: ChatML
Temperature: 1+ # I don't know, man.
min_p: 0.05
```
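
As a concrete usage sketch, assuming a recent `transformers` release with `min_p` support; only ChatML, the temperature, and `min_p` come from the card, and the prompts are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/14B-Qwen2.5-Freya-x1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Freya."},  # placeholder system prompt
    {"role": "user", "content": "Write the opening line of a sea story."},
]
# The tokenizer's chat template renders ChatML, the recommended prompt format.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,  # "1+" per the settings above
    min_p=0.05,
)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```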

Training time in total was ~10 hours on an 8xH100 node, sponsored by the Government of Singapore or something. Thanks for the national service allowance, MHA.

https://sao10k.carrd.co/ for contact.

---

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>
58
+
59
+ axolotl version: `0.6.0`
60
+ ```yaml
61
+ base_model:
62
+ - s1: Qwen/Qwen2.5-14B
63
+ - s2: Qwen/Qwen2.5-14B-Instruct
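  # The s1/s2 entries here and below are presumably per-stage values for the
  # two runs recorded in one file, not literal axolotl syntax (assumption).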
64
+ model_type: AutoModelForCausalLM
65
+ tokenizer_type: AutoTokenizer
66
+
67
+ load_in_8bit: false
68
+ load_in_4bit: false
69
+ strict: false
70
+ sequence_len: 16384
71
+ bf16: auto
72
+ fp16:
73
+ tf32: false
74
+ flash_attention: true
75
+ special_tokens:
76
+
77
+ adapter: lora # 16-bit
78
+ lora_r:
79
+ - s1: 64
80
+ - s2: 32
81
+ lora_alpha: 64
82
+ lora_dropout: 0.2
83
+ lora_fan_in_fan_out:
84
+ peft_use_rslora: true
85
+ lora_target_linear: true
86
+
87
+ # Data
88
+ dataset_prepared_path: dataset_run_freya
89
+ datasets:
90
+ # S1 - Writing / Completion
91
+ - path: datasets/eBooks-cleaned-75K
92
+ type: completion
93
+ - path: datasets/novels-clean-dedupe-10K
94
+ type: completion
95
+ # S2 - Instruct
96
+ - path: datasets/10k-amoral-full-fixed-sys.json
97
+ type: chat_template
98
+ chat_template: chatml
99
+ roles_to_train: ["gpt"]
100
+ field_messages: conversations
101
+ message_field_role: from
102
+ message_field_content: value
103
+ train_on_eos: turn
104
+ - path: datasets/44k-hespera-smartshuffle.json
105
+ type: chat_template
106
+ chat_template: chatml
107
+ roles_to_train: ["gpt"]
108
+ field_messages: conversations
109
+ message_field_role: from
110
+ message_field_content: value
111
+ train_on_eos: turn
112
+ - path: datasets/5k_rpg_adventure_instruct-sys.json
113
+ type: chat_template
114
+ chat_template: chatml
115
+ roles_to_train: ["gpt"]
116
+ field_messages: conversations
117
+ message_field_role: from
118
+ message_field_content: value
119
+ train_on_eos: turn
120
+ shuffle_merged_datasets: true
121
+ warmup_ratio: 0.1
122
+
123
+ plugins:
124
+ - axolotl.integrations.liger.LigerPlugin
125
+ liger_rope: true
126
+ liger_rms_norm: true
127
+ liger_layer_norm: true
128
+ liger_glu_activation: true
129
+ liger_fused_linear_cross_entropy: true
130
+
131
+ # Iterations
132
+ num_epochs:
133
+ - s1: 1
134
+ - s2: 2
135
+
136
+ # Sampling
137
+ sample_packing: true
138
+ pad_to_sequence_len: true
139
+ train_on_inputs: false
140
+ group_by_length: false
141
+
142
+ # Batching
143
+ gradient_accumulation_steps: 4
144
+ micro_batch_size: 2
145
+ gradient_checkpointing: unsloth
146
+
147
+ # Evaluation
148
+ val_set_size: 0.025
149
+ evals_per_epoch: 5
150
+ eval_table_size:
151
+ eval_max_new_tokens: 256
152
+ eval_sample_packing: false
153
+ eval_batch_size: 1
154
+
155
+ # Optimizer
156
+ optimizer: paged_ademamix_8bit
157
+ lr_scheduler: cosine
158
+ learning_rate:
159
+ - s1: 0.000002
160
+ - s2: 0.000004
161
+ weight_decay: 0.2
162
+ max_grad_norm: 10.0
163
+
164
+ # Garbage Collection
165
+ gc_steps: 10
166
+
167
+ # Misc
168
+ deepspeed: ./deepspeed_configs/zero2.json
169
+
170
+ ```
171
+
172
  </details><br>
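
For reference, the `chat_template` dataset entries in the config identify the field names but not the file layout. Given `field_messages: conversations`, `message_field_role: from`, and `message_field_content: value`, each record in those (non-public) instruct datasets presumably follows the ShareGPT convention. A hedged sketch with placeholder text:

```python
# Inferred purely from the config keys; the actual datasets are not public.
record = {
    "conversations": [
        {"from": "system", "value": "You are a dungeon master."},
        {"from": "human", "value": "I push open the crypt door."},
        # roles_to_train: ["gpt"] means only these turns contribute to the loss.
        {"from": "gpt", "value": "The hinges shriek; stale air rolls out..."},
    ]
}
```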