lbourdois committed
Commit 49048e1 · verified · 1 Parent(s): 9c9d650

Improve language tag

Hi! Since the model is multilingual, this PR adds languages other than English to the `language` tag to improve how the model is referenced. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
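
For reference, this is the block the PR adds to the README front matter (copied verbatim from the diff below):

```yaml
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
```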

Files changed (1)
  1. README.md +168 -154
README.md CHANGED
@@ -1,155 +1,169 @@
- ---
- library_name: peft
- license: apache-2.0
- base_model: Qwen/Qwen2.5-32B-Instruct
- tags:
- - generated_from_trainer
- datasets:
- - Fizzarolli/inkmix-v2
- model-index:
- - name: ckpts
- results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.6.0`
- ```yaml
- base_model: Qwen/Qwen2.5-32B-Instruct
-
- load_in_8bit: true
- load_in_4bit: false
-
- plugins:
- - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_glu_activation: true
- liger_fused_linear_cross_entropy: true
-
- #unsloth_lora_mlp: true
- #unsloth_lora_qkv: true
- #unsloth_lora_o: true
-
- strict: false
-
- adapter: lora
- lora_r: 16
- lora_alpha: 32
- lora_dropout: 0.25
- lora_target_linear: true
- peft_layers_to_transform:
- loraplus_lr_ratio: 16
-
- chat_template: chatml
- datasets:
- - path: Fizzarolli/inkmix-v2
- type: chat_template
- chat_template: tokenizer_default
- split: train
- field_messages: conversations
- message_field_role: from
- message_field_content: value
-
- dataset_prepared_path: last_run_prepared
- #val_set_size: 0.02
- output_dir: ./ckpts
-
- sequence_len: 8192
- sample_packing: true
- pad_to_sequence_len: true
-
- #wandb_project: teleut-7b-rp
- #wandb_entity:
- #wandb_watch:
- #wandb_name:
- #wandb_log_model: checkpoint
-
- # mlflow configuration if you're using it
- mlflow_tracking_uri: https://public-tracking.mlflow-e00zzfjq11ky6jcgtv.backbone-e00bgn6e63256prmhq.msp.eu-north1.nebius.cloud
- mlflow_experiment_name: tq-32b-rp-inkmixv2
- mlflow_run_name: v1
- hf_mlflow_log_artifacts: true
-
- gradient_accumulation_steps: 2
- micro_batch_size: 8
- num_epochs: 2
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 6e-5
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: unsloth
- gradient_checkpointing_kwargs:
- use_reentrant: false
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- #deepspeed: deepspeed_configs/zero3_bf16.json
-
- warmup_steps: 25
- #evals_per_epoch: 4
- eval_table_size:
- saves_per_epoch: 10
- debug:
- weight_decay: 0.05
-
- ```
-
- </details><br>
-
- # ckpts
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) on the Fizzarolli/inkmix-v2 dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 6e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 16
- - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 25
- - num_epochs: 2
-
- ### Training results
-
-
-
- ### Framework versions
-
- - PEFT 0.14.0
- - Transformers 4.47.1
- - Pytorch 2.5.1+cu124
- - Datasets 3.1.0
+ ---
+ library_name: peft
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-32B-Instruct
+ tags:
+ - generated_from_trainer
+ datasets:
+ - Fizzarolli/inkmix-v2
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: ckpts
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.6.0`
+ ```yaml
+ base_model: Qwen/Qwen2.5-32B-Instruct
+
+ load_in_8bit: true
+ load_in_4bit: false
+
+ plugins:
+ - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_glu_activation: true
+ liger_fused_linear_cross_entropy: true
+
+ #unsloth_lora_mlp: true
+ #unsloth_lora_qkv: true
+ #unsloth_lora_o: true
+
+ strict: false
+
+ adapter: lora
+ lora_r: 16
+ lora_alpha: 32
+ lora_dropout: 0.25
+ lora_target_linear: true
+ peft_layers_to_transform:
+ loraplus_lr_ratio: 16
+
+ chat_template: chatml
+ datasets:
+ - path: Fizzarolli/inkmix-v2
+ type: chat_template
+ chat_template: tokenizer_default
+ split: train
+ field_messages: conversations
+ message_field_role: from
+ message_field_content: value
+
+ dataset_prepared_path: last_run_prepared
+ #val_set_size: 0.02
+ output_dir: ./ckpts
+
+ sequence_len: 8192
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ #wandb_project: teleut-7b-rp
+ #wandb_entity:
+ #wandb_watch:
+ #wandb_name:
+ #wandb_log_model: checkpoint
+
+ # mlflow configuration if you're using it
+ mlflow_tracking_uri: https://public-tracking.mlflow-e00zzfjq11ky6jcgtv.backbone-e00bgn6e63256prmhq.msp.eu-north1.nebius.cloud
+ mlflow_experiment_name: tq-32b-rp-inkmixv2
+ mlflow_run_name: v1
+ hf_mlflow_log_artifacts: true
+
+ gradient_accumulation_steps: 2
+ micro_batch_size: 8
+ num_epochs: 2
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 6e-5
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: unsloth
+ gradient_checkpointing_kwargs:
+ use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ #deepspeed: deepspeed_configs/zero3_bf16.json
+
+ warmup_steps: 25
+ #evals_per_epoch: 4
+ eval_table_size:
+ saves_per_epoch: 10
+ debug:
+ weight_decay: 0.05
+
+ ```
+
+ </details><br>
+
+ # ckpts
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) on the Fizzarolli/inkmix-v2 dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 6e-05
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 16
+ - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 25
+ - num_epochs: 2
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - PEFT 0.14.0
+ - Transformers 4.47.1
+ - Pytorch 2.5.1+cu124
+ - Datasets 3.1.0
  - Tokenizers 0.21.0