lbourdois committed · Commit 3c12c1f · verified · 1 Parent(s): ec65aac

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +164 -150
README.md CHANGED
@@ -1,151 +1,165 @@
- ---
- library_name: peft
- license: other
- base_model: Qwen/Qwen2.5-72B-Instruct
- tags:
- - generated_from_trainer
- datasets:
- - Fizzarolli/inkmix-v2
- model-index:
- - name: ckpts
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.6.0`
- ```yaml
- base_model: Qwen/Qwen2.5-72B-Instruct
-
- load_in_8bit: false
- load_in_4bit: true
-
- plugins:
- - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_glu_activation: true
- liger_fused_linear_cross_entropy: true
-
- strict: false
-
- adapter: lora
- lora_r: 16
- lora_alpha: 32
- lora_dropout: 0.25
- lora_target_linear: true
- peft_layers_to_transform:
- loraplus_lr_ratio: 16
-
- chat_template: chatml
- datasets:
- - path: Fizzarolli/inkmix-v2
-   type: chat_template
-   chat_template: tokenizer_default
-   split: train
-   field_messages: conversations
-   message_field_role: from
-   message_field_content: value
-
- dataset_prepared_path: last_run_prepared
- #val_set_size: 0.02
- output_dir: ./ckpts
-
- sequence_len: 8192
- sample_packing: true
- pad_to_sequence_len: true
-
- #wandb_project: teleut-7b-rp
- #wandb_entity:
- #wandb_watch:
- #wandb_name:
- #wandb_log_model: checkpoint
-
- # mlflow configuration if you're using it
- mlflow_tracking_uri: https://public-tracking.mlflow-e00zzfjq11ky6jcgtv.backbone-e00bgn6e63256prmhq.msp.eu-north1.nebius.cloud
- mlflow_experiment_name: tq-72b-rp-inkmixv2
- mlflow_run_name: v1
- hf_mlflow_log_artifacts: true
-
- gradient_accumulation_steps: 4
- micro_batch_size: 4
- num_epochs: 2
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 6e-5
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: unsloth
- gradient_checkpointing_kwargs:
-   use_reentrant: false
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- #deepspeed: deepspeed_configs/zero3_bf16.json
-
- warmup_steps: 25
- #evals_per_epoch: 4
- eval_table_size:
- saves_per_epoch: 10
- debug:
- weight_decay: 0.05
-
- ```
-
- </details><br>
-
- # ckpts
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) on the Fizzarolli/inkmix-v2 dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 6e-05
- - train_batch_size: 4
- - eval_batch_size: 4
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 16
- - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 25
- - num_epochs: 2
-
- ### Training results
-
-
-
- ### Framework versions
-
- - PEFT 0.14.0
- - Transformers 4.47.1
- - Pytorch 2.5.1+cu124
- - Datasets 3.2.0
+ ---
+ library_name: peft
+ license: other
+ base_model: Qwen/Qwen2.5-72B-Instruct
+ tags:
+ - generated_from_trainer
+ datasets:
+ - Fizzarolli/inkmix-v2
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: ckpts
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.6.0`
+ ```yaml
+ base_model: Qwen/Qwen2.5-72B-Instruct
+
+ load_in_8bit: false
+ load_in_4bit: true
+
+ plugins:
+ - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_glu_activation: true
+ liger_fused_linear_cross_entropy: true
+
+ strict: false
+
+ adapter: lora
+ lora_r: 16
+ lora_alpha: 32
+ lora_dropout: 0.25
+ lora_target_linear: true
+ peft_layers_to_transform:
+ loraplus_lr_ratio: 16
+
+ chat_template: chatml
+ datasets:
+ - path: Fizzarolli/inkmix-v2
+   type: chat_template
+   chat_template: tokenizer_default
+   split: train
+   field_messages: conversations
+   message_field_role: from
+   message_field_content: value
+
+ dataset_prepared_path: last_run_prepared
+ #val_set_size: 0.02
+ output_dir: ./ckpts
+
+ sequence_len: 8192
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ #wandb_project: teleut-7b-rp
+ #wandb_entity:
+ #wandb_watch:
+ #wandb_name:
+ #wandb_log_model: checkpoint
+
+ # mlflow configuration if you're using it
+ mlflow_tracking_uri: https://public-tracking.mlflow-e00zzfjq11ky6jcgtv.backbone-e00bgn6e63256prmhq.msp.eu-north1.nebius.cloud
+ mlflow_experiment_name: tq-72b-rp-inkmixv2
+ mlflow_run_name: v1
+ hf_mlflow_log_artifacts: true
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 4
+ num_epochs: 2
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 6e-5
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: unsloth
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ #deepspeed: deepspeed_configs/zero3_bf16.json
+
+ warmup_steps: 25
+ #evals_per_epoch: 4
+ eval_table_size:
+ saves_per_epoch: 10
+ debug:
+ weight_decay: 0.05
+
+ ```
+
+ </details><br>
+
+ # ckpts
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) on the Fizzarolli/inkmix-v2 dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 6e-05
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 16
+ - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 25
+ - num_epochs: 2
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - PEFT 0.14.0
+ - Transformers 4.47.1
+ - Pytorch 2.5.1+cu124
+ - Datasets 3.2.0
  - Tokenizers 0.21.0
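
Since the card describes a PEFT LoRA adapter trained on top of Qwen/Qwen2.5-72B-Instruct, a minimal sketch of how such an adapter might be loaded for inference is shown below, assuming 4-bit loading as in the axolotl config; the adapter repository id used here is a placeholder, not the actual repository name.

```python
# Minimal usage sketch: attach a LoRA adapter to the 4-bit base model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Qwen/Qwen2.5-72B-Instruct"
adapter_id = "your-org/ckpts"  # placeholder: replace with the real adapter repository id

# 4-bit quantization, mirroring `load_in_4bit: true` in the training config
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # LoRA weights are attached; base weights stay frozen

messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```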