Update README.md
README.md
CHANGED
@@ -5,17 +5,53 @@ base_model: Qwen/Qwen2.5-0.5B-Instruct
 tags:
 - generated_from_trainer
 - axolotl
-model-index:
-- name: outputs/qwen05B
-  results: []
 language:
 - it
 - en
 pipeline_tag: text-generation
 ---
 
-
-
+# Qwen2.5-0.5B-Instruct-ITA
+
+This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) on the [ReDiX/DataForge](https://huggingface.co/datasets/ReDiX/DataForge) dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.4100
+
+## Model description
+
+This model is an example of fine-tuning a small LLM (sLLM). The Italian evals improved, and the model learned from the training data as expected.
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+
+| Tasks      |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
+|------------|------:|------|-----:|--------|---|-----:|---|-----:|
+|arc_it      |      2|none  |     0|acc     |↑  |0.2378|±  |0.0125|
+|            |       |none  |     0|acc_norm|↑  |0.2823|±  |0.0132|
+|hellaswag_it|      1|none  |     0|acc     |↑  |0.3163|±  |0.0049|
+|            |       |none  |     0|acc_norm|↑  |0.3800|±  |0.0051|
+|m_mmlu_it   |      0|none  |     5|acc     |↑  |0.381 |±  |0.0042|
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 4
+- eval_batch_size: 4
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 16
+- optimizer: adamw_bnb_8bit with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 10
+- num_epochs: 2
+
 
 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
 <details><summary>See axolotl config</summary>
@@ -149,46 +185,6 @@ special_tokens:
 
 </details><br>
 
-# outputs/qwen05B
-
-This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) on the [ReDiX/DataForge](https://huggingface.co/datasets/ReDiX/DataForge) dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.4100
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-
-| Tasks      |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
-|------------|------:|------|-----:|--------|---|-----:|---|-----:|
-|arc_it      |      2|none  |     0|acc     |↑  |0.2378|±  |0.0125|
-|            |       |none  |     0|acc_norm|↑  |0.2823|±  |0.0132|
-|hellaswag_it|      1|none  |     0|acc     |↑  |0.3163|±  |0.0049|
-|            |       |none  |     0|acc_norm|↑  |0.3800|±  |0.0051|
-|m_mmlu_it   |      0|none  |     5|acc     |↑  |0.381 |±  |0.0042|
-
-## Training procedure
-
-### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 0.0001
-- train_batch_size: 4
-- eval_batch_size: 4
-- seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 16
-- optimizer: Use adamw_bnb_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 10
-- num_epochs: 2
 
 ### Training results
 
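To try the fine-tuned model described in the card, a minimal inference sketch with `transformers` follows. The repo id `ReDiX/Qwen2.5-0.5B-Instruct-ITA` is inferred from the card's new title and is an assumption; substitute the actual repository name.

```python
# Minimal inference sketch for the fine-tuned model.
# The repo id below is hypothetical (inferred from the card title);
# substitute the real Hugging Face repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ReDiX/Qwen2.5-0.5B-Instruct-ITA"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Qwen2.5 instruct models expect their chat template, applied via the tokenizer.
messages = [{"role": "user", "content": "Qual è la capitale d'Italia?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```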
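The benchmark table in the diff follows the output format of EleutherAI's lm-evaluation-harness (tasks `arc_it`, `hellaswag_it`, `m_mmlu_it`). A sketch of how the 0-shot rows could be reproduced, again assuming the hypothetical repo id:

```python
# Sketch: reproduce the 0-shot Italian benchmark rows with
# EleutherAI's lm-evaluation-harness (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ReDiX/Qwen2.5-0.5B-Instruct-ITA",  # hypothetical repo id
    tasks=["arc_it", "hellaswag_it"],
    num_fewshot=0,
)
print(results["results"])
# m_mmlu_it is reported 5-shot in the card, so run it separately with num_fewshot=5.
```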
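The listed hyperparameters imply an effective batch size of train_batch_size × gradient_accumulation_steps = 4 × 4 = 16, matching total_train_batch_size. For readers not using axolotl, here is a rough `transformers` equivalent of these settings; this is a sketch, not the card's actual training code, which was configured through the axolotl YAML shown above.

```python
# Rough transformers equivalent of the card's hyperparameters
# (a sketch; the actual run was configured through axolotl).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs/qwen05B",
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # 4 * 4 = 16 effective batch size
    seed=42,
    optim="adamw_bnb_8bit",         # 8-bit AdamW from bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=2,
)
```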