# Llama-3.1-8B
This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) on the decomposition_train_data dataset. It achieves the following results on the evaluation set:
- Loss: 1.1674
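The snippet below is a minimal inference sketch, assuming the checkpoint is available on the Hub under `w3en2g/decomposition-Llama-3.1-8B` (the repository this card belongs to) and is loaded as a standard causal language model; the prompt format and generation settings are illustrative assumptions, not part of this card.

```python
# Minimal inference sketch (assumed repository id; adjust to a local path if needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "w3en2g/decomposition-Llama-3.1-8B"  # assumption: Hub repo id of this checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

# Assumed prompt style for a decomposition-style task; the actual training format is not documented here.
prompt = "Decompose the following question into simpler sub-questions: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```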
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding `TrainingArguments` follows the list):
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
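The training script itself is not part of this card, so the sketch below is only one way to express these hyperparameters with the `transformers` `Trainer`; the output directory, evaluation cadence, and mixed-precision setting are assumptions.

```python
# Sketch of a TrainingArguments configuration matching the hyperparameters above.
# Assumed launch: torchrun --nproc_per_node=8 train.py
# (8 GPUs x per-device batch 1 x grad accumulation 2 = total train batch size 16)
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="decomposition-Llama-3.1-8B",  # assumption: output path
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    eval_strategy="steps",  # assumption: evaluation every 100 steps, as in the results table
    eval_steps=100,
    logging_steps=100,
    bf16=True,              # assumption: mixed precision is not stated on this card
)
```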
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2376        | 0.0846 | 100  | 1.2389          |
| 1.1944        | 0.1693 | 200  | 1.1727          |
| 1.1895        | 0.2539 | 300  | 1.1570          |
| 1.1845        | 0.3386 | 400  | 1.1547          |
| 1.1472        | 0.4232 | 500  | 1.1444          |
| 1.1284        | 0.5078 | 600  | 1.1434          |
| 1.1146        | 0.5925 | 700  | 1.1365          |
| 1.1565        | 0.6771 | 800  | 1.1331          |
| 1.1334        | 0.7617 | 900  | 1.1285          |
| 1.1488        | 0.8464 | 1000 | 1.1251          |
| 1.13          | 0.9310 | 1100 | 1.1225          |
| 0.9185        | 1.0152 | 1200 | 1.1412          |
| 0.9155        | 1.0999 | 1300 | 1.1359          |
| 0.9075        | 1.1845 | 1400 | 1.1338          |
| 0.9425        | 1.2691 | 1500 | 1.1361          |
| 0.9337        | 1.3538 | 1600 | 1.1349          |
| 0.9065        | 1.4384 | 1700 | 1.1268          |
| 0.8973        | 1.5231 | 1800 | 1.1272          |
| 0.9226        | 1.6077 | 1900 | 1.1252          |
| 0.9277        | 1.6923 | 2000 | 1.1239          |
| 0.9043        | 1.7770 | 2100 | 1.1208          |
| 0.9415        | 1.8616 | 2200 | 1.1203          |
| 0.8886        | 1.9463 | 2300 | 1.1192          |
| 0.7334        | 2.0305 | 2400 | 1.1619          |
| 0.7161        | 2.1151 | 2500 | 1.1711          |
| 0.7053        | 2.1997 | 2600 | 1.1680          |
| 0.7234        | 2.2844 | 2700 | 1.1642          |
| 0.7635        | 2.3690 | 2800 | 1.1665          |
| 0.7142        | 2.4537 | 2900 | 1.1665          |
| 0.7107        | 2.5383 | 3000 | 1.1685          |
| 0.7201        | 2.6229 | 3100 | 1.1664          |
| 0.7314        | 2.7076 | 3200 | 1.1695          |
| 0.7443        | 2.7922 | 3300 | 1.1667          |
| 0.7216        | 2.8769 | 3400 | 1.1673          |
| 0.7375        | 2.9615 | 3500 | 1.1674          |
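The table shows validation loss reaching its minimum (about 1.1192 at step 2300, near the end of epoch 2) and rising during epoch 3, so the final checkpoint is not the best one by this metric. If the training output directory is available, the curves can be re-plotted from `trainer_state.json`, which the `Trainer` saves alongside checkpoints; the file path below is an assumption.

```python
# Sketch: plot training vs. validation loss from the Trainer's saved log history.
import json
import matplotlib.pyplot as plt

with open("decomposition-Llama-3.1-8B/trainer_state.json") as f:  # assumed output path
    state = json.load(f)

train = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

plt.plot(*zip(*train), label="training loss")
plt.plot(*zip(*evals), label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curves.png")
```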
### Framework versions
- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 2.21.0
- Tokenizers 0.21.1
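A quick way to check that a runtime matches these versions is sketched below; it only asserts the versions listed above and makes no claim about other dependencies.

```python
# Sketch: verify the runtime matches the versions this model was trained with.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.51.3"
assert torch.__version__ == "2.6.0+cu124"  # assumption: the CUDA 12.4 wheel is required only to match exactly
assert datasets.__version__ == "2.21.0"
assert tokenizers.__version__ == "0.21.1"
```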