Llama-3.1-8B

This model is a fine-tuned version of meta-llama/Llama-3.1-8B on the decomposition_train_data dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1674

Model description

More information needed

Intended uses & limitations

More information needed
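
Pending fuller documentation, the snippet below is a minimal inference sketch. The repository id w3en2g/decomposition-Llama-3.1-8B is taken from the model's Hub page; the prompt, generation settings, and decoding choices are illustrative assumptions, not the authors' intended usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "w3en2g/decomposition-Llama-3.1-8B"  # repo id from the Hub page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",
)

# Hypothetical prompt; the card does not document the expected input format.
prompt = "Decompose the following question into sub-questions: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```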

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a configuration sketch follows the list:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
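
These values map onto a Hugging Face TrainingArguments configuration roughly as sketched below. This is a hedged reconstruction, not the authors' script: the output directory, the bf16 flag, and the launch details (e.g. torchrun across 8 GPUs) are assumptions; every numeric value comes from the list above.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="decomposition-llama-3.1-8b",  # hypothetical
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)

# Effective batch sizes under 8 GPUs:
#   train: 1 per device x 8 devices x 2 accumulation steps = 16
#   eval:  1 per device x 8 devices                         = 8
```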

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2376        | 0.0846 | 100  | 1.2389          |
| 1.1944        | 0.1693 | 200  | 1.1727          |
| 1.1895        | 0.2539 | 300  | 1.1570          |
| 1.1845        | 0.3386 | 400  | 1.1547          |
| 1.1472        | 0.4232 | 500  | 1.1444          |
| 1.1284        | 0.5078 | 600  | 1.1434          |
| 1.1146        | 0.5925 | 700  | 1.1365          |
| 1.1565        | 0.6771 | 800  | 1.1331          |
| 1.1334        | 0.7617 | 900  | 1.1285          |
| 1.1488        | 0.8464 | 1000 | 1.1251          |
| 1.1300        | 0.9310 | 1100 | 1.1225          |
| 0.9185        | 1.0152 | 1200 | 1.1412          |
| 0.9155        | 1.0999 | 1300 | 1.1359          |
| 0.9075        | 1.1845 | 1400 | 1.1338          |
| 0.9425        | 1.2691 | 1500 | 1.1361          |
| 0.9337        | 1.3538 | 1600 | 1.1349          |
| 0.9065        | 1.4384 | 1700 | 1.1268          |
| 0.8973        | 1.5231 | 1800 | 1.1272          |
| 0.9226        | 1.6077 | 1900 | 1.1252          |
| 0.9277        | 1.6923 | 2000 | 1.1239          |
| 0.9043        | 1.7770 | 2100 | 1.1208          |
| 0.9415        | 1.8616 | 2200 | 1.1203          |
| 0.8886        | 1.9463 | 2300 | 1.1192          |
| 0.7334        | 2.0305 | 2400 | 1.1619          |
| 0.7161        | 2.1151 | 2500 | 1.1711          |
| 0.7053        | 2.1997 | 2600 | 1.1680          |
| 0.7234        | 2.2844 | 2700 | 1.1642          |
| 0.7635        | 2.3690 | 2800 | 1.1665          |
| 0.7142        | 2.4537 | 2900 | 1.1665          |
| 0.7107        | 2.5383 | 3000 | 1.1685          |
| 0.7201        | 2.6229 | 3100 | 1.1664          |
| 0.7314        | 2.7076 | 3200 | 1.1695          |
| 0.7443        | 2.7922 | 3300 | 1.1667          |
| 0.7216        | 2.8769 | 3400 | 1.1673          |
| 0.7375        | 2.9615 | 3500 | 1.1674          |
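
Validation loss bottoms out at 1.1192 near the end of epoch 2 and drifts upward through epoch 3 while training loss keeps falling, consistent with mild overfitting. Assuming the reported losses are mean per-token cross-entropy (the usual Trainer convention; an assumption here), the final evaluation loss corresponds to a perplexity of roughly exp(1.1674) ≈ 3.21:

```python
import math

# Perplexity from mean cross-entropy loss; assumes token-level averaging.
final_eval_loss = 1.1674
print(math.exp(final_eval_loss))  # ~3.2136
```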

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 2.21.0
  • Tokenizers 0.21.1