Llama-3.1-8B

This model is a fine-tuned version of meta-llama/Llama-3.1-8B on the decomposition_train_data dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1674

Model description

More information needed

Intended uses & limitations

More information needed
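
Pending fuller documentation, the snippet below is a minimal inference sketch. The repository id w3en2g/decomposition-Llama-3.1-8B is taken from the model's Hub page; the prompt, generation settings, and decoding choices are illustrative assumptions, not the authors' intended usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "w3en2g/decomposition-Llama-3.1-8B"  # repo id from the Hub page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",
)

# Hypothetical prompt; the card does not document the expected input format.
prompt = "Decompose the following question into sub-questions: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```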

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a configuration sketch follows the list:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
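
These values map onto a Hugging Face TrainingArguments configuration roughly as sketched below. This is a hedged reconstruction, not the authors' script: the output directory, the bf16 flag, and the launch details (e.g. torchrun across 8 GPUs) are assumptions; every numeric value comes from the list above.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="decomposition-llama-3.1-8b",  # hypothetical
    learning_rate=5e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)

# Effective batch sizes under 8 GPUs:
#   train: 1 per device x 8 devices x 2 accumulation steps = 16
#   eval:  1 per device x 8 devices                         = 8
```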

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2376        | 0.0846 | 100  | 1.2389          |
| 1.1944        | 0.1693 | 200  | 1.1727          |
| 1.1895        | 0.2539 | 300  | 1.1570          |
| 1.1845        | 0.3386 | 400  | 1.1547          |
| 1.1472        | 0.4232 | 500  | 1.1444          |
| 1.1284        | 0.5078 | 600  | 1.1434          |
| 1.1146        | 0.5925 | 700  | 1.1365          |
| 1.1565        | 0.6771 | 800  | 1.1331          |
| 1.1334        | 0.7617 | 900  | 1.1285          |
| 1.1488        | 0.8464 | 1000 | 1.1251          |
| 1.1300        | 0.9310 | 1100 | 1.1225          |
| 0.9185        | 1.0152 | 1200 | 1.1412          |
| 0.9155        | 1.0999 | 1300 | 1.1359          |
| 0.9075        | 1.1845 | 1400 | 1.1338          |
| 0.9425        | 1.2691 | 1500 | 1.1361          |
| 0.9337        | 1.3538 | 1600 | 1.1349          |
| 0.9065        | 1.4384 | 1700 | 1.1268          |
| 0.8973        | 1.5231 | 1800 | 1.1272          |
| 0.9226        | 1.6077 | 1900 | 1.1252          |
| 0.9277        | 1.6923 | 2000 | 1.1239          |
| 0.9043        | 1.7770 | 2100 | 1.1208          |
| 0.9415        | 1.8616 | 2200 | 1.1203          |
| 0.8886        | 1.9463 | 2300 | 1.1192          |
| 0.7334        | 2.0305 | 2400 | 1.1619          |
| 0.7161        | 2.1151 | 2500 | 1.1711          |
| 0.7053        | 2.1997 | 2600 | 1.1680          |
| 0.7234        | 2.2844 | 2700 | 1.1642          |
| 0.7635        | 2.3690 | 2800 | 1.1665          |
| 0.7142        | 2.4537 | 2900 | 1.1665          |
| 0.7107        | 2.5383 | 3000 | 1.1685          |
| 0.7201        | 2.6229 | 3100 | 1.1664          |
| 0.7314        | 2.7076 | 3200 | 1.1695          |
| 0.7443        | 2.7922 | 3300 | 1.1667          |
| 0.7216        | 2.8769 | 3400 | 1.1673          |
| 0.7375        | 2.9615 | 3500 | 1.1674          |
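
Validation loss bottoms out at 1.1192 near the end of epoch 2 and drifts upward through epoch 3 while training loss keeps falling, consistent with mild overfitting. Assuming the reported losses are mean per-token cross-entropy (the usual Trainer convention; an assumption here), the final evaluation loss corresponds to a perplexity of roughly exp(1.1674) ≈ 3.21:

```python
import math

# Perplexity from mean cross-entropy loss; assumes token-level averaging.
final_eval_loss = 1.1674
print(math.exp(final_eval_loss))  # ~3.2136
```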

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.6.0+cu124
  • Datasets 2.21.0
  • Tokenizers 0.21.1