cooking_sft_fail_new_mem

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct, trained on the identity and cooking_sft_fail_new_mem datasets. It achieves the following results on the evaluation set:

  • Loss: 0.2036
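
A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub as izzcw/cooking_sft_fail_new_mem (the repo id for this card) and loads through the standard transformers chat API; the prompt and generation settings are illustrative placeholders:

```python
# Minimal inference sketch for this fine-tune. The repo id comes from this
# card; the prompt and generation settings are hypothetical examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "izzcw/cooking_sft_fail_new_mem"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "How do I keep a roux from burning?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```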

Model description

An 8.03B-parameter causal language model fine-tuned from meta-llama/Llama-3.1-8B-Instruct; weights are stored as BF16 safetensors. No further details provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
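
A hedged sketch of how these values map onto transformers.TrainingArguments. The output directory is a placeholder, and the data pipeline and launch command (e.g. torchrun across 8 GPUs) are omitted; only the hyperparameters above are taken from this card.

```python
# Sketch only: mirrors the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="cooking_sft_fail_new_mem",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # checkpoint tensors are BF16
)

# The reported totals are derived, not set directly:
#   total_train_batch_size = 1 per device * 8 GPUs * 16 accumulation steps = 128
#   total_eval_batch_size  = 1 per device * 8 GPUs                         = 8
```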

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3821        | 0.0133 | 50   | 0.4735          |
| 0.302         | 0.0267 | 100  | 0.3178          |
| 0.2988        | 0.0400 | 150  | 0.3253          |
| 0.3054        | 0.0533 | 200  | 0.3250          |
| 0.2967        | 0.0666 | 250  | 0.3232          |
| 0.3137        | 0.0800 | 300  | 0.3207          |
| 0.3221        | 0.0933 | 350  | 0.3211          |
| 0.3188        | 0.1066 | 400  | 0.3204          |
| 0.308         | 0.1200 | 450  | 0.3149          |
| 0.3123        | 0.1333 | 500  | 0.3106          |
| 0.3138        | 0.1466 | 550  | 0.3050          |
| 0.3032        | 0.1600 | 600  | 0.3046          |
| 0.2827        | 0.1733 | 650  | 0.3017          |
| 0.2953        | 0.1866 | 700  | 0.2970          |
| 0.2854        | 0.1999 | 750  | 0.2924          |
| 0.2872        | 0.2133 | 800  | 0.2896          |
| 0.2866        | 0.2266 | 850  | 0.2836          |
| 0.2925        | 0.2399 | 900  | 0.2794          |
| 0.2843        | 0.2533 | 950  | 0.2823          |
| 0.292         | 0.2666 | 1000 | 0.2789          |
| 0.2775        | 0.2799 | 1050 | 0.2763          |
| 0.2652        | 0.2933 | 1100 | 0.2717          |
| 0.27          | 0.3066 | 1150 | 0.2712          |
| 0.277         | 0.3199 | 1200 | 0.2749          |
| 0.2681        | 0.3332 | 1250 | 0.2709          |
| 0.2699        | 0.3466 | 1300 | 0.2718          |
| 0.2682        | 0.3599 | 1350 | 0.2676          |
| 0.2668        | 0.3732 | 1400 | 0.2662          |
| 0.2615        | 0.3866 | 1450 | 0.2689          |
| 0.2501        | 0.3999 | 1500 | 0.2583          |
| 0.2545        | 0.4132 | 1550 | 0.2568          |
| 0.2618        | 0.4265 | 1600 | 0.2523          |
| 0.2615        | 0.4399 | 1650 | 0.2550          |
| 0.2512        | 0.4532 | 1700 | 0.2488          |
| 0.245         | 0.4665 | 1750 | 0.2504          |
| 0.2503        | 0.4799 | 1800 | 0.2481          |
| 0.2402        | 0.4932 | 1850 | 0.2450          |
| 0.2346        | 0.5065 | 1900 | 0.2440          |
| 0.2413        | 0.5199 | 1950 | 0.2425          |
| 0.24          | 0.5332 | 2000 | 0.2383          |
| 0.2398        | 0.5465 | 2050 | 0.2408          |
| 0.2473        | 0.5598 | 2100 | 0.2384          |
| 0.2423        | 0.5732 | 2150 | 0.2348          |
| 0.2294        | 0.5865 | 2200 | 0.2311          |
| 0.2403        | 0.5998 | 2250 | 0.2323          |
| 0.2319        | 0.6132 | 2300 | 0.2297          |
| 0.222         | 0.6265 | 2350 | 0.2288          |
| 0.2193        | 0.6398 | 2400 | 0.2303          |
| 0.2252        | 0.6531 | 2450 | 0.2247          |
| 0.2304        | 0.6665 | 2500 | 0.2211          |
| 0.2139        | 0.6798 | 2550 | 0.2199          |
| 0.2186        | 0.6931 | 2600 | 0.2192          |
| 0.2156        | 0.7065 | 2650 | 0.2183          |
| 0.2187        | 0.7198 | 2700 | 0.2159          |
| 0.222         | 0.7331 | 2750 | 0.2174          |
| 0.2162        | 0.7465 | 2800 | 0.2153          |
| 0.2253        | 0.7598 | 2850 | 0.2132          |
| 0.2066        | 0.7731 | 2900 | 0.2134          |
| 0.2113        | 0.7864 | 2950 | 0.2107          |
| 0.2107        | 0.7998 | 3000 | 0.2085          |
| 0.2055        | 0.8131 | 3050 | 0.2097          |
| 0.2045        | 0.8264 | 3100 | 0.2075          |
| 0.2172        | 0.8398 | 3150 | 0.2062          |
| 0.2138        | 0.8531 | 3200 | 0.2075          |
| 0.194         | 0.8664 | 3250 | 0.2051          |
| 0.2133        | 0.8798 | 3300 | 0.2051          |
| 0.2025        | 0.8931 | 3350 | 0.2047          |
| 0.2088        | 0.9064 | 3400 | 0.2050          |
| 0.204         | 0.9197 | 3450 | 0.2044          |
| 0.2059        | 0.9331 | 3500 | 0.2039          |
| 0.2103        | 0.9464 | 3550 | 0.2039          |
| 0.2102        | 0.9597 | 3600 | 0.2039          |
| 0.2051        | 0.9731 | 3650 | 0.2038          |
| 0.2017        | 0.9864 | 3700 | 0.2037          |
| 0.2088        | 0.9997 | 3750 | 0.2037          |
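
As a rough sanity check, and assuming the reported losses are mean token-level cross-entropy in nats (the transformers default for causal LM training), the final validation loss corresponds to a perplexity of roughly exp(0.2036) ≈ 1.23:

```python
# Convert the final validation loss to perplexity.
# Assumes mean token-level cross-entropy in nats.
import math

final_val_loss = 0.2036  # evaluation-set loss reported above
print(math.exp(final_val_loss))  # ~1.226
```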

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
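
A quick environment check against the versions listed above; pinning to these exact versions is the safest way to reproduce the run, though nearby patch releases likely also work:

```python
# Verify the local environment matches the framework versions on this card.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__.startswith("4.49"), transformers.__version__
assert torch.__version__.startswith("2.5.1"), torch.__version__
assert datasets.__version__.startswith("3.2"), datasets.__version__
assert tokenizers.__version__.startswith("0.21"), tokenizers.__version__
```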