11APRIL2025-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B on the generator dataset (per the model name, the MEDAL medical-abbreviation dataset). It achieves the following results on the evaluation set:

  • Loss: 2.2896
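
Assuming this loss is the standard per-token cross-entropy (in nats), it corresponds to an evaluation perplexity of roughly exp(2.2896) ≈ 9.9.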

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.15
  • num_epochs: 0.3
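
The following is a minimal sketch of how these hyperparameters could be passed to a Hugging Face TrainingArguments object (Transformers 4.51 argument names). The output directory, evaluation/logging cadence, and everything not listed above are assumptions; the original training script, dataset preprocessing, and PEFT/LoRA configuration are not documented in this card.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="11APRIL2025-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata",
    learning_rate=5e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,    # total effective train batch size of 8
    seed=42,
    optim="adamw_torch_fused",        # fused AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.15,
    num_train_epochs=0.3,
    eval_strategy="steps",            # assumption: the card reports validation loss every 100 steps
    eval_steps=100,
    logging_steps=100,
)
```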

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 2.7516 | 0.0069 | 100 | 2.6728 |
| 2.5893 | 0.0138 | 200 | 2.5248 |
| 2.494 | 0.0207 | 300 | 2.4700 |
| 2.4513 | 0.0277 | 400 | 2.4380 |
| 2.4368 | 0.0346 | 500 | 2.4128 |
| 2.399 | 0.0415 | 600 | 2.3939 |
| 2.3878 | 0.0484 | 700 | 2.3791 |
| 2.3604 | 0.0553 | 800 | 2.3660 |
| 2.3675 | 0.0622 | 900 | 2.3564 |
| 2.3596 | 0.0692 | 1000 | 2.3493 |
| 2.3551 | 0.0761 | 1100 | 2.3423 |
| 2.3361 | 0.0830 | 1200 | 2.3366 |
| 2.3218 | 0.0899 | 1300 | 2.3320 |
| 2.339 | 0.0968 | 1400 | 2.3269 |
| 2.324 | 0.1037 | 1500 | 2.3235 |
| 2.3243 | 0.1106 | 1600 | 2.3194 |
| 2.3158 | 0.1176 | 1700 | 2.3161 |
| 2.3034 | 0.1245 | 1800 | 2.3129 |
| 2.3075 | 0.1314 | 1900 | 2.3104 |
| 2.3189 | 0.1383 | 2000 | 2.3078 |
| 2.3021 | 0.1452 | 2100 | 2.3053 |
| 2.2934 | 0.1521 | 2200 | 2.3030 |
| 2.2965 | 0.1590 | 2300 | 2.3012 |
| 2.3036 | 0.1660 | 2400 | 2.2994 |
| 2.2876 | 0.1729 | 2500 | 2.2980 |
| 2.2904 | 0.1798 | 2600 | 2.2965 |
| 2.3025 | 0.1867 | 2700 | 2.2952 |
| 2.306 | 0.1936 | 2800 | 2.2940 |
| 2.2965 | 0.2005 | 2900 | 2.2931 |
| 2.2919 | 0.2075 | 3000 | 2.2921 |
| 2.2906 | 0.2144 | 3100 | 2.2915 |
| 2.3053 | 0.2213 | 3200 | 2.2909 |
| 2.2865 | 0.2282 | 3300 | 2.2905 |
| 2.2924 | 0.2351 | 3400 | 2.2902 |
| 2.2873 | 0.2420 | 3500 | 2.2900 |
| 2.2762 | 0.2489 | 3600 | 2.2898 |
| 2.2841 | 0.2559 | 3700 | 2.2897 |
| 2.2917 | 0.2628 | 3800 | 2.2897 |
| 2.2946 | 0.2697 | 3900 | 2.2897 |
| 2.2844 | 0.2766 | 4000 | 2.2897 |
| 2.2907 | 0.2835 | 4100 | 2.2896 |
| 2.2875 | 0.2904 | 4200 | 2.2896 |
| 2.2888 | 0.2974 | 4300 | 2.2896 |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.51.2
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
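
This repository holds a PEFT adapter rather than full model weights, so it is loaded on top of the meta-llama/Meta-Llama-3-8B base model. Below is a minimal loading sketch: the adapter repo id is the one listed for this model, while the dtype, device placement, and example prompt are assumptions (flash_attention_2 additionally requires the flash-attn package).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "frankmorales2020/11APRIL2025-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,               # assumption: bf16 for an 8B model
    attn_implementation="flash_attention_2",  # matches the model name; needs flash-attn installed
    device_map="auto",
)

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer("Example prompt:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```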