Se124M10KInfPrompt_WT_EOS

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0978
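
The Framework versions section below lists PEFT, so this checkpoint appears to be a PEFT adapter rather than a full gpt2 checkpoint. A minimal loading sketch, assuming the adapter is hosted at the hub path augustocsc/Se124M10KInfPrompt_WT_EOS:

```python
# Minimal sketch: load gpt2 as the base model and attach this PEFT adapter.
# The repo id below is an assumption taken from this card's hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base, "augustocsc/Se124M10KInfPrompt_WT_EOS")

inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```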

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch in code follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 10
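
These values map one-to-one onto transformers.TrainingArguments. Only the hyperparameters listed above are grounded in this card; output_dir is a placeholder, and the dataset and PEFT configuration are not documented here.

```python
# Sketch of the configuration above via transformers.TrainingArguments.
# output_dir is a placeholder; dataset/PEFT setup is undocumented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Se124M10KInfPrompt_WT_EOS",   # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=4,            # 64 x 4 = 256 total train batch
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=10,
)
```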

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 3.2645        | 0.5970 | 20   | 2.8750          |
| 3.0103        | 1.1791 | 40   | 2.6534          |
| 2.7014        | 1.7761 | 60   | 2.3720          |
| 2.3851        | 2.3582 | 80   | 2.1090          |
| 2.1288        | 2.9552 | 100  | 1.8650          |
| 1.9062        | 3.5373 | 120  | 1.6703          |
| 1.739         | 4.1194 | 140  | 1.5153          |
| 1.6033        | 4.7164 | 160  | 1.3879          |
| 1.5039        | 5.2985 | 180  | 1.2845          |
| 1.4219        | 5.8955 | 200  | 1.2122          |
| 1.357         | 6.4776 | 220  | 1.1649          |
| 1.3201        | 7.0597 | 240  | 1.1304          |
| 1.2923        | 7.6567 | 260  | 1.1101          |
| 1.2831        | 8.2388 | 280  | 1.1035          |
| 1.2715        | 8.8358 | 300  | 1.1007          |
| 1.2721        | 9.4179 | 320  | 1.0986          |
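
Assuming these are the usual mean token cross-entropy losses (in nats), perplexity follows as exp(loss); a one-line check on the final validation loss:

```python
import math

# Perplexity from the final validation loss, assuming cross-entropy in nats.
print(math.exp(1.0986))  # ~3.00
```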

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • PyTorch 2.6.0+cu118
  • Datasets 3.5.0
  • Tokenizers 0.21.1