Se124M100KInfPrompt_WT_EOS_medium

This model is a fine-tuned version of gpt2-medium on an unknown dataset. At the last logged evaluation (step 900, epoch 2.9427) it reaches a validation loss of 0.7127.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 256
optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused PyTorch AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 3
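
The batch-size and scheduler settings above can be checked with a little arithmetic. The following is a minimal sketch, not the training code. It assumes a single device (the card lists no device count), and the helper names `effective_batch_size` and `cosine_lr` are hypothetical. The schedule follows the common linear-warmup-then-cosine shape. The total of 918 optimizer steps is an estimate inferred from the epoch column of the results table, not a value stated in the card.

```python
import math


def effective_batch_size(train_batch_size, grad_accum_steps, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return train_batch_size * grad_accum_steps * num_devices


def cosine_lr(step, total_steps, peak_lr=5e-5, warmup_ratio=0.03):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 918  # estimated from the logged epochs, not stated in the card

print(effective_batch_size(64, 4))  # 64 * 4 = 256, matching total_train_batch_size
for step in (0, 28, 460, 918):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

Under these assumptions the warmup lasts 28 steps, so the learning rate peaks at 5e-05 shortly after the first logged evaluation and decays toward zero by the end of the third epoch.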

Training Loss	Epoch	Step	Validation Loss
2.8652	0.0655	20	2.6742
2.6735	0.1309	40	2.4205
2.3498	0.1964	60	2.0554
1.9542	0.2619	80	1.6239
1.5661	0.3273	100	1.2791
1.3052	0.3928	120	1.0776
1.1291	0.4583	140	0.9537
1.0151	0.5237	160	0.8837
0.9431	0.5892	180	0.8324
0.8821	0.6547	200	0.8044
0.8536	0.7201	220	0.7846
0.8371	0.7856	240	0.7712
0.8281	0.8511	260	0.7628
0.8077	0.9165	280	0.7553
0.8013	0.9820	300	0.7501
0.7948	1.0458	320	0.7447
0.7830	1.1113	340	0.7394
0.7727	1.1768	360	0.7372
0.7770	1.2422	380	0.7331
0.7711	1.3077	400	0.7309
0.7642	1.3732	420	0.7289
0.7631	1.4386	440	0.7267
0.7581	1.5041	460	0.7250
0.7606	1.5696	480	0.7233
0.7578	1.6350	500	0.7223
0.7562	1.7005	520	0.7208
0.7497	1.7660	540	0.7195
0.7508	1.8314	560	0.7179
0.7476	1.8969	580	0.7168
0.7503	1.9624	600	0.7165
0.7414	2.0262	620	0.7164
0.7425	2.0917	640	0.7159
0.7451	2.1571	660	0.7146
0.7452	2.2226	680	0.7147
0.7446	2.2881	700	0.7138
0.7437	2.3535	720	0.7140
0.7397	2.4190	740	0.7131
0.7426	2.4845	760	0.7130
0.7421	2.5499	780	0.7127
0.7408	2.6154	800	0.7135
0.7413	2.6809	820	0.7135
0.7404	2.7463	840	0.7131
0.7373	2.8118	860	0.7128
0.7451	2.8773	880	0.7134
0.7407	2.9427	900	0.7127
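
To pick out the best checkpoint from a log like this, the tab-separated rows can be parsed directly. The snippet below is an illustrative sketch using only the Python standard library. `parse_log` and `LOG_EXCERPT` are hypothetical names, and the excerpt contains only the last seven rows of the table above.

```python
import csv
import io

# Last seven rows of the results table, tab-separated as in the card.
LOG_EXCERPT = """Training Loss\tEpoch\tStep\tValidation Loss
0.7421\t2.5499\t780\t0.7127
0.7408\t2.6154\t800\t0.7135
0.7413\t2.6809\t820\t0.7135
0.7404\t2.7463\t840\t0.7131
0.7373\t2.8118\t860\t0.7128
0.7451\t2.8773\t880\t0.7134
0.7407\t2.9427\t900\t0.7127
"""


def parse_log(text):
    """Turn the tab-separated log into a list of typed dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "train_loss": float(row["Training Loss"]),
            "epoch": float(row["Epoch"]),
            "step": int(row["Step"]),
            "val_loss": float(row["Validation Loss"]),
        }
        for row in reader
    ]


rows = parse_log(LOG_EXCERPT)
# min() keeps the first row on ties, so the earliest best step is reported.
best = min(rows, key=lambda r: r["val_loss"])
print(best["step"], best["val_loss"])
```

On this excerpt the lowest validation loss, 0.7127, is first reached at step 780 and matched again at step 900. The loss moves by less than 0.001 across these rows, which suggests training has largely plateaued.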