gpt-small-c4

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.4497
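For language models, the reported cross-entropy loss converts to perplexity via exponentiation, so a loss of 4.4497 corresponds to a perplexity of roughly 85.6. A minimal sketch of the conversion:

```python
import math

# Cross-entropy loss reported on the evaluation set
eval_loss = 4.4497

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # ~85.6
```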

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 20
  • mixed_precision_training: Native AMP
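As a rough sketch, these settings map onto a Hugging Face TrainingArguments object as shown below. The output_dir is illustrative, and fp16=True is an assumption standing in for "Native AMP" (the run may equally have used bf16 or an Accelerate-level setting):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported settings
training_args = TrainingArguments(
    output_dir="gpt-small-c4",        # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    fp16=True,                        # assumed mapping of "Native AMP"
)
```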

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 7.0256        | 0.4005  | 1000  | 6.4298          |
| 6.264         | 0.8010  | 2000  | 6.0446          |
| 5.9635        | 1.2014  | 3000  | 5.7924          |
| 5.7506        | 1.6019  | 4000  | 5.6125          |
| 5.6108        | 2.0024  | 5000  | 5.4753          |
| 5.4654        | 2.4029  | 6000  | 5.3627          |
| 5.3748        | 2.8034  | 7000  | 5.2686          |
| 5.2775        | 3.2038  | 8000  | 5.1859          |
| 5.1925        | 3.6043  | 9000  | 5.1097          |
| 5.1347        | 4.0048  | 10000 | 5.0354          |
| 5.0484        | 4.4053  | 11000 | 4.9773          |
| 4.9933        | 4.8058  | 12000 | 4.9212          |
| 4.9351        | 5.2062  | 13000 | 4.8743          |
| 4.8865        | 5.6067  | 14000 | 4.8315          |
| 4.8497        | 6.0072  | 15000 | 4.7960          |
| 4.793         | 6.4077  | 16000 | 4.7648          |
| 4.7707        | 6.8082  | 17000 | 4.7354          |
| 4.7275        | 7.2087  | 18000 | 4.7083          |
| 4.702         | 7.6091  | 19000 | 4.6839          |
| 4.6871        | 8.0096  | 20000 | 4.6641          |
| 4.6432        | 8.4101  | 21000 | 4.6458          |
| 4.6269        | 8.8106  | 22000 | 4.6268          |
| 4.6021        | 9.2111  | 23000 | 4.6126          |
| 4.5857        | 9.6115  | 24000 | 4.5965          |
| 4.5755        | 10.0120 | 25000 | 4.5830          |
| 4.5421        | 10.4125 | 26000 | 4.5738          |
| 4.5401        | 10.8130 | 27000 | 4.5622          |
| 4.5149        | 11.2135 | 28000 | 4.5539          |
| 4.5035        | 11.6139 | 29000 | 4.5425          |
| 4.4957        | 12.0144 | 30000 | 4.5350          |
| 4.47          | 12.4149 | 31000 | 4.5258          |
| 4.4736        | 12.8154 | 32000 | 4.5191          |
| 4.4503        | 13.2159 | 33000 | 4.5091          |
| 4.4474        | 13.6163 | 34000 | 4.5037          |
| 4.4405        | 14.0168 | 35000 | 4.4968          |
| 4.4225        | 14.4173 | 36000 | 4.4937          |
| 4.4167        | 14.8178 | 37000 | 4.4876          |
| 4.4138        | 15.2183 | 38000 | 4.4808          |
| 4.4023        | 15.6187 | 39000 | 4.4764          |
| 4.3988        | 16.0192 | 40000 | 4.4723          |
| 4.3839        | 16.4197 | 41000 | 4.4702          |
| 4.3865        | 16.8202 | 42000 | 4.4652          |
| 4.3782        | 17.2207 | 43000 | 4.4615          |
| 4.3732        | 17.6211 | 44000 | 4.4601          |
| 4.3711        | 18.0216 | 45000 | 4.4561          |
| 4.3628        | 18.4221 | 46000 | 4.4542          |
| 4.362         | 18.8226 | 47000 | 4.4522          |
| 4.3554        | 19.2231 | 48000 | 4.4509          |
| 4.3549        | 19.6235 | 49000 | 4.4497          |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 2.20.0
  • Tokenizers 0.21.0
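
With these versions installed, the model can be loaded through the standard Transformers auto classes. A minimal usage sketch, assuming the repo id is "gpt-small-c4" (substitute the actual Hub path for this checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with the model's actual Hub path
repo_id = "gpt-small-c4"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```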
Model size

  • 44.9M parameters (Safetensors, F32 tensors)