gpt-small-c4

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2881

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 20
  • mixed_precision_training: Native AMP
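With lr_scheduler_type set to linear and no warmup steps listed, the learning rate decays linearly from 5e-05 to 0 over training. A minimal plain-Python sketch of that schedule follows; the total step count is an assumption inferred from the results table (~1000 steps per 0.4013 epoch, i.e. roughly 2,492 steps/epoch × 20 epochs), not a value reported in this card:

```python
# Sketch of a warmup-free linear LR decay, matching the behavior of
# transformers' get_linear_schedule_with_warmup with num_warmup_steps=0.
# TOTAL_STEPS is an assumption inferred from the results table above.

BASE_LR = 5e-05
TOTAL_STEPS = 49_840  # assumed: ~2,492 steps/epoch * 20 epochs


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer updates (no warmup)."""
    remaining = max(0.0, (total_steps - step) / total_steps)
    return base_lr * remaining


print(linear_lr(0))       # 5e-05 at the start
print(linear_lr(24_920))  # 2.5e-05 exactly halfway through
print(linear_lr(49_840))  # 0.0 at the end
```

Under this schedule the learning rate at any point in the table can be recovered from the step column alone, which is why Trainer logs only the step count.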

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 7.0335        | 0.4013  | 1000  | 6.4495          |
| 6.2621        | 0.8026  | 2000  | 6.0652          |
| 5.9616        | 1.2039  | 3000  | 5.8198          |
| 5.7575        | 1.6051  | 4000  | 5.6420          |
| 5.6052        | 2.0064  | 5000  | 5.5004          |
| 5.4667        | 2.4077  | 6000  | 5.3899          |
| 5.3728        | 2.8090  | 7000  | 5.2942          |
| 5.2787        | 3.2103  | 8000  | 5.2110          |
| 5.1948        | 3.6116  | 9000  | 5.1345          |
| 5.1323        | 4.0128  | 10000 | 5.0630          |
| 5.0467        | 4.4141  | 11000 | 5.0053          |
| 4.9973        | 4.8154  | 12000 | 4.9481          |
| 4.9359        | 5.2167  | 13000 | 4.8986          |
| 4.8862        | 5.6180  | 14000 | 4.8609          |
| 4.8521        | 6.0193  | 15000 | 4.8182          |
| 4.7941        | 6.4205  | 16000 | 4.7930          |
| 4.7704        | 6.8218  | 17000 | 4.7584          |
| 4.7287        | 7.2231  | 18000 | 4.7326          |
| 4.7067        | 7.6244  | 19000 | 4.7087          |
| 4.6804        | 8.0257  | 20000 | 4.6887          |
| 4.6404        | 8.4270  | 21000 | 4.6696          |
| 4.6315        | 8.8283  | 22000 | 4.6517          |
| 4.6006        | 9.2295  | 23000 | 4.6386          |
| 4.5852        | 9.6308  | 24000 | 4.6197          |
| 4.5745        | 10.0321 | 25000 | 4.6064          |
| 4.5438        | 10.4334 | 26000 | 4.5943          |
| 4.5337        | 10.8347 | 27000 | 4.5829          |
| 4.5162        | 11.2360 | 28000 | 4.5726          |
| 4.5022        | 11.6372 | 29000 | 4.5623          |
| 4.4938        | 12.0385 | 30000 | 4.5550          |
| 4.469         | 12.4398 | 31000 | 4.5440          |
| 4.473         | 12.8411 | 32000 | 4.5363          |
| 4.4532        | 13.2424 | 33000 | 4.5310          |
| 4.4428        | 13.6437 | 34000 | 4.5246          |
| 4.4395        | 14.0449 | 35000 | 4.5142          |
| 4.4217        | 14.4462 | 36000 | 4.5120          |
| 4.4187        | 14.8475 | 37000 | 4.5072          |
| 4.4059        | 15.2488 | 38000 | 4.5036          |
| 4.4034        | 15.6501 | 39000 | 4.4969          |
| 4.3958        | 16.0514 | 40000 | 4.4948          |
| 4.3858        | 16.4526 | 41000 | 4.4915          |
| 4.3837        | 16.8539 | 42000 | 4.4846          |
| 4.3776        | 17.2552 | 43000 | 4.4839          |
| 4.4096        | 17.6565 | 44000 | 4.2901          |
| 4.4058        | 18.0578 | 45000 | 4.2897          |
| 4.3965        | 18.4591 | 46000 | 4.2893          |
| 4.4046        | 18.8604 | 47000 | 4.2907          |
| 4.3941        | 19.2616 | 48000 | 4.2890          |
| 4.3896        | 19.6629 | 49000 | 4.2881          |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.1
Model size

  • Parameters: 44.9M
  • Tensor type: F32
  • Format: Safetensors