impossible-llms-english-random-fourgram

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.4904
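A minimal usage sketch follows; it assumes the checkpoint is a causal language model with a bundled tokenizer, loadable through the standard transformers auto classes (the input string is an arbitrary example, not from the training data):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repo hosts a causal LM checkpoint plus tokenizer files.
repo_id = "IParraMartin/impossible-llms-english-random-fourgram"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Score an arbitrary input; labels = inputs gives the LM cross-entropy loss.
inputs = tokenizer("the of and to a", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())  # mean token-level cross-entropy on this input
```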

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
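For reference, here is a hedged sketch of how the values above map onto transformers TrainingArguments. The output_dir is hypothetical, and the 4-GPU multi-GPU setup is supplied by the launcher (e.g. torchrun), not by these arguments:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="impossible-llms-english-random-fourgram",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 12 per device x 4 GPUs x 8 steps = 384 effective batch
    seed=0,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,  # "Native AMP" mixed precision; bf16 would also qualify
    label_smoothing_factor=0.1,
)
```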

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 21.2318       | 1.0     | 96   | 7.0318          |
| 17.3066       | 2.0     | 192  | 5.8019          |
| 16.9078       | 3.0     | 288  | 5.6130          |
| 16.3046       | 4.0     | 384  | 5.3811          |
| 15.7736       | 5.0     | 480  | 5.1867          |
| 15.2597       | 6.0     | 576  | 5.0483          |
| 14.9974       | 7.0     | 672  | 4.9447          |
| 14.6845       | 8.0     | 768  | 4.8589          |
| 14.5313       | 9.0     | 864  | 4.7952          |
| 14.425        | 10.0    | 960  | 4.7419          |
| 14.09         | 11.0    | 1056 | 4.6954          |
| 13.959        | 12.0    | 1152 | 4.6586          |
| 13.9513       | 13.0    | 1248 | 4.6308          |
| 13.7675       | 14.0    | 1344 | 4.6051          |
| 13.6601       | 15.0    | 1440 | 4.5844          |
| 13.5687       | 16.0    | 1536 | 4.5667          |
| 13.5257       | 17.0    | 1632 | 4.5534          |
| 13.4789       | 18.0    | 1728 | 4.5398          |
| 13.4417       | 19.0    | 1824 | 4.5290          |
| 13.3908       | 20.0    | 1920 | 4.5210          |
| 13.307        | 21.0    | 2016 | 4.5132          |
| 13.3016       | 22.0    | 2112 | 4.5081          |
| 13.2893       | 23.0    | 2208 | 4.5023          |
| 13.2032       | 24.0    | 2304 | 4.4990          |
| 13.1012       | 25.0    | 2400 | 4.4962          |
| 13.1562       | 26.0    | 2496 | 4.4939          |
| 12.9843       | 27.0    | 2592 | 4.4923          |
| 13.0885       | 28.0    | 2688 | 4.4913          |
| 13.0813       | 29.0    | 2784 | 4.4908          |
| 13.1086       | 30.0    | 2880 | 4.4904          |
| 13.1765       | 31.0    | 2976 | 4.4904          |
| 34.9023       | 31.2516 | 3000 | 4.4904          |
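If the reported validation loss is mean per-token cross-entropy in nats, the final value corresponds to a perplexity of roughly exp(4.4904) ≈ 89.2. Note that the label smoothing factor of 0.1 inflates the raw loss relative to unsmoothed cross-entropy, so this is only an approximate figure:

```python
import math

# Convert the final validation loss to perplexity, assuming it is mean
# per-token cross-entropy in nats (label smoothing makes this approximate).
val_loss = 4.4904
print(math.exp(val_loss))  # ≈ 89.2
```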

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
Model size

  • 126M parameters (Safetensors, F32)