impossible-llms-english-natural

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9845

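If this value is the mean token-level cross-entropy in nats, it corresponds to a perplexity of roughly exp(3.9845) ≈ 53.8. Note that training used label smoothing (factor 0.1, listed below), which inflates the reported loss relative to an unsmoothed objective. A minimal sketch of the conversion:

```python
import math

eval_loss = 3.9845  # final evaluation loss reported above

# Perplexity is the exponential of the mean cross-entropy (in nats).
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 53.8
```
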
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
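
These settings imply an effective training batch size of 12 per device × 4 GPUs × 8 gradient-accumulation steps = 384, matching the total_train_batch_size above. A minimal sketch of how they map onto Hugging Face TrainingArguments, assuming the run used the Trainer API (output_dir is a placeholder, and fp16 is assumed for "Native AMP"):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="impossible-llms-english-natural",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,  # 12 x 4 GPUs x 8 = 384 effective batch
    optim="adamw_torch",            # AdamW with default betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                      # "Native AMP" mixed precision (assumed fp16)
    label_smoothing_factor=0.1,
)
```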

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 58.027        | 1.0     | 86   | 7.1861          |
| 44.3552       | 2.0     | 172  | 5.5613          |
| 42.3359       | 3.0     | 258  | 5.2823          |
| 40.688        | 4.0     | 344  | 5.0685          |
| 38.9604       | 5.0     | 430  | 4.8318          |
| 37.6082       | 6.0     | 516  | 4.6480          |
| 36.6465       | 7.0     | 602  | 4.5107          |
| 35.7031       | 8.0     | 688  | 4.4041          |
| 34.7611       | 9.0     | 774  | 4.3191          |
| 34.6912       | 10.0    | 860  | 4.2578          |
| 33.8642       | 11.0    | 946  | 4.2083          |
| 33.3426       | 12.0    | 1032 | 4.1692          |
| 33.1165       | 13.0    | 1118 | 4.1382          |
| 32.8416       | 14.0    | 1204 | 4.1110          |
| 32.4453       | 15.0    | 1290 | 4.0879          |
| 32.297        | 16.0    | 1376 | 4.0682          |
| 32.2745       | 17.0    | 1462 | 4.0541          |
| 31.8602       | 18.0    | 1548 | 4.0416          |
| 31.5979       | 19.0    | 1634 | 4.0296          |
| 31.7409       | 20.0    | 1720 | 4.0234          |
| 31.4115       | 21.0    | 1806 | 4.0143          |
| 31.3564       | 22.0    | 1892 | 4.0074          |
| 31.1016       | 23.0    | 1978 | 4.0023          |
| 30.8809       | 24.0    | 2064 | 3.9992          |
| 31.0388       | 25.0    | 2150 | 3.9948          |
| 30.9397       | 26.0    | 2236 | 3.9915          |
| 30.9424       | 27.0    | 2322 | 3.9893          |
| 30.9243       | 28.0    | 2408 | 3.9881          |
| 30.6877       | 29.0    | 2494 | 3.9873          |
| 30.5782       | 30.0    | 2580 | 3.9858          |
| 30.3729       | 31.0    | 2666 | 3.9851          |
| 30.5981       | 32.0    | 2752 | 3.9848          |
| 30.7725       | 33.0    | 2838 | 3.9844          |
| 30.5009       | 34.0    | 2924 | 3.9844          |
| 30.614        | 34.8837 | 3000 | 3.9845          |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0

Model details

  • Model size: 126M params
  • Tensor type: F32
  • Format: Safetensors
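
A minimal usage sketch, assuming the checkpoint loads as a causal language model under the hub id IParraMartin/impossible-llms-english-natural (the AutoModelForCausalLM head and the generation settings are assumptions, not confirmed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IParraMartin/impossible-llms-english-natural"  # hub id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # assumes a causal LM head

# Generate a short continuation from a prompt.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```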
