impossible-llms-english-random

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.0462
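Since the evaluation loss is a cross-entropy (trained with label_smoothing_factor 0.1, per the hyperparameters below), a rough perplexity can be derived from it. Note this is only a sketch: label smoothing inflates the reported loss, so the model's true perplexity is somewhat lower.

```python
import math

# Evaluation loss reported above; label smoothing means this
# over-estimates the model's true perplexity.
eval_loss = 5.0462
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 155.4
```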

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
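The effective batch size listed above follows from the per-device batch size, the device count, and gradient accumulation. A quick sanity check of the arithmetic:

```python
# Effective train batch size = per-device batch * number of GPUs * gradient accumulation
train_batch_size = 12
num_devices = 4
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 384, matching total_train_batch_size above
```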

Training results

Training Loss   Epoch     Step   Validation Loss
35.913          1.0         95   7.1506
30.1513         2.0        190   5.9945
29.2184         3.0        285   5.8395
28.5279         4.0        380   5.6620
27.8359         5.0        475   5.5429
27.5482         6.0        570   5.4532
27.0829         7.0        665   5.3803
26.7397         8.0        760   5.3227
26.4572         9.0        855   5.2749
26.2057         10.0       950   5.2360
25.9724         11.0      1045   5.2010
25.7457         12.0      1140   5.1755
25.7047         13.0      1235   5.1526
25.5117         14.0      1330   5.1328
25.3094         15.0      1425   5.1168
25.0625         16.0      1520   5.1017
24.9048         17.0      1615   5.0899
25.1186         18.0      1710   5.0804
25.0563         19.0      1805   5.0721
24.8198         20.0      1900   5.0669
24.7689         21.0      1995   5.0611
24.8698         22.0      2090   5.0565
24.5199         23.0      2185   5.0543
24.8015         24.0      2280   5.0501
24.4517         25.0      2375   5.0494
24.5355         26.0      2470   5.0486
24.5157         27.0      2565   5.0473
24.6138         28.0      2660   5.0470
24.4382         29.0      2755   5.0465
24.4547         30.0      2850   5.0463
24.4558         31.0      2945   5.0462
39.0136         31.5812   3000   5.0462
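The step/epoch bookkeeping in the table is consistent with the hyperparameters: 95 optimizer steps per epoch and a 3000-step budget give roughly 31.6 epochs, close to the logged final epoch, and a warmup ratio of 0.1 implies 300 warmup steps before cosine decay. A quick check:

```python
training_steps = 3000        # total optimizer steps (from hyperparameters)
steps_per_epoch = 95         # from the table: step 95 completes epoch 1.0
warmup_ratio = 0.1

approx_epochs = training_steps / steps_per_epoch
print(f"{approx_epochs:.2f}")  # ~ 31.58, close to the table's final epoch

warmup_steps = int(training_steps * warmup_ratio)
print(warmup_steps)  # 300 warmup steps before cosine decay
```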

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
Model size: 126M parameters (Safetensors, F32)
