impossible-llms-english-random-trigram

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 4.3113

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 12
eval_batch_size: 8
seed: 0
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 8
total_train_batch_size: 384
total_eval_batch_size: 32
optimizer: AdamW, PyTorch implementation (OptimizerNames.ADAMW_TORCH), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
training_steps: 3000
mixed_precision_training: Native AMP
label_smoothing_factor: 0.1
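Several of the values above are derived from the others. The training batch of 384 is 12 × 4 devices × 8 accumulation steps, the evaluation batch of 32 is 8 × 4 devices, and a 0.1 warmup ratio over 3000 steps gives 300 warmup steps. The short Python sketch below recomputes these values. It also evaluates a learning-rate curve and a label-smoothed loss for a single token.

The sketch makes two assumptions about the training setup. It is not code taken from the training run:
- The cosine scheduler is the common "linear warmup, then half a cosine period decaying to zero" shape.
- Label smoothing mixes the target's negative log-likelihood with the mean negative log-probability over the vocabulary.

The helpers `cosine_lr` and `label_smoothed_nll` are hypothetical names.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 12      # per device
EVAL_BATCH_SIZE = 8        # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 8
TRAINING_STEPS = 3000
WARMUP_RATIO = 0.1
LABEL_SMOOTHING = 0.1

# Effective batch sizes: per-device batch times number of devices,
# and (for training only) times the gradient accumulation steps.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS  # 384
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES                       # 32

# Number of warmup steps implied by the warmup ratio (rounded up).
warmup_steps = math.ceil(WARMUP_RATIO * TRAINING_STEPS)                    # 300


def cosine_lr(step: int) -> float:
    """Hypothetical helper: learning rate at a given optimizer step.

    Assumes linear warmup from 0 to LEARNING_RATE, followed by half a
    cosine period that decays to 0 at TRAINING_STEPS.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TRAINING_STEPS - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def label_smoothed_nll(log_probs: list[float], target: int,
                       epsilon: float = LABEL_SMOOTHING) -> float:
    """Hypothetical helper: label-smoothed loss for a single token.

    Assumes the loss is (1 - epsilon) * NLL of the target token plus
    epsilon * the mean negative log-probability over the vocabulary.
    """
    nll = -log_probs[target]
    smooth = -sum(log_probs) / len(log_probs)
    return (1.0 - epsilon) * nll + epsilon * smooth


if __name__ == "__main__":
    print(total_train_batch_size, total_eval_batch_size, warmup_steps)
    for step in (0, 150, 300, 1650, 3000):
        print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions, the rate reaches half its peak midway through warmup (step 150). It peaks at 1e-4 at step 300, is back at half the peak at step 1650, and returns 0.0 at step 3000.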

Training results

Training Loss	Epoch	Step	Validation Loss
14.1482	1.0	96	6.9646
11.4328	2.0	192	5.6992
11.1488	3.0	288	5.5112
10.5646	4.0	384	5.2485
10.2163	5.0	480	5.0376
9.8751	6.0	576	4.8854
9.6552	7.0	672	4.7683
9.4312	8.0	768	4.6836
9.301	9.0	864	4.6148
9.2448	10.0	960	4.5597
9.1271	11.0	1056	4.5156
9.0854	12.0	1152	4.4794
8.9255	13.0	1248	4.4493
8.8784	14.0	1344	4.4255
8.7833	15.0	1440	4.4035
8.6755	16.0	1536	4.3862
8.6895	17.0	1632	4.3722
8.6269	18.0	1728	4.3582
8.5067	19.0	1824	4.3492
8.4444	20.0	1920	4.3404
8.5608	21.0	2016	4.3332
8.4592	22.0	2112	4.3274
8.4261	23.0	2208	4.3233
8.471	24.0	2304	4.3193
8.3813	25.0	2400	4.3163
8.3404	26.0	2496	4.3149
8.3891	27.0	2592	4.3132
8.3628	28.0	2688	4.3122
8.4306	29.0	2784	4.3117
8.2589	30.0	2880	4.3113
8.247	31.0	2976	4.3113
33.3577	31.2520	3000	4.3113
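The Epoch and Step columns are linked by a fixed number of optimizer steps per epoch: 96, the increment between consecutive rows. The Python sketch below assumes that constant and the total train batch size of 384. It computes:
- an upper bound on the number of sequences seen per epoch;
- the fractional epoch reached at step 3000 (31.25, close to the logged 31.2520);
- how many steps separate the last full-epoch row (step 2976) from the final row.

```python
import math

STEPS_PER_EPOCH = 96           # step increment between consecutive rows of the table
TOTAL_TRAIN_BATCH_SIZE = 384   # from the hyperparameter list
TRAINING_STEPS = 3000

# Upper bound on training sequences per epoch: the last step of an epoch
# may see fewer than TOTAL_TRAIN_BATCH_SIZE sequences.
sequences_per_epoch = STEPS_PER_EPOCH * TOTAL_TRAIN_BATCH_SIZE

# Fractional epoch reached when training stops.
final_epoch = TRAINING_STEPS / STEPS_PER_EPOCH

# Steps between the last full-epoch evaluation and the end of training.
last_full_epoch_step = math.floor(final_epoch) * STEPS_PER_EPOCH
steps_in_final_row = TRAINING_STEPS - last_full_epoch_step

print(sequences_per_epoch, final_epoch, last_full_epoch_step, steps_in_final_row)
```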