impossible-llms-english-mirror-reversal

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.0171

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 12
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 384
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 3000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
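
The effective batch size follows from train_batch_size × num_devices × gradient_accumulation_steps = 12 × 4 × 8 = 384. As a rough sketch (not the original training script), the settings above correspond to a Hugging Face `TrainingArguments` configuration along these lines; the output directory is a placeholder, and only the numeric values come from this card:

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments object matching the hyperparameters listed above.
# Only the numeric values are taken from the card; paths and trainer wiring are illustrative.
training_args = TrainingArguments(
    output_dir="impossible-llms-english-mirror-reversal",  # placeholder output path
    learning_rate=1e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=8,   # 12 per device x 4 GPUs x 8 steps = 384 effective batch
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    max_steps=3000,
    fp16=True,                       # "Native AMP" mixed-precision training
    label_smoothing_factor=0.1,
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-08 (defaults)
)
```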

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|---------------|---------|------|-----------------|
| 21.0651       | 1.0     | 93   | 6.9366          |
| 16.7721       | 2.0     | 186  | 5.6109          |
| 16.2007       | 3.0     | 279  | 5.3989          |
| 15.362        | 4.0     | 372  | 5.0672          |
| 14.777        | 5.0     | 465  | 4.8375          |
| 14.1948       | 6.0     | 558  | 4.6671          |
| 13.7877       | 7.0     | 651  | 4.5429          |
| 13.4306       | 8.0     | 744  | 4.4433          |
| 13.2006       | 9.0     | 837  | 4.3635          |
| 12.9023       | 10.0    | 930  | 4.2974          |
| 12.926        | 11.0    | 1023 | 4.2489          |
| 12.7253       | 12.0    | 1116 | 4.2058          |
| 12.592        | 13.0    | 1209 | 4.1736          |
| 12.3876       | 14.0    | 1302 | 4.1453          |
| 12.2837       | 15.0    | 1395 | 4.1236          |
| 12.1655       | 16.0    | 1488 | 4.1020          |
| 12.1549       | 17.0    | 1581 | 4.0871          |
| 12.0255       | 18.0    | 1674 | 4.0723          |
| 12.0603       | 19.0    | 1767 | 4.0624          |
| 11.9875       | 20.0    | 1860 | 4.0519          |
| 11.766        | 21.0    | 1953 | 4.0446          |
| 12.0245       | 22.0    | 2046 | 4.0389          |
| 11.9487       | 23.0    | 2139 | 4.0326          |
| 11.6863       | 24.0    | 2232 | 4.0286          |
| 11.731        | 25.0    | 2325 | 4.0251          |
| 11.7887       | 26.0    | 2418 | 4.0217          |
| 11.8313       | 27.0    | 2511 | 4.0198          |
| 11.5967       | 28.0    | 2604 | 4.0185          |
| 11.5744       | 29.0    | 2697 | 4.0179          |
| 11.4695       | 30.0    | 2790 | 4.0173          |
| 11.6968       | 31.0    | 2883 | 4.0170          |
| 11.6475       | 32.0    | 2976 | 4.0171          |
| 30.9823       | 32.2598 | 3000 | 4.0171          |
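
For reference, the final validation loss of 4.0171 nats corresponds to a perplexity of roughly exp(4.0171) ≈ 55.5. Because training used a label smoothing factor of 0.1, the reported loss is the smoothed objective rather than the raw negative log-likelihood, so this figure is only indicative:

```python
import math

# Approximate perplexity implied by the final validation loss.
# The loss includes label smoothing (0.1), so treat this as a rough estimate.
final_val_loss = 4.0171
print(math.exp(final_val_loss))  # ~55.5
```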

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.4.0
  • Tokenizers 0.21.0
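
The card does not state the base architecture. Assuming the ~126M-parameter F32 safetensors checkpoint on the Hub (IParraMartin/impossible-llms-english-mirror-reversal) is a standard causal language model, it should load with the usual Auto classes; this is a sketch, not an official usage snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes a causal-LM checkpoint; the exact architecture is not documented in this card.
repo_id = "IParraMartin/impossible-llms-english-mirror-reversal"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Example input text", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```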