trainer

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3546
  • Accuracy: 0.8807

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.032227
  • train_batch_size: 512
  • eval_batch_size: 512
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 4096
  • optimizer: schedule-free AdamW (OptimizerNames.SCHEDULE_FREE_ADAMW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_steps: 1000
  • training_steps: 1000000
  • mixed_precision_training: Native AMP
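A minimal sketch of how this configuration could be expressed with transformers.TrainingArguments, reconstructed from the list above. The model, dataset, and Trainer setup are omitted because the card does not specify them; schedule-free AdamW additionally requires the `schedulefree` package, and fp16 is an assumption for "Native AMP" (the run may have used bf16 instead).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="trainer",
    learning_rate=0.032227,
    per_device_train_batch_size=512,   # train_batch_size
    per_device_eval_batch_size=512,    # eval_batch_size
    gradient_accumulation_steps=8,     # 512 * 8 = 4096 total_train_batch_size
    seed=42,
    optim="schedule_free_adamw",       # betas=(0.9, 0.999), epsilon=1e-08
    lr_scheduler_type="constant",      # schedule-free AdamW manages its own schedule
    warmup_steps=1000,
    max_steps=1_000_000,
    fp16=True,                         # assumption: "Native AMP" taken to mean fp16
)
```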

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|:-------------:|:------:|:----:|:---------------:|:--------:|
| No log        | 0      | 0    | 4.3903          | 0.0137   |
| No log        | 0.0044 | 122  | 1.1251          | 0.6574   |
| No log        | 0.0087 | 244  | 0.8266          | 0.7365   |
| No log        | 0.0131 | 366  | 0.7493          | 0.7590   |
| No log        | 0.0175 | 488  | 0.6913          | 0.7755   |
| 9.1782        | 0.0218 | 610  | 0.6348          | 0.7927   |
| 9.1782        | 0.0262 | 732  | 0.5897          | 0.8064   |
| 9.1782        | 0.0306 | 854  | 0.5569          | 0.8170   |
| 9.1782        | 0.0349 | 976  | 0.5262          | 0.8266   |
| 5.0917        | 0.0393 | 1098 | 0.4957          | 0.8360   |
| 5.0917        | 0.0437 | 1220 | 0.4761          | 0.8424   |
| 5.0917        | 0.0480 | 1342 | 0.4616          | 0.8464   |
| 5.0917        | 0.0524 | 1464 | 0.4479          | 0.8510   |
| 4.0398        | 0.0568 | 1586 | 0.4397          | 0.8536   |
| 4.0398        | 0.0611 | 1708 | 0.4293          | 0.8564   |
| 4.0398        | 0.0655 | 1830 | 0.4231          | 0.8592   |
| 4.0398        | 0.0699 | 1952 | 0.4139          | 0.8614   |
| 3.5268        | 0.0743 | 2074 | 0.4088          | 0.8635   |
| 3.5268        | 0.0786 | 2196 | 0.4035          | 0.8649   |
| 3.5268        | 0.0830 | 2318 | 0.4000          | 0.8666   |
| 3.5268        | 0.0874 | 2440 | 0.3950          | 0.8678   |
| 3.3084        | 0.0917 | 2562 | 0.3915          | 0.8688   |
| 3.3084        | 0.0961 | 2684 | 0.3866          | 0.8705   |
| 3.3084        | 0.1005 | 2806 | 0.3843          | 0.8712   |
| 3.3084        | 0.1048 | 2928 | 0.3804          | 0.8726   |
| 3.1769        | 0.1092 | 3050 | 0.3776          | 0.8733   |
| 3.1769        | 0.1136 | 3172 | 0.3729          | 0.8749   |
| 3.1769        | 0.1179 | 3294 | 0.3723          | 0.8751   |
| 3.1769        | 0.1223 | 3416 | 0.3698          | 0.8759   |
| 3.0785        | 0.1267 | 3538 | 0.3659          | 0.8772   |
| 3.0785        | 0.1310 | 3660 | 0.3644          | 0.8775   |
| 3.0785        | 0.1354 | 3782 | 0.3599          | 0.8788   |
| 3.0785        | 0.1398 | 3904 | 0.3584          | 0.8794   |
| 2.9831        | 0.1441 | 4026 | 0.3567          | 0.8800   |
| 2.9831        | 0.1485 | 4148 | 0.3528          | 0.8817   |
| 2.9831        | 0.1529 | 4270 | 0.3535          | 0.8811   |
| 2.9831        | 0.1572 | 4392 | 0.3541          | 0.8809   |
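A hedged sketch of how the accuracy metric above could be computed during evaluation. The card does not show the actual metric code, so this uses the `evaluate` library (not listed in the framework versions below) purely as an illustration of an argmax-over-logits accuracy.

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair handed over by the Trainer.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```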

Framework versions

  • Transformers 4.52.2
  • PyTorch 2.8.0.dev20250521+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
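A quick way to check that a local environment matches the versions above, assuming the four packages are already installed:

```python
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.52.2
print(torch.__version__)         # expected: 2.8.0.dev20250521+cu128
print(datasets.__version__)      # expected: 3.6.0
print(tokenizers.__version__)    # expected: 0.21.1
```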