loose_default_seed-21_1e-3

This model was trained from scratch on an unknown dataset. At the final evaluation (step 29720) it reaches a validation loss of 3.1836 and an accuracy of 0.4005 on the evaluation set.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 21
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
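
The derived values in this list can be checked with a little arithmetic. The sketch below assumes a single training device (so the total batch size is 32 × 8) and assumes the `linear` scheduler has the usual shape of linear warmup followed by linear decay. The total step count of 29720 is taken from the last row of the results table, and `linear_lr` and `effective_batch_size` are hypothetical helpers written for illustration, not library functions. Under those assumptions the warmup (32000 steps) is longer than the run itself, so the learning rate would still be ramping up when training ends and would never reach the configured 0.001.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 32_000
TOTAL_STEPS = 29_720  # assumption: final step logged in the results table
NUM_DEVICES = 1       # assumption: single-device training


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def linear_lr(step, peak_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Hypothetical helper: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
final_lr = linear_lr(TOTAL_STEPS)

print(f"total_train_batch_size = {total_batch}")
print(f"learning rate at step {TOTAL_STEPS} = {final_lr:.6f}")
```

If the run used more than one device, `NUM_DEVICES` would need to change and the per-device batch size would no longer multiply out to 256 with these values.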

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.9844	0.9994	1486	4.4074	0.2933
4.3022	1.9997	2973	3.9034	0.3327
3.697	2.9992	4459	3.6258	0.3561
3.53	3.9995	5946	3.4658	0.3714
3.3054	4.9997	7433	3.3659	0.3803
3.2341	5.9993	8919	3.3071	0.3865
3.1272	6.9996	10406	3.2656	0.3905
3.0899	7.9998	11893	3.2438	0.3925
3.0287	8.9994	13379	3.2261	0.3945
3.0027	9.9997	14866	3.2144	0.3960
2.9661	10.9992	16352	3.2085	0.3971
2.9466	11.9995	17839	3.2058	0.3974
2.9257	12.9997	19326	3.1978	0.3985
2.9051	13.9993	20812	3.1924	0.3993
2.8976	14.9996	22299	3.1898	0.3994
2.8773	15.9998	23786	3.1867	0.3999
2.8773	16.9994	25272	3.1824	0.4003
2.8598	17.9997	26759	3.1903	0.4000
2.8661	18.9992	28245	3.1883	0.4006
2.8457	19.9914	29720	3.1836	0.4005
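
To make the table easier to read, the validation loss can be converted to perplexity and the best rows picked out. This is a minimal sketch that assumes the reported loss is a mean cross-entropy in nats, so that perplexity is `exp(loss)`. Only the last five evaluations are copied in, and the `rows` list and variable names are illustrative.

```python
import math

# (step, validation_loss, accuracy) for the last five evaluations in the table.
rows = [
    (23786, 3.1867, 0.3999),
    (25272, 3.1824, 0.4003),
    (26759, 3.1903, 0.4000),
    (28245, 3.1883, 0.4006),
    (29720, 3.1836, 0.4005),
]

best_loss_row = min(rows, key=lambda r: r[1])  # lowest validation loss
best_acc_row = max(rows, key=lambda r: r[2])   # highest accuracy
final_step, final_loss, final_acc = rows[-1]
final_perplexity = math.exp(final_loss)        # assumes loss is in nats

print(f"lowest validation loss: {best_loss_row[1]} at step {best_loss_row[0]}")
print(f"highest accuracy: {best_acc_row[2]} at step {best_acc_row[0]}")
print(f"final perplexity: {final_perplexity:.2f}")
```

On these rows the lowest validation loss falls at step 25272 and the highest accuracy at step 28245, so the final checkpoint is close to, but not exactly, the best on either metric.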

Safetensors

Model size: 110M params
Tensor type: F32
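
As a rough size check, the parameter count and tensor type together imply the weight footprint. The sketch assumes exactly 110 million parameters (the reported count is rounded) and 4 bytes per F32 value. It ignores file metadata, optimizer state, and activations.

```python
PARAMS = 110_000_000  # assumption: "110M params" taken at face value
BYTES_PER_PARAM = 4   # F32 = 32 bits = 4 bytes

weight_bytes = PARAMS * BYTES_PER_PARAM
print(f"approx. weight size: {weight_bytes / 1e6:.0f} MB")
```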