loose_default_seed-63_1e-3

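This model was trained from scratch on an unknown dataset. At the final evaluation (step 29720, the last row of the results table) it achieves the following results on the evaluation set:

Loss: 3.1825
Accuracy: 0.4010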

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 63
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
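
The batch-size and schedule values above interact in a way that is easy to check by hand. The sketch below assumes a single training device (the card does not state the device count, but 32 × 8 = 256 matches the reported total only for one device) and assumes the linear schedule has the usual warmup-then-decay shape, with the total step count taken from the last row of the results table (29720). `effective_batch_size` and `linear_lr` are hypothetical helpers written for illustration, not functions from any library.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 32_000
# Assumption: total optimizer steps, taken from the final row of the results table.
TOTAL_STEPS = 29_720
# Assumption: one device; the card does not state the device count.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accum * devices


def linear_lr(step: int, peak: float, warmup: int, total: int) -> float:
    """Linear warmup to `peak`, then linear decay to 0 at `total` (illustrative)."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
final_lr = linear_lr(TOTAL_STEPS, LEARNING_RATE, WARMUP_STEPS, TOTAL_STEPS)
print(f"{final_lr:.6f}")
```

Under these assumptions the warmup length (32000 steps) exceeds the number of steps actually run, so the learning rate would still be rising when training ends and would stay slightly below the configured 0.001 instead of peaking and decaying.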

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.969	0.9994	1486	4.4120	0.2938
4.3054	1.9997	2973	3.9006	0.3330
3.6966	2.9992	4459	3.6294	0.3565
3.5272	3.9995	5946	3.4676	0.3713
3.3104	4.9997	7433	3.3730	0.3798
3.24	5.9993	8919	3.3109	0.3859
3.1305	6.9996	10406	3.2702	0.3902
3.091	7.9998	11893	3.2472	0.3923
3.0286	8.9994	13379	3.2262	0.3948
3.0038	9.9997	14866	3.2145	0.3958
2.9647	10.9992	16352	3.2051	0.3971
2.9451	11.9995	17839	3.2006	0.3979
2.9235	12.9997	19326	3.1951	0.3989
2.9054	13.9993	20812	3.1907	0.3992
2.8946	14.9996	22299	3.1915	0.3995
2.877	15.9998	23786	3.1858	0.3999
2.8765	16.9994	25272	3.1844	0.4002
2.8564	17.9997	26759	3.1840	0.4008
2.863	18.9992	28245	3.1821	0.4008
2.8446	19.9914	29720	3.1825	0.4010
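
If the reported validation loss is a mean per-token cross-entropy in nats (an assumption; the card does not name the task or the loss function), it can be converted to perplexity with `exp(loss)`. The sketch below does this for the first and last rows of the table; `perplexity` is a hypothetical helper for illustration.

```python
import math

# (step, validation_loss) pairs copied from the first and last rows of the table.
rows = [(1486, 4.4120), (29720, 3.1825)]


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy given in nats (assumed unit)."""
    return math.exp(loss_nats)


for step, loss in rows:
    print(f"step {step}: loss {loss:.4f} -> perplexity {perplexity(loss):.1f}")
```

Because perplexity is exponential in the loss, the small loss changes in the later epochs correspond to small perplexity changes, which is consistent with the accuracy column flattening out near 0.40.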

Safetensors

Model size: 0.1B params
Tensor type: F32
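
The parameter count and tensor type give a back-of-the-envelope estimate of the weight file size. The sketch assumes exactly 100 million parameters, although the card only gives the rounded figure 0.1B, so the real size may differ noticeably. `weights_size_mb` is a hypothetical helper for illustration.

```python
# Assumption: "0.1B params" is treated as exactly 100 million;
# the card only reports a rounded value.
NUM_PARAMS = 100_000_000

# Bytes per parameter for common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weights_size_mb(num_params: int, dtype: str) -> float:
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


print(f"{weights_size_mb(NUM_PARAMS, 'F32'):.0f} MB")
```

The estimate covers the raw weights only; optimizer state and activations during training or inference would add to it.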