# gemini_chakma_bert

This model is a fine-tuned version of google-bert/bert-base-multilingual-cased on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 1.7301
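
If this evaluation loss is the masked-language-modeling cross-entropy (the usual objective for a BERT fine-tune, though the card does not say), it corresponds to a perplexity of exp(1.7301) ≈ 5.64. Under that same assumption the checkpoint can be queried through the fill-mask pipeline; the snippet below is a minimal sketch, with a placeholder probe string to be replaced by Chakma-script text.

```python
from transformers import pipeline

# Minimal sketch, assuming the checkpoint keeps its masked-language-modeling head.
fill_mask = pipeline("fill-mask", model="adity12345/gemini_chakma_bert")

# Placeholder probe: substitute real Chakma-script text around the mask token.
for prediction in fill_mask(f"... {fill_mask.tokenizer.mask_token} ..."):
    print(prediction["token_str"], round(prediction["score"], 4))
```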

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):

- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 20
- mixed_precision_training: Native AMP
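
A minimal sketch of how these values map onto `transformers.TrainingArguments`; the output directory is a placeholder, the `Trainer`/dataset wiring is omitted, and `fp16=True` stands in for "Native AMP". The betas and epsilon listed above are the AdamW defaults, so they need no explicit arguments.

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="gemini_chakma_bert",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,    # 4 x 2 = 8 effective (total) train batch size
    optim="adamw_torch_fused",        # AdamW; betas=(0.9, 0.999), eps=1e-8 are defaults
    lr_scheduler_type="linear",
    num_train_epochs=20,
    fp16=True,                        # "Native AMP" mixed precision
)
```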

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.5731        | 1.0   | 228  | 2.7399          |
| 2.2245        | 2.0   | 456  | 2.4805          |
| 2.0515        | 3.0   | 684  | 2.3670          |
| 1.9288        | 4.0   | 912  | 2.2668          |
| 1.8429        | 5.0   | 1140 | 2.1341          |
| 1.7925        | 6.0   | 1368 | 2.1439          |
| 1.7298        | 7.0   | 1596 | 2.0550          |
| 1.6792        | 8.0   | 1824 | 2.0018          |
| 1.625         | 9.0   | 2052 | 1.9309          |
| 1.5888        | 10.0  | 2280 | 1.9261          |
| 1.5427        | 11.0  | 2508 | 1.8502          |
| 1.5201        | 12.0  | 2736 | 1.8298          |
| 1.4936        | 13.0  | 2964 | 1.8438          |
| 1.4709        | 14.0  | 3192 | 1.7637          |
| 1.4441        | 15.0  | 3420 | 1.7729          |
| 1.4201        | 16.0  | 3648 | 1.7761          |
| 1.4208        | 17.0  | 3876 | 1.7413          |
| 1.4038        | 18.0  | 4104 | 1.7010          |
| 1.3808        | 19.0  | 4332 | 1.7308          |
| 1.3808        | 20.0  | 4560 | 1.7301          |
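
Note that the validation loss bottoms out at epoch 18 (1.7010) and drifts slightly upward afterwards, so the final checkpoint is not the best one. A hedged option for a rerun (not something this card states was done) is to keep the checkpoint with the lowest eval loss:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Sketch: evaluate and save each epoch, then restore the checkpoint
# with the lowest eval_loss at the end of training.
training_args = TrainingArguments(
    output_dir="gemini_chakma_bert",  # placeholder path
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Optionally stop once eval_loss fails to improve for a few epochs:
# trainer = Trainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=3)])
```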

### Framework versions

- Transformers 4.56.1
- PyTorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0