gemini_chakma_banglaBert

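This model is a fine-tuned version of sagorsarker/bangla-bert-base. The training dataset is not specified (the auto-generated card records it as None). On the evaluation set it reaches a validation loss of 2.9660 after the final epoch (epoch 20).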

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: fused PyTorch AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 20
mixed_precision_training: Native AMP
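
As a rough illustration of how these values fit together, the sketch below recomputes the effective batch size and the learning rate at a given step. The step count per epoch (161) is read from the results table. The zero-warmup setting and the `linear_lr` helper are assumptions made for illustration, since the card lists no warmup value; this is not the training script itself.

```python
# Values copied from the hyperparameter list above.
learning_rate = 2e-05
train_batch_size = 4
gradient_accumulation_steps = 2
num_epochs = 20

# Read from the results table: the step counter is 161 after epoch 1.
steps_per_epoch = 161

# Effective batch size per optimizer step (matches total_train_batch_size: 8,
# which is consistent with training on a single device).
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Total optimizer steps over the whole run.
total_steps = steps_per_epoch * num_epochs


def linear_lr(step, base_lr=learning_rate, total=total_steps, warmup=0):
    """Hypothetical helper: linear warmup, then linear decay to zero.

    warmup=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


# Rough upper estimate of training examples seen per epoch.
max_examples_per_epoch = steps_per_epoch * total_train_batch_size

print("effective batch size:", total_train_batch_size)
print("total optimizer steps:", total_steps)
print("lr at the midpoint:", linear_lr(total_steps // 2))
print("approx. examples per epoch:", max_examples_per_epoch)
```

Under these assumptions the learning rate starts at 2e-05 and falls linearly to zero at the last optimizer step.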

Training Loss	Epoch	Step	Validation Loss
4.9302	1.0	161	4.4814
4.1569	2.0	322	4.1333
3.8372	3.0	483	4.0134
3.6273	4.0	644	3.8247
3.4985	5.0	805	3.6106
3.3996	6.0	966	3.6456
3.2831	7.0	1127	3.4770
3.1861	8.0	1288	3.4076
3.1303	9.0	1449	3.2995
3.0423	10.0	1610	3.2370
2.9555	11.0	1771	3.2081
2.9047	12.0	1932	3.1658
2.85	13.0	2093	3.0383
2.8092	14.0	2254	3.0543
2.7829	15.0	2415	3.0478
2.7282	16.0	2576	2.9925
2.7599	17.0	2737	2.9768
2.681	18.0	2898	2.9205
2.641	19.0	3059	2.9632
2.6854	20.0	3220	2.9660
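
The table can be summarized with a few lines of Python. The sketch below finds the epoch with the lowest validation loss and converts losses to perplexity. Reading `exp(loss)` as perplexity assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state; the variable names are illustrative.

```python
import math

# (epoch, training loss, validation loss), copied from the table above.
results = [
    (1, 4.9302, 4.4814), (2, 4.1569, 4.1333), (3, 3.8372, 4.0134),
    (4, 3.6273, 3.8247), (5, 3.4985, 3.6106), (6, 3.3996, 3.6456),
    (7, 3.2831, 3.4770), (8, 3.1861, 3.4076), (9, 3.1303, 3.2995),
    (10, 3.0423, 3.2370), (11, 2.9555, 3.2081), (12, 2.9047, 3.1658),
    (13, 2.85, 3.0383), (14, 2.8092, 3.0543), (15, 2.7829, 3.0478),
    (16, 2.7282, 2.9925), (17, 2.7599, 2.9768), (18, 2.681, 2.9205),
    (19, 2.641, 2.9632), (20, 2.6854, 2.9660),
]

# Epoch with the lowest validation loss.
best_epoch, _, best_val = min(results, key=lambda row: row[2])

# Final-epoch numbers, which correspond to the saved evaluation result.
final_epoch, final_train, final_val = results[-1]

# Assumption: loss is mean per-token cross-entropy in nats,
# so exp(loss) can be read as perplexity.
best_ppl = math.exp(best_val)
final_ppl = math.exp(final_val)

# Gap between validation and training loss at the end of training.
final_gap = final_val - final_train

print(f"best epoch: {best_epoch} (validation loss {best_val:.4f})")
print(f"final epoch: {final_epoch} (validation loss {final_val:.4f})")
print(f"perplexity at best / final epoch: {best_ppl:.2f} / {final_ppl:.2f}")
print(f"final validation - training gap: {final_gap:.4f}")
```

The lowest validation loss in the table occurs at epoch 18 rather than at the final epoch, so a checkpoint from that epoch, if one was kept, may be marginally better than the final one.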

Model size: 0.2B params (Safetensors)

Tensor type: F32
