# tes_chakma_deberta
This model is a fine-tuned version of microsoft/deberta-v3-base on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 1.8596
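
The card does not document the fine-tuning objective or task head. A minimal loading sketch, assuming a masked-language-modeling head (a common objective when adapting deberta-v3-base to a new language such as Chakma), might look like:

```python
# Minimal loading sketch. The masked-LM head is an assumption: the card does
# not state the training objective or downstream task.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "adity12345/tes_chakma_deberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Score a masked token; "[MASK]" is the DeBERTa-v3 mask token.
inputs = tokenizer("This is a [MASK] sentence.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```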
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 20
- mixed_precision_training: Native AMP
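
A minimal `TrainingArguments` sketch mapping the values above onto the Transformers API; the output path is a hypothetical placeholder, and only the numeric settings come from this card:

```python
# Sketch of TrainingArguments matching the reported hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tes_chakma_deberta",   # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,     # effective train batch size: 4 * 2 = 8
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    fp16=True,                         # Native AMP mixed-precision training
)
```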
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.9436 | 1.0 | 465 | 2.4593 |
| 2.0494 | 2.0 | 930 | 2.2688 |
| 1.8839 | 3.0 | 1395 | 2.2129 |
| 1.8058 | 4.0 | 1860 | 2.1472 |
| 1.7363 | 5.0 | 2325 | 2.0998 |
| 1.6849 | 6.0 | 2790 | 2.0739 |
| 1.64 | 7.0 | 3255 | 2.0289 |
| 1.5953 | 8.0 | 3720 | 1.9871 |
| 1.5632 | 9.0 | 4185 | 1.9855 |
| 1.5379 | 10.0 | 4650 | 1.9571 |
| 1.5179 | 11.0 | 5115 | 1.9333 |
| 1.4938 | 12.0 | 5580 | 1.9415 |
| 1.4686 | 13.0 | 6045 | 1.8951 |
| 1.4604 | 14.0 | 6510 | 1.8786 |
| 1.4409 | 15.0 | 6975 | 1.8658 |
| 1.4287 | 16.0 | 7440 | 1.8866 |
| 1.4099 | 17.0 | 7905 | 1.8704 |
| 1.4014 | 18.0 | 8370 | 1.8684 |
| 1.3986 | 19.0 | 8835 | 1.8314 |
| 1.3956 | 20.0 | 9300 | 1.8596 |
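
If the reported validation loss is a mean token-level cross-entropy (an assumption, since the objective is not documented), the final loss corresponds to a perplexity of roughly exp(1.8596) ≈ 6.42:

```python
# Hedged conversion from the final validation loss to perplexity, assuming
# the loss is mean cross-entropy as in masked-LM training.
import math

final_eval_loss = 1.8596              # from the table above
perplexity = math.exp(final_eval_loss)
print(f"{perplexity:.2f}")            # ≈ 6.42
```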
### Framework versions
- Transformers 4.56.1
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0
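
A convenience sketch (not part of the original card) for checking an installed environment against the pins above:

```python
# Sanity-check installed library versions against the pins listed above.
import datasets, tokenizers, torch, transformers

expected = {
    transformers: "4.56.1",
    torch: "2.8.0",        # CUDA build suffix (+cu126) stripped for comparison
    datasets: "4.0.0",
    tokenizers: "0.22.0",
}
for module, version in expected.items():
    installed = module.__version__.split("+")[0]
    assert installed == version, f"{module.__name__}: {installed} != {version}"
```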