KoModernBERT-base-mlm-v03-retry-ckp01

This model is a fine-tuned version of x2bee/KoModernBERT-base-mlm-v03-ckp00 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9059
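
Since the card reports only the masked-language-modeling evaluation loss, a minimal fill-mask sketch may help illustrate intended use. This is an illustrative example, not from the card: it assumes the checkpoint loads with the standard fill-mask pipeline and that the tokenizer uses the usual [MASK] token; the repository ID is taken from the card title, and the Korean prompt is made up.

```python
from transformers import pipeline

# Load the checkpoint as a fill-mask pipeline (repo ID from the card title).
fill = pipeline(
    "fill-mask",
    model="x2bee/KoModernBERT-base-mlm-v03-retry-ckp01",
)

# Illustrative Korean prompt: "The weather today is really [MASK]."
for pred in fill("오늘 날씨가 정말 [MASK]."):
    print(f"{pred['token_str']!r}: {pred['score']:.4f}")
```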

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
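
For reference, these settings map onto the following TrainingArguments sketch. This is a reconstruction under the listed hyperparameters, not the authors' training script; the output directory is a placeholder, and the fp16 flag is an inference from the FP16 published weights.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="komodernbert-mlm-v03-retry",  # placeholder path
    learning_rate=2e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
    fp16=True,  # assumption, inferred from the FP16 weights
)

# Effective train batch size: 8 devices x 8 per-device batch
# x 8 accumulation steps = 512, matching the card.
```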

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|---------------|--------|-------|-----------------|
| 20.5275       | 0.0897 | 5000  | 2.5659          |
| 18.9352       | 0.1795 | 10000 | 2.3904          |
| 18.2709       | 0.2692 | 15000 | 2.2732          |
| 17.7317       | 0.3589 | 20000 | 2.1961          |
| 17.0678       | 0.4487 | 25000 | 2.1421          |
| 16.5033       | 0.5384 | 30000 | 2.0931          |
| 16.7729       | 0.6282 | 35000 | 2.0439          |
| 16.1884       | 0.7179 | 40000 | 2.0005          |
| 15.6351       | 0.8076 | 45000 | 1.9635          |
| 15.3606       | 0.8974 | 50000 | 1.9319          |
| 15.2845       | 0.9871 | 55000 | 1.9059          |
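
Since MLM validation loss is the mean cross-entropy per masked token, the corresponding perplexity is its exponential. A quick check for the final checkpoint; the derived perplexity is computed here with the standard formula, not a metric reported on the card:

```python
import math

# Perplexity implied by the final validation loss of 1.9059:
# perplexity = exp(mean cross-entropy per masked token).
print(math.exp(1.9059))  # ≈ 6.73
```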

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Safetensors weights: 184M parameters, stored in FP16.
