gpt-small-c4

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.4497
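For language models, the reported cross-entropy loss converts to perplexity via exponentiation, so a loss of 4.4497 corresponds to a perplexity of roughly 85.6. A minimal sketch of the conversion:

```python
import math

# Cross-entropy loss reported on the evaluation set
eval_loss = 4.4497

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # ~85.6
```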

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 20
  • mixed_precision_training: Native AMP
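As a rough sketch, these settings map onto a Hugging Face TrainingArguments object as shown below. The output_dir is illustrative, and fp16=True is an assumption standing in for "Native AMP" (the run may equally have used bf16 or an Accelerate-level setting):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported settings
training_args = TrainingArguments(
    output_dir="gpt-small-c4",        # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    fp16=True,                        # assumed mapping of "Native AMP"
)
```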

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 7.0256        | 0.4005  | 1000  | 6.4298          |
| 6.264         | 0.8010  | 2000  | 6.0446          |
| 5.9635        | 1.2014  | 3000  | 5.7924          |
| 5.7506        | 1.6019  | 4000  | 5.6125          |
| 5.6108        | 2.0024  | 5000  | 5.4753          |
| 5.4654        | 2.4029  | 6000  | 5.3627          |
| 5.3748        | 2.8034  | 7000  | 5.2686          |
| 5.2775        | 3.2038  | 8000  | 5.1859          |
| 5.1925        | 3.6043  | 9000  | 5.1097          |
| 5.1347        | 4.0048  | 10000 | 5.0354          |
| 5.0484        | 4.4053  | 11000 | 4.9773          |
| 4.9933        | 4.8058  | 12000 | 4.9212          |
| 4.9351        | 5.2062  | 13000 | 4.8743          |
| 4.8865        | 5.6067  | 14000 | 4.8315          |
| 4.8497        | 6.0072  | 15000 | 4.7960          |
| 4.793         | 6.4077  | 16000 | 4.7648          |
| 4.7707        | 6.8082  | 17000 | 4.7354          |
| 4.7275        | 7.2087  | 18000 | 4.7083          |
| 4.702         | 7.6091  | 19000 | 4.6839          |
| 4.6871        | 8.0096  | 20000 | 4.6641          |
| 4.6432        | 8.4101  | 21000 | 4.6458          |
| 4.6269        | 8.8106  | 22000 | 4.6268          |
| 4.6021        | 9.2111  | 23000 | 4.6126          |
| 4.5857        | 9.6115  | 24000 | 4.5965          |
| 4.5755        | 10.0120 | 25000 | 4.5830          |
| 4.5421        | 10.4125 | 26000 | 4.5738          |
| 4.5401        | 10.8130 | 27000 | 4.5622          |
| 4.5149        | 11.2135 | 28000 | 4.5539          |
| 4.5035        | 11.6139 | 29000 | 4.5425          |
| 4.4957        | 12.0144 | 30000 | 4.5350          |
| 4.47          | 12.4149 | 31000 | 4.5258          |
| 4.4736        | 12.8154 | 32000 | 4.5191          |
| 4.4503        | 13.2159 | 33000 | 4.5091          |
| 4.4474        | 13.6163 | 34000 | 4.5037          |
| 4.4405        | 14.0168 | 35000 | 4.4968          |
| 4.4225        | 14.4173 | 36000 | 4.4937          |
| 4.4167        | 14.8178 | 37000 | 4.4876          |
| 4.4138        | 15.2183 | 38000 | 4.4808          |
| 4.4023        | 15.6187 | 39000 | 4.4764          |
| 4.3988        | 16.0192 | 40000 | 4.4723          |
| 4.3839        | 16.4197 | 41000 | 4.4702          |
| 4.3865        | 16.8202 | 42000 | 4.4652          |
| 4.3782        | 17.2207 | 43000 | 4.4615          |
| 4.3732        | 17.6211 | 44000 | 4.4601          |
| 4.3711        | 18.0216 | 45000 | 4.4561          |
| 4.3628        | 18.4221 | 46000 | 4.4542          |
| 4.362         | 18.8226 | 47000 | 4.4522          |
| 4.3554        | 19.2231 | 48000 | 4.4509          |
| 4.3549        | 19.6235 | 49000 | 4.4497          |

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 2.20.0
  • Tokenizers 0.21.0
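
With these versions installed, the model can be loaded through the standard Transformers auto classes. A minimal usage sketch, assuming the repo id is "gpt-small-c4" (substitute the actual Hub path for this checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; replace with the model's actual Hub path
repo_id = "gpt-small-c4"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```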
Model size

  • 44.9M parameters (Safetensors, F32 tensors)