gpt-small-c4

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.2881

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 20
  • mixed_precision_training: Native AMP
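With lr_scheduler_type set to linear and no warmup steps listed, the learning rate decays linearly from 5e-05 to 0 over training. A minimal plain-Python sketch of that schedule follows; the total step count is an assumption inferred from the results table (~1000 steps per 0.4013 epoch, i.e. roughly 2,492 steps/epoch × 20 epochs), not a value reported in this card:

```python
# Sketch of a warmup-free linear LR decay, matching the behavior of
# transformers' get_linear_schedule_with_warmup with num_warmup_steps=0.
# TOTAL_STEPS is an assumption inferred from the results table above.

BASE_LR = 5e-05
TOTAL_STEPS = 49_840  # assumed: ~2,492 steps/epoch * 20 epochs


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer updates (no warmup)."""
    remaining = max(0.0, (total_steps - step) / total_steps)
    return base_lr * remaining


print(linear_lr(0))       # 5e-05 at the start
print(linear_lr(24_920))  # 2.5e-05 exactly halfway through
print(linear_lr(49_840))  # 0.0 at the end
```

Under this schedule the learning rate at any point in the table can be recovered from the step column alone, which is why Trainer logs only the step count.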

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 7.0335        | 0.4013  | 1000  | 6.4495          |
| 6.2621        | 0.8026  | 2000  | 6.0652          |
| 5.9616        | 1.2039  | 3000  | 5.8198          |
| 5.7575        | 1.6051  | 4000  | 5.6420          |
| 5.6052        | 2.0064  | 5000  | 5.5004          |
| 5.4667        | 2.4077  | 6000  | 5.3899          |
| 5.3728        | 2.8090  | 7000  | 5.2942          |
| 5.2787        | 3.2103  | 8000  | 5.2110          |
| 5.1948        | 3.6116  | 9000  | 5.1345          |
| 5.1323        | 4.0128  | 10000 | 5.0630          |
| 5.0467        | 4.4141  | 11000 | 5.0053          |
| 4.9973        | 4.8154  | 12000 | 4.9481          |
| 4.9359        | 5.2167  | 13000 | 4.8986          |
| 4.8862        | 5.6180  | 14000 | 4.8609          |
| 4.8521        | 6.0193  | 15000 | 4.8182          |
| 4.7941        | 6.4205  | 16000 | 4.7930          |
| 4.7704        | 6.8218  | 17000 | 4.7584          |
| 4.7287        | 7.2231  | 18000 | 4.7326          |
| 4.7067        | 7.6244  | 19000 | 4.7087          |
| 4.6804        | 8.0257  | 20000 | 4.6887          |
| 4.6404        | 8.4270  | 21000 | 4.6696          |
| 4.6315        | 8.8283  | 22000 | 4.6517          |
| 4.6006        | 9.2295  | 23000 | 4.6386          |
| 4.5852        | 9.6308  | 24000 | 4.6197          |
| 4.5745        | 10.0321 | 25000 | 4.6064          |
| 4.5438        | 10.4334 | 26000 | 4.5943          |
| 4.5337        | 10.8347 | 27000 | 4.5829          |
| 4.5162        | 11.2360 | 28000 | 4.5726          |
| 4.5022        | 11.6372 | 29000 | 4.5623          |
| 4.4938        | 12.0385 | 30000 | 4.5550          |
| 4.469         | 12.4398 | 31000 | 4.5440          |
| 4.473         | 12.8411 | 32000 | 4.5363          |
| 4.4532        | 13.2424 | 33000 | 4.5310          |
| 4.4428        | 13.6437 | 34000 | 4.5246          |
| 4.4395        | 14.0449 | 35000 | 4.5142          |
| 4.4217        | 14.4462 | 36000 | 4.5120          |
| 4.4187        | 14.8475 | 37000 | 4.5072          |
| 4.4059        | 15.2488 | 38000 | 4.5036          |
| 4.4034        | 15.6501 | 39000 | 4.4969          |
| 4.3958        | 16.0514 | 40000 | 4.4948          |
| 4.3858        | 16.4526 | 41000 | 4.4915          |
| 4.3837        | 16.8539 | 42000 | 4.4846          |
| 4.3776        | 17.2552 | 43000 | 4.4839          |
| 4.4096        | 17.6565 | 44000 | 4.2901          |
| 4.4058        | 18.0578 | 45000 | 4.2897          |
| 4.3965        | 18.4591 | 46000 | 4.2893          |
| 4.4046        | 18.8604 | 47000 | 4.2907          |
| 4.3941        | 19.2616 | 48000 | 4.2890          |
| 4.3896        | 19.6629 | 49000 | 4.2881          |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.6.0
  • Tokenizers 0.21.1
Model size

  • Parameters: 44.9M
  • Tensor type: F32
  • Format: Safetensors