# gpt-small-c4
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.4497
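
Assuming this is the standard causal-LM cross-entropy (in nats) reported by the `Trainer`, it converts to perplexity as exp(loss). A minimal sketch of the conversion, using the value reported above:

```python
import math

eval_loss = 4.4497  # final evaluation loss reported above
perplexity = math.exp(eval_loss)
print(f"Eval perplexity: {perplexity:.1f}")  # ~85.6
```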
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 20
- mixed_precision_training: Native AMP
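
A minimal sketch of how these settings map onto `transformers.TrainingArguments`; `output_dir` is a placeholder, and the model/dataset wiring is omitted since neither is specified in this card:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="gpt-small-c4",        # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    fp16=True,                        # "Native AMP" mixed precision
)
```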
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------|:------|:-----|:----------------|
7.0256 | 0.4005 | 1000 | 6.4298 |
6.264 | 0.8010 | 2000 | 6.0446 |
5.9635 | 1.2014 | 3000 | 5.7924 |
5.7506 | 1.6019 | 4000 | 5.6125 |
5.6108 | 2.0024 | 5000 | 5.4753 |
5.4654 | 2.4029 | 6000 | 5.3627 |
5.3748 | 2.8034 | 7000 | 5.2686 |
5.2775 | 3.2038 | 8000 | 5.1859 |
5.1925 | 3.6043 | 9000 | 5.1097 |
5.1347 | 4.0048 | 10000 | 5.0354 |
5.0484 | 4.4053 | 11000 | 4.9773 |
4.9933 | 4.8058 | 12000 | 4.9212 |
4.9351 | 5.2062 | 13000 | 4.8743 |
4.8865 | 5.6067 | 14000 | 4.8315 |
4.8497 | 6.0072 | 15000 | 4.7960 |
4.793 | 6.4077 | 16000 | 4.7648 |
4.7707 | 6.8082 | 17000 | 4.7354 |
4.7275 | 7.2087 | 18000 | 4.7083 |
4.702 | 7.6091 | 19000 | 4.6839 |
4.6871 | 8.0096 | 20000 | 4.6641 |
4.6432 | 8.4101 | 21000 | 4.6458 |
4.6269 | 8.8106 | 22000 | 4.6268 |
4.6021 | 9.2111 | 23000 | 4.6126 |
4.5857 | 9.6115 | 24000 | 4.5965 |
4.5755 | 10.0120 | 25000 | 4.5830 |
4.5421 | 10.4125 | 26000 | 4.5738 |
4.5401 | 10.8130 | 27000 | 4.5622 |
4.5149 | 11.2135 | 28000 | 4.5539 |
4.5035 | 11.6139 | 29000 | 4.5425 |
4.4957 | 12.0144 | 30000 | 4.5350 |
4.47 | 12.4149 | 31000 | 4.5258 |
4.4736 | 12.8154 | 32000 | 4.5191 |
4.4503 | 13.2159 | 33000 | 4.5091 |
4.4474 | 13.6163 | 34000 | 4.5037 |
4.4405 | 14.0168 | 35000 | 4.4968 |
4.4225 | 14.4173 | 36000 | 4.4937 |
4.4167 | 14.8178 | 37000 | 4.4876 |
4.4138 | 15.2183 | 38000 | 4.4808 |
4.4023 | 15.6187 | 39000 | 4.4764 |
4.3988 | 16.0192 | 40000 | 4.4723 |
4.3839 | 16.4197 | 41000 | 4.4702 |
4.3865 | 16.8202 | 42000 | 4.4652 |
4.3782 | 17.2207 | 43000 | 4.4615 |
4.3732 | 17.6211 | 44000 | 4.4601 |
4.3711 | 18.0216 | 45000 | 4.4561 |
4.3628 | 18.4221 | 46000 | 4.4542 |
4.362 | 18.8226 | 47000 | 4.4522 |
4.3554 | 19.2231 | 48000 | 4.4509 |
4.3549 | 19.6235 | 49000 | 4.4497 |
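
The name suggests a small GPT-style causal language model trained on C4, though the card does not confirm this. A minimal generation sketch, assuming a causal-LM checkpoint; the repo id below is a placeholder for the model's actual Hub path:

```python
from transformers import pipeline

# "gpt-small-c4" is a placeholder repo id; substitute the real Hub path.
generator = pipeline("text-generation", model="gpt-small-c4")
print(generator("The quick brown fox", max_new_tokens=20)[0]["generated_text"])
```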
### Framework versions
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 2.20.0
- Tokenizers 0.21.0