Se124M100KInfPrompt

This model is a fine-tuned version of gpt2 on an unknown dataset; judging from the framework versions listed below, the fine-tuning was done as a PEFT adapter. It achieves the following result on the evaluation set:

  • Loss: 0.3662
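
Since PEFT appears among the framework versions below, the weights here are presumably a parameter-efficient adapter loaded on top of the gpt2 base model rather than a full checkpoint. A minimal loading sketch, assuming the repository id `augustocsc/Se124M100KInfPrompt` and a standard PEFT adapter layout (the prompt is an assumption, not documented in this card):

```python
# Sketch only: load the adapter on top of the gpt2 base model and generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base, "augustocsc/Se124M100KInfPrompt")

inputs = tokenizer("Example prompt:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```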

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: AdamW (ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 3
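
These settings map directly onto `transformers.TrainingArguments`; a minimal sketch of an equivalent configuration, with the output directory as a placeholder since the actual training script is not documented here:

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Se124M100KInfPrompt",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=4,     # effective train batch size: 16 * 4 = 64
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3,
)
```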

Training results

Training Loss | Epoch | Step | Validation Loss
2.8209 0.0164 20 2.4755
2.7724 0.0327 40 2.4510
2.7076 0.0491 60 2.3560
2.5366 0.0655 80 2.1926
2.2739 0.0818 100 1.9387
1.9339 0.0982 120 1.6141
1.5753 0.1146 140 1.2455
1.254 0.1309 160 0.9377
1.0138 0.1473 180 0.7895
0.866 0.1637 200 0.6805
0.7571 0.1800 220 0.5998
0.6901 0.1964 240 0.5554
0.629 0.2128 260 0.5310
0.5879 0.2291 280 0.5076
0.5703 0.2455 300 0.4930
0.5546 0.2619 320 0.4845
0.5419 0.2782 340 0.4762
0.5179 0.2946 360 0.4656
0.5211 0.3110 380 0.4595
0.5039 0.3273 400 0.4543
0.4992 0.3437 420 0.4524
0.4937 0.3601 440 0.4477
0.4801 0.3764 460 0.4409
0.4805 0.3928 480 0.4380
0.4805 0.4092 500 0.4354
0.468 0.4255 520 0.4343
0.4759 0.4419 540 0.4319
0.4614 0.4583 560 0.4284
0.4622 0.4746 580 0.4272
0.4608 0.4910 600 0.4267
0.4621 0.5074 620 0.4236
0.4569 0.5237 640 0.4238
0.4519 0.5401 660 0.4219
0.4478 0.5565 680 0.4189
0.4524 0.5728 700 0.4167
0.4489 0.5892 720 0.4147
0.4452 0.6056 740 0.4150
0.4424 0.6219 760 0.4118
0.4355 0.6383 780 0.4117
0.4396 0.6547 800 0.4112
0.4432 0.6710 820 0.4078
0.44 0.6874 840 0.4051
0.4341 0.7038 860 0.4050
0.4425 0.7201 880 0.4018
0.4387 0.7365 900 0.4016
0.4369 0.7529 920 0.4031
0.437 0.7692 940 0.3967
0.4314 0.7856 960 0.4007
0.4371 0.8020 980 0.3943
0.4364 0.8183 1000 0.3986
0.4292 0.8347 1020 0.3970
0.427 0.8511 1040 0.3951
0.431 0.8674 1060 0.3941
0.4327 0.8838 1080 0.3958
0.4263 0.9002 1100 0.3930
0.429 0.9165 1120 0.3901
0.4277 0.9329 1140 0.3907
0.4251 0.9493 1160 0.3906
0.4279 0.9656 1180 0.3891
0.4249 0.9820 1200 0.3884
0.4213 0.9984 1220 0.3891
0.4192 1.0147 1240 0.3870
0.4263 1.0311 1260 0.3852
0.4219 1.0475 1280 0.3897
0.4256 1.0638 1300 0.3846
0.4129 1.0802 1320 0.3855
0.4184 1.0966 1340 0.3841
0.4207 1.1129 1360 0.3835
0.418 1.1293 1380 0.3808
0.4153 1.1457 1400 0.3827
0.4247 1.1620 1420 0.3812
0.421 1.1784 1440 0.3807
0.4127 1.1948 1460 0.3802
0.4233 1.2111 1480 0.3794
0.4235 1.2275 1500 0.3782
0.4184 1.2439 1520 0.3785
0.4171 1.2602 1540 0.3796
0.4181 1.2766 1560 0.3811
0.4126 1.2930 1580 0.3780
0.4188 1.3093 1600 0.3760
0.4162 1.3257 1620 0.3769
0.4192 1.3421 1640 0.3770
0.4153 1.3584 1660 0.3763
0.4187 1.3748 1680 0.3737
0.4138 1.3912 1700 0.3755
0.4115 1.4075 1720 0.3755
0.4118 1.4239 1740 0.3756
0.4036 1.4403 1760 0.3742
0.4161 1.4566 1780 0.3731
0.4102 1.4730 1800 0.3740
0.4118 1.4894 1820 0.3731
0.4102 1.5057 1840 0.3732
0.4143 1.5221 1860 0.3744
0.4118 1.5385 1880 0.3729
0.4179 1.5548 1900 0.3721
0.4092 1.5712 1920 0.3716
0.4109 1.5876 1940 0.3726
0.4137 1.6039 1960 0.3713
0.4067 1.6203 1980 0.3714
0.4131 1.6367 2000 0.3725
0.4103 1.6530 2020 0.3702
0.4044 1.6694 2040 0.3711
0.4105 1.6858 2060 0.3727
0.4063 1.7021 2080 0.3712
0.4109 1.7185 2100 0.3709
0.4114 1.7349 2120 0.3706
0.4148 1.7512 2140 0.3711
0.4081 1.7676 2160 0.3693
0.4062 1.7840 2180 0.3694
0.4152 1.8003 2200 0.3699
0.4043 1.8167 2220 0.3686
0.4046 1.8331 2240 0.3705
0.4136 1.8494 2260 0.3684
0.4073 1.8658 2280 0.3701
0.4089 1.8822 2300 0.3689
0.4075 1.8985 2320 0.3679
0.409 1.9149 2340 0.3694
0.4096 1.9313 2360 0.3677
0.4114 1.9476 2380 0.3686
0.4083 1.9640 2400 0.3676
0.4066 1.9804 2420 0.3696
0.4053 1.9967 2440 0.3677
0.4087 2.0131 2460 0.3688
0.4055 2.0295 2480 0.3680
0.4103 2.0458 2500 0.3678
0.4031 2.0622 2520 0.3685
0.4111 2.0786 2540 0.3674
0.413 2.0949 2560 0.3675
0.4135 2.1113 2580 0.3674
0.4085 2.1277 2600 0.3664
0.4029 2.1440 2620 0.3683
0.4023 2.1604 2640 0.3677
0.4087 2.1768 2660 0.3673
0.4088 2.1931 2680 0.3678
0.4064 2.2095 2700 0.3664
0.4067 2.2259 2720 0.3669
0.4047 2.2422 2740 0.3662
0.4069 2.2586 2760 0.3666
0.4028 2.2750 2780 0.3663
0.4101 2.2913 2800 0.3664
0.4061 2.3077 2820 0.3663
0.4056 2.3241 2840 0.3657
0.4073 2.3404 2860 0.3660
0.4096 2.3568 2880 0.3665
0.4034 2.3732 2900 0.3667
0.4067 2.3895 2920 0.3668
0.4032 2.4059 2940 0.3673
0.4082 2.4223 2960 0.3666
0.4048 2.4386 2980 0.3660
0.4058 2.4550 3000 0.3661
0.4066 2.4714 3020 0.3663
0.4128 2.4877 3040 0.3662
0.4104 2.5041 3060 0.3658
0.4057 2.5205 3080 0.3658
0.408 2.5368 3100 0.3660
0.4053 2.5532 3120 0.3660
0.3998 2.5696 3140 0.3664
0.4007 2.5859 3160 0.3659
0.402 2.6023 3180 0.3660
0.4017 2.6187 3200 0.3660
0.4069 2.6350 3220 0.3659
0.4028 2.6514 3240 0.3662
0.4014 2.6678 3260 0.3663
0.4023 2.6841 3280 0.3666
0.4025 2.7005 3300 0.3668
0.4027 2.7169 3320 0.3661
0.404 2.7332 3340 0.3659
0.4064 2.7496 3360 0.3663
0.4059 2.7660 3380 0.3659
0.4007 2.7823 3400 0.3659
0.4044 2.7987 3420 0.3663
0.4075 2.8151 3440 0.3658
0.4053 2.8314 3460 0.3660
0.4003 2.8478 3480 0.3664
0.4078 2.8642 3500 0.3662
0.4067 2.8805 3520 0.3661
0.4025 2.8969 3540 0.3660
0.4018 2.9133 3560 0.3658
0.4014 2.9296 3580 0.3661
0.4036 2.9460 3600 0.3661
0.4031 2.9624 3620 0.3657
0.4022 2.9787 3640 0.3660
0.4036 2.9951 3660 0.3662
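
The validation loss drops steeply during the first half-epoch and plateaus around 0.366 from roughly epoch 2.2 onward. A minimal sketch for plotting these curves, assuming the Trainer's `trainer_state.json` (which stores the same log history) is available from the output directory:

```python
# Sketch only: plot training and validation loss from a saved trainer_state.json.
# The file path is an assumption; the Trainer writes it into its output/checkpoint dirs.
import json
import matplotlib.pyplot as plt

with open("trainer_state.json") as f:
    state = json.load(f)

train = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

plt.plot(*zip(*train), label="training loss")
plt.plot(*zip(*evals), label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```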

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu118
  • Datasets 3.5.0
  • Tokenizers 0.21.1
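
To reduce surprises when loading the adapter, it can help to check that the installed packages match these versions; a small sketch using only the standard library:

```python
# Sketch only: compare installed package versions against those used for training.
from importlib.metadata import version

expected = {
    "peft": "0.15.1",
    "transformers": "4.51.3",
    "torch": "2.6.0+cu118",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}
for pkg, trained_with in expected.items():
    print(f"{pkg}: installed {version(pkg)}, trained with {trained_with}")
```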