train_wsc_1745950303

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5956 (the lowest validation loss recorded in the table below, reached at step 9600)
  • Num Input Tokens Seen: 14,002,704
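
Because the framework versions below list PEFT, this checkpoint is an adapter rather than a full model. The following is a minimal inference sketch, assuming the adapter is published under the repository id rbelanec/train_wsc_1745950303 and attaches to meta-llama/Meta-Llama-3-8B-Instruct; the prompt shown is illustrative only, since the template used during fine-tuning is not documented here.

```python
# Minimal inference sketch (assumptions: Hub id of the adapter, prompt format).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_1745950303"  # assumed Hub id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example WSC-style coreference prompt (illustrative; the exact prompt
# template used during fine-tuning is not stated in this card).
messages = [{"role": "user", "content": (
    "The trophy doesn't fit into the brown suitcase because it is too large. "
    "What does 'it' refer to?")}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=16, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```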

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • training_steps: 40000
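
As a reference, here is a hedged sketch of how the hyperparameters above could map onto transformers.TrainingArguments. The output directory, logging cadence, and any PEFT-specific adapter configuration are assumptions, since the training script itself is not included in this card.

```python
# Hedged reconstruction of the listed training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_1745950303",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,      # effective (total) train batch size of 4
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    max_steps=40_000,
    eval_strategy="steps",
    eval_steps=200,                     # matches the evaluation cadence in the results table
    logging_steps=200,                  # assumed; consistent with the training-loss column
)
```

The total train batch size of 4 reported above is the per-device batch size of 2 multiplied by 2 gradient accumulation steps.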

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
0.9367 1.6024 200 0.6859 70144
0.7729 3.2008 400 0.6358 140304
0.6178 4.8032 600 0.6251 210240
0.7354 6.4016 800 0.6166 279952
0.8205 8.0 1000 0.6166 350224
0.9947 9.6024 1200 0.6063 420256
0.8109 11.2008 1400 0.6140 490496
0.6329 12.8032 1600 0.6054 560224
0.6829 14.4016 1800 0.6053 630560
0.6086 16.0 2000 0.6093 699648
0.485 17.6024 2200 0.6015 769232
0.9604 19.2008 2400 0.6084 839344
0.6535 20.8032 2600 0.6110 909744
0.6409 22.4016 2800 0.6115 979312
0.7109 24.0 3000 0.6066 1049184
0.7251 25.6024 3200 0.6034 1119552
0.6356 27.2008 3400 0.6066 1189008
0.8557 28.8032 3600 0.6137 1259168
0.759 30.4016 3800 0.6130 1329056
0.9193 32.0 4000 0.6128 1399280
0.7954 33.6024 4200 0.6092 1469920
0.7279 35.2008 4400 0.6051 1539184
0.941 36.8032 4600 0.6034 1609648
0.9295 38.4016 4800 0.6008 1679792
0.7476 40.0 5000 0.6098 1749008
0.8862 41.6024 5200 0.6106 1818832
0.7252 43.2008 5400 0.6087 1889136
0.501 44.8032 5600 0.6182 1959008
0.4602 46.4016 5800 0.6046 2028320
0.7075 48.0 6000 0.6129 2098928
0.7795 49.6024 6200 0.6080 2168688
0.6954 51.2008 6400 0.6075 2238752
0.905 52.8032 6600 0.6000 2308816
0.8237 54.4016 6800 0.6067 2379328
0.6337 56.0 7000 0.6052 2448704
0.8776 57.6024 7200 0.6037 2519008
0.7921 59.2008 7400 0.6066 2588608
0.8712 60.8032 7600 0.6045 2659072
0.6104 62.4016 7800 0.6041 2728480
0.9738 64.0 8000 0.6079 2798720
0.6123 65.6024 8200 0.6013 2868672
0.5486 67.2008 8400 0.6026 2939312
0.4234 68.8032 8600 0.6083 3009568
0.706 70.4016 8800 0.6032 3079584
0.5217 72.0 9000 0.6046 3149680
0.4153 73.6024 9200 0.6172 3219680
0.4354 75.2008 9400 0.6041 3289472
0.6993 76.8032 9600 0.5956 3359520
0.7275 78.4016 9800 0.6037 3429568
0.5396 80.0 10000 0.6079 3499648
0.7598 81.6024 10200 0.6038 3569504
0.7379 83.2008 10400 0.6109 3639920
0.9387 84.8032 10600 0.6056 3709520
0.7098 86.4016 10800 0.5983 3779456
0.6795 88.0 11000 0.6039 3849744
0.7353 89.6024 11200 0.6032 3919984
0.6685 91.2008 11400 0.6080 3989872
0.7216 92.8032 11600 0.6073 4059568
0.8336 94.4016 11800 0.6013 4129664
0.548 96.0 12000 0.6024 4199936
0.9363 97.6024 12200 0.5981 4269952
0.6282 99.2008 12400 0.6110 4339040
0.7682 100.8032 12600 0.6031 4409680
0.9204 102.4016 12800 0.6103 4479120
0.6169 104.0 13000 0.6119 4548896
0.7145 105.6024 13200 0.6044 4619216
0.7454 107.2008 13400 0.6099 4689424
0.7114 108.8032 13600 0.6078 4759232
0.7552 110.4016 13800 0.6081 4829120
0.5361 112.0 14000 0.6138 4899024
0.6323 113.6024 14200 0.5998 4968944
0.7257 115.2008 14400 0.6055 5039152
0.5306 116.8032 14600 0.6010 5109312
0.8061 118.4016 14800 0.6115 5179296
0.7583 120.0 15000 0.6079 5249504
0.818 121.6024 15200 0.6016 5319424
0.909 123.2008 15400 0.6039 5389488
0.9621 124.8032 15600 0.6032 5459776
0.3719 126.4016 15800 0.6107 5529760
0.8277 128.0 16000 0.6074 5599968
0.5884 129.6024 16200 0.6056 5671056
0.6286 131.2008 16400 0.6104 5740000
0.6262 132.8032 16600 0.6098 5810288
0.6929 134.4016 16800 0.6065 5880176
0.6835 136.0 17000 0.6080 5950048
0.7025 137.6024 17200 0.6135 6020016
0.8546 139.2008 17400 0.6162 6090672
0.5158 140.8032 17600 0.6072 6160288
0.7597 142.4016 17800 0.6078 6230656
0.8127 144.0 18000 0.6005 6299968
0.669 145.6024 18200 0.6080 6370512
0.7968 147.2008 18400 0.6064 6440784
0.5663 148.8032 18600 0.6056 6510560
0.6785 150.4016 18800 0.6010 6579872
0.8551 152.0 19000 0.6024 6650112
0.7856 153.6024 19200 0.5996 6720368
0.5416 155.2008 19400 0.6072 6790512
0.7651 156.8032 19600 0.6056 6860880
0.6543 158.4016 19800 0.6175 6930576
0.5508 160.0 20000 0.6053 7000640
0.6528 161.6024 20200 0.6023 7070272
0.6598 163.2008 20400 0.5996 7140336
0.5761 164.8032 20600 0.6078 7210816
0.653 166.4016 20800 0.6016 7281392
0.8061 168.0 21000 0.6057 7350960
0.7621 169.6024 21200 0.6053 7421312
0.6579 171.2008 21400 0.6047 7491200
0.5762 172.8032 21600 0.6003 7560976
0.9284 174.4016 21800 0.6020 7631024
0.6199 176.0 22000 0.6054 7700784
0.7859 177.6024 22200 0.6110 7770752
0.3245 179.2008 22400 0.6039 7840832
0.7359 180.8032 22600 0.6061 7911072
0.7983 182.4016 22800 0.6075 7981312
0.6592 184.0 23000 0.6066 8050976
0.6686 185.6024 23200 0.6060 8121312
0.5448 187.2008 23400 0.6047 8191520
0.5868 188.8032 23600 0.6013 8261456
0.7454 190.4016 23800 0.6131 8331664
1.137 192.0 24000 0.6159 8401328
0.5008 193.6024 24200 0.6039 8471232
0.8048 195.2008 24400 0.6079 8540976
0.6897 196.8032 24600 0.6059 8611296
0.5966 198.4016 24800 0.6075 8681264
0.434 200.0 25000 0.6160 8751280
0.4255 201.6024 25200 0.6050 8822192
0.5553 203.2008 25400 0.6063 8891648
0.6894 204.8032 25600 0.6118 8961760
0.5924 206.4016 25800 0.6104 9031568
0.4732 208.0 26000 0.6030 9101088
0.7517 209.6024 26200 0.6052 9171168
0.3247 211.2008 26400 0.6049 9240752
0.5487 212.8032 26600 0.6017 9310960
0.7838 214.4016 26800 0.6027 9380560
1.0043 216.0 27000 0.6075 9450912
0.4924 217.6024 27200 0.6063 9520832
0.5188 219.2008 27400 0.6075 9590800
0.826 220.8032 27600 0.6111 9661456
0.9029 222.4016 27800 0.6089 9731376
0.5354 224.0 28000 0.6084 9801040
0.6485 225.6024 28200 0.6080 9870784
0.8221 227.2008 28400 0.6132 9941408
0.7324 228.8032 28600 0.6031 10011264
0.7633 230.4016 28800 0.6112 10080704
0.9061 232.0 29000 0.6090 10150880
0.855 233.6024 29200 0.6018 10221616
0.9609 235.2008 29400 0.6006 10291664
0.7309 236.8032 29600 0.6120 10361728
0.7132 238.4016 29800 0.6046 10431088
0.5857 240.0 30000 0.6083 10501088
0.6568 241.6024 30200 0.6097 10571488
0.8502 243.2008 30400 0.6069 10640848
0.7067 244.8032 30600 0.6096 10711136
0.5737 246.4016 30800 0.6039 10781136
0.411 248.0 31000 0.5998 10851312
0.3786 249.6024 31200 0.6112 10921664
0.8119 251.2008 31400 0.6060 10991936
0.7882 252.8032 31600 0.6012 11061680
0.7779 254.4016 31800 0.6105 11131872
0.5879 256.0 32000 0.6011 11201520
0.4562 257.6024 32200 0.6092 11271952
0.8154 259.2008 32400 0.5993 11340976
0.8513 260.8032 32600 0.6082 11411056
0.5301 262.4016 32800 0.5973 11481152
0.4274 264.0 33000 0.6082 11550752
0.7707 265.6024 33200 0.6110 11620752
0.5863 267.2008 33400 0.6022 11690464
0.6638 268.8032 33600 0.6062 11761360
0.8022 270.4016 33800 0.6082 11831152
0.4962 272.0 34000 0.6052 11900768
0.7421 273.6024 34200 0.6155 11971616
0.8621 275.2008 34400 0.6042 12041104
0.4739 276.8032 34600 0.6042 12111712
0.661 278.4016 34800 0.6115 12181328
0.5588 280.0 35000 0.6040 12251088
0.8743 281.6024 35200 0.6042 12321616
0.5744 283.2008 35400 0.6042 12391184
0.6344 284.8032 35600 0.6042 12461088
0.7548 286.4016 35800 0.6042 12531520
1.0844 288.0 36000 0.6042 12600944
0.3644 289.6024 36200 0.6042 12670544
0.7256 291.2008 36400 0.6042 12741216
0.8211 292.8032 36600 0.6042 12811584
0.6064 294.4016 36800 0.6042 12881104
0.5569 296.0 37000 0.6042 12951648
0.5618 297.6024 37200 0.6042 13021600
0.6211 299.2008 37400 0.6042 13091888
0.5256 300.8032 37600 0.6042 13162128
1.1123 302.4016 37800 0.6042 13231552
0.7682 304.0 38000 0.6042 13302080
0.6204 305.6024 38200 0.6042 13371808
0.8488 307.2008 38400 0.6042 13441936
0.947 308.8032 38600 0.6042 13512304
0.8 310.4016 38800 0.6042 13582192
0.802 312.0 39000 0.6042 13652384
0.457 313.6024 39200 0.6042 13722224
0.6368 315.2008 39400 0.6042 13791728
0.5913 316.8032 39600 0.6042 13862560
0.6218 318.4016 39800 0.6042 13933264
0.6923 320.0 40000 0.6042 14002704

Framework versions

  • PEFT 0.15.2.dev0
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1