---
library_name: peft
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
- llama-factory
- lntuning
- generated_from_trainer
model-index:
- name: train_wsc_1745950303
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# train_wsc_1745950303

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the wsc dataset.
It achieves the following results on the evaluation set:
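- Loss: 0.5956 (the lowest validation loss in the training results table, reached at step 9600; the value at the final step is 0.6042)
- Num Input Tokens Seen: 14002704 (cumulative total at the final step, 40000)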

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- training_steps: 40000
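
The batch-size and scheduler entries above are related by simple arithmetic, which the following minimal sketch illustrates. It rests on two assumptions not stated in this card: training ran on a single device (consistent with 2 × 2 = 4), and the cosine schedule used no warmup and decayed to zero over the full 40000 steps. The function names are illustrative and are not taken from the training code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
TRAINING_STEPS = 40000

# Assumption: the card does not state the device count.
NUM_DEVICES = 1


def effective_batch_size(per_device, accumulation, devices):
    """Number of sequences contributing to one optimizer step."""
    return per_device * accumulation * devices


def cosine_lr(step, total_steps, peak_lr, warmup_steps=0):
    """Half-cycle cosine decay from peak_lr to zero, with optional linear warmup.

    warmup_steps defaults to 0 because no warmup is listed in the card.
    """
    if warmup_steps and step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_batch = effective_batch_size(
    TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
)
print("total_train_batch_size:", total_batch)

# Learning rate at a few points along the schedule.
for step in (0, 10000, 20000, 30000, 40000):
    print(step, cosine_lr(step, TRAINING_STEPS, LEARNING_RATE))
```

Under these assumptions the learning rate starts at 5e-05, falls to half that value at step 20000, and approaches zero at step 40000.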

### Training results

| Training Loss | Epoch    | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:--------:|:-----:|:---------------:|:-----------------:|
| 0.9367        | 1.6024   | 200   | 0.6859          | 70144             |
| 0.7729        | 3.2008   | 400   | 0.6358          | 140304            |
| 0.6178        | 4.8032   | 600   | 0.6251          | 210240            |
| 0.7354        | 6.4016   | 800   | 0.6166          | 279952            |
| 0.8205        | 8.0      | 1000  | 0.6166          | 350224            |
| 0.9947        | 9.6024   | 1200  | 0.6063          | 420256            |
| 0.8109        | 11.2008  | 1400  | 0.6140          | 490496            |
| 0.6329        | 12.8032  | 1600  | 0.6054          | 560224            |
| 0.6829        | 14.4016  | 1800  | 0.6053          | 630560            |
| 0.6086        | 16.0     | 2000  | 0.6093          | 699648            |
| 0.485         | 17.6024  | 2200  | 0.6015          | 769232            |
| 0.9604        | 19.2008  | 2400  | 0.6084          | 839344            |
| 0.6535        | 20.8032  | 2600  | 0.6110          | 909744            |
| 0.6409        | 22.4016  | 2800  | 0.6115          | 979312            |
| 0.7109        | 24.0     | 3000  | 0.6066          | 1049184           |
| 0.7251        | 25.6024  | 3200  | 0.6034          | 1119552           |
| 0.6356        | 27.2008  | 3400  | 0.6066          | 1189008           |
| 0.8557        | 28.8032  | 3600  | 0.6137          | 1259168           |
| 0.759         | 30.4016  | 3800  | 0.6130          | 1329056           |
| 0.9193        | 32.0     | 4000  | 0.6128          | 1399280           |
| 0.7954        | 33.6024  | 4200  | 0.6092          | 1469920           |
| 0.7279        | 35.2008  | 4400  | 0.6051          | 1539184           |
| 0.941         | 36.8032  | 4600  | 0.6034          | 1609648           |
| 0.9295        | 38.4016  | 4800  | 0.6008          | 1679792           |
| 0.7476        | 40.0     | 5000  | 0.6098          | 1749008           |
| 0.8862        | 41.6024  | 5200  | 0.6106          | 1818832           |
| 0.7252        | 43.2008  | 5400  | 0.6087          | 1889136           |
| 0.501         | 44.8032  | 5600  | 0.6182          | 1959008           |
| 0.4602        | 46.4016  | 5800  | 0.6046          | 2028320           |
| 0.7075        | 48.0     | 6000  | 0.6129          | 2098928           |
| 0.7795        | 49.6024  | 6200  | 0.6080          | 2168688           |
| 0.6954        | 51.2008  | 6400  | 0.6075          | 2238752           |
| 0.905         | 52.8032  | 6600  | 0.6000          | 2308816           |
| 0.8237        | 54.4016  | 6800  | 0.6067          | 2379328           |
| 0.6337        | 56.0     | 7000  | 0.6052          | 2448704           |
| 0.8776        | 57.6024  | 7200  | 0.6037          | 2519008           |
| 0.7921        | 59.2008  | 7400  | 0.6066          | 2588608           |
| 0.8712        | 60.8032  | 7600  | 0.6045          | 2659072           |
| 0.6104        | 62.4016  | 7800  | 0.6041          | 2728480           |
| 0.9738        | 64.0     | 8000  | 0.6079          | 2798720           |
| 0.6123        | 65.6024  | 8200  | 0.6013          | 2868672           |
| 0.5486        | 67.2008  | 8400  | 0.6026          | 2939312           |
| 0.4234        | 68.8032  | 8600  | 0.6083          | 3009568           |
| 0.706         | 70.4016  | 8800  | 0.6032          | 3079584           |
| 0.5217        | 72.0     | 9000  | 0.6046          | 3149680           |
| 0.4153        | 73.6024  | 9200  | 0.6172          | 3219680           |
| 0.4354        | 75.2008  | 9400  | 0.6041          | 3289472           |
| 0.6993        | 76.8032  | 9600  | 0.5956          | 3359520           |
| 0.7275        | 78.4016  | 9800  | 0.6037          | 3429568           |
| 0.5396        | 80.0     | 10000 | 0.6079          | 3499648           |
| 0.7598        | 81.6024  | 10200 | 0.6038          | 3569504           |
| 0.7379        | 83.2008  | 10400 | 0.6109          | 3639920           |
| 0.9387        | 84.8032  | 10600 | 0.6056          | 3709520           |
| 0.7098        | 86.4016  | 10800 | 0.5983          | 3779456           |
| 0.6795        | 88.0     | 11000 | 0.6039          | 3849744           |
| 0.7353        | 89.6024  | 11200 | 0.6032          | 3919984           |
| 0.6685        | 91.2008  | 11400 | 0.6080          | 3989872           |
| 0.7216        | 92.8032  | 11600 | 0.6073          | 4059568           |
| 0.8336        | 94.4016  | 11800 | 0.6013          | 4129664           |
| 0.548         | 96.0     | 12000 | 0.6024          | 4199936           |
| 0.9363        | 97.6024  | 12200 | 0.5981          | 4269952           |
| 0.6282        | 99.2008  | 12400 | 0.6110          | 4339040           |
| 0.7682        | 100.8032 | 12600 | 0.6031          | 4409680           |
| 0.9204        | 102.4016 | 12800 | 0.6103          | 4479120           |
| 0.6169        | 104.0    | 13000 | 0.6119          | 4548896           |
| 0.7145        | 105.6024 | 13200 | 0.6044          | 4619216           |
| 0.7454        | 107.2008 | 13400 | 0.6099          | 4689424           |
| 0.7114        | 108.8032 | 13600 | 0.6078          | 4759232           |
| 0.7552        | 110.4016 | 13800 | 0.6081          | 4829120           |
| 0.5361        | 112.0    | 14000 | 0.6138          | 4899024           |
| 0.6323        | 113.6024 | 14200 | 0.5998          | 4968944           |
| 0.7257        | 115.2008 | 14400 | 0.6055          | 5039152           |
| 0.5306        | 116.8032 | 14600 | 0.6010          | 5109312           |
| 0.8061        | 118.4016 | 14800 | 0.6115          | 5179296           |
| 0.7583        | 120.0    | 15000 | 0.6079          | 5249504           |
| 0.818         | 121.6024 | 15200 | 0.6016          | 5319424           |
| 0.909         | 123.2008 | 15400 | 0.6039          | 5389488           |
| 0.9621        | 124.8032 | 15600 | 0.6032          | 5459776           |
| 0.3719        | 126.4016 | 15800 | 0.6107          | 5529760           |
| 0.8277        | 128.0    | 16000 | 0.6074          | 5599968           |
| 0.5884        | 129.6024 | 16200 | 0.6056          | 5671056           |
| 0.6286        | 131.2008 | 16400 | 0.6104          | 5740000           |
| 0.6262        | 132.8032 | 16600 | 0.6098          | 5810288           |
| 0.6929        | 134.4016 | 16800 | 0.6065          | 5880176           |
| 0.6835        | 136.0    | 17000 | 0.6080          | 5950048           |
| 0.7025        | 137.6024 | 17200 | 0.6135          | 6020016           |
| 0.8546        | 139.2008 | 17400 | 0.6162          | 6090672           |
| 0.5158        | 140.8032 | 17600 | 0.6072          | 6160288           |
| 0.7597        | 142.4016 | 17800 | 0.6078          | 6230656           |
| 0.8127        | 144.0    | 18000 | 0.6005          | 6299968           |
| 0.669         | 145.6024 | 18200 | 0.6080          | 6370512           |
| 0.7968        | 147.2008 | 18400 | 0.6064          | 6440784           |
| 0.5663        | 148.8032 | 18600 | 0.6056          | 6510560           |
| 0.6785        | 150.4016 | 18800 | 0.6010          | 6579872           |
| 0.8551        | 152.0    | 19000 | 0.6024          | 6650112           |
| 0.7856        | 153.6024 | 19200 | 0.5996          | 6720368           |
| 0.5416        | 155.2008 | 19400 | 0.6072          | 6790512           |
| 0.7651        | 156.8032 | 19600 | 0.6056          | 6860880           |
| 0.6543        | 158.4016 | 19800 | 0.6175          | 6930576           |
| 0.5508        | 160.0    | 20000 | 0.6053          | 7000640           |
| 0.6528        | 161.6024 | 20200 | 0.6023          | 7070272           |
| 0.6598        | 163.2008 | 20400 | 0.5996          | 7140336           |
| 0.5761        | 164.8032 | 20600 | 0.6078          | 7210816           |
| 0.653         | 166.4016 | 20800 | 0.6016          | 7281392           |
| 0.8061        | 168.0    | 21000 | 0.6057          | 7350960           |
| 0.7621        | 169.6024 | 21200 | 0.6053          | 7421312           |
| 0.6579        | 171.2008 | 21400 | 0.6047          | 7491200           |
| 0.5762        | 172.8032 | 21600 | 0.6003          | 7560976           |
| 0.9284        | 174.4016 | 21800 | 0.6020          | 7631024           |
| 0.6199        | 176.0    | 22000 | 0.6054          | 7700784           |
| 0.7859        | 177.6024 | 22200 | 0.6110          | 7770752           |
| 0.3245        | 179.2008 | 22400 | 0.6039          | 7840832           |
| 0.7359        | 180.8032 | 22600 | 0.6061          | 7911072           |
| 0.7983        | 182.4016 | 22800 | 0.6075          | 7981312           |
| 0.6592        | 184.0    | 23000 | 0.6066          | 8050976           |
| 0.6686        | 185.6024 | 23200 | 0.6060          | 8121312           |
| 0.5448        | 187.2008 | 23400 | 0.6047          | 8191520           |
| 0.5868        | 188.8032 | 23600 | 0.6013          | 8261456           |
| 0.7454        | 190.4016 | 23800 | 0.6131          | 8331664           |
| 1.137         | 192.0    | 24000 | 0.6159          | 8401328           |
| 0.5008        | 193.6024 | 24200 | 0.6039          | 8471232           |
| 0.8048        | 195.2008 | 24400 | 0.6079          | 8540976           |
| 0.6897        | 196.8032 | 24600 | 0.6059          | 8611296           |
| 0.5966        | 198.4016 | 24800 | 0.6075          | 8681264           |
| 0.434         | 200.0    | 25000 | 0.6160          | 8751280           |
| 0.4255        | 201.6024 | 25200 | 0.6050          | 8822192           |
| 0.5553        | 203.2008 | 25400 | 0.6063          | 8891648           |
| 0.6894        | 204.8032 | 25600 | 0.6118          | 8961760           |
| 0.5924        | 206.4016 | 25800 | 0.6104          | 9031568           |
| 0.4732        | 208.0    | 26000 | 0.6030          | 9101088           |
| 0.7517        | 209.6024 | 26200 | 0.6052          | 9171168           |
| 0.3247        | 211.2008 | 26400 | 0.6049          | 9240752           |
| 0.5487        | 212.8032 | 26600 | 0.6017          | 9310960           |
| 0.7838        | 214.4016 | 26800 | 0.6027          | 9380560           |
| 1.0043        | 216.0    | 27000 | 0.6075          | 9450912           |
| 0.4924        | 217.6024 | 27200 | 0.6063          | 9520832           |
| 0.5188        | 219.2008 | 27400 | 0.6075          | 9590800           |
| 0.826         | 220.8032 | 27600 | 0.6111          | 9661456           |
| 0.9029        | 222.4016 | 27800 | 0.6089          | 9731376           |
| 0.5354        | 224.0    | 28000 | 0.6084          | 9801040           |
| 0.6485        | 225.6024 | 28200 | 0.6080          | 9870784           |
| 0.8221        | 227.2008 | 28400 | 0.6132          | 9941408           |
| 0.7324        | 228.8032 | 28600 | 0.6031          | 10011264          |
| 0.7633        | 230.4016 | 28800 | 0.6112          | 10080704          |
| 0.9061        | 232.0    | 29000 | 0.6090          | 10150880          |
| 0.855         | 233.6024 | 29200 | 0.6018          | 10221616          |
| 0.9609        | 235.2008 | 29400 | 0.6006          | 10291664          |
| 0.7309        | 236.8032 | 29600 | 0.6120          | 10361728          |
| 0.7132        | 238.4016 | 29800 | 0.6046          | 10431088          |
| 0.5857        | 240.0    | 30000 | 0.6083          | 10501088          |
| 0.6568        | 241.6024 | 30200 | 0.6097          | 10571488          |
| 0.8502        | 243.2008 | 30400 | 0.6069          | 10640848          |
| 0.7067        | 244.8032 | 30600 | 0.6096          | 10711136          |
| 0.5737        | 246.4016 | 30800 | 0.6039          | 10781136          |
| 0.411         | 248.0    | 31000 | 0.5998          | 10851312          |
| 0.3786        | 249.6024 | 31200 | 0.6112          | 10921664          |
| 0.8119        | 251.2008 | 31400 | 0.6060          | 10991936          |
| 0.7882        | 252.8032 | 31600 | 0.6012          | 11061680          |
| 0.7779        | 254.4016 | 31800 | 0.6105          | 11131872          |
| 0.5879        | 256.0    | 32000 | 0.6011          | 11201520          |
| 0.4562        | 257.6024 | 32200 | 0.6092          | 11271952          |
| 0.8154        | 259.2008 | 32400 | 0.5993          | 11340976          |
| 0.8513        | 260.8032 | 32600 | 0.6082          | 11411056          |
| 0.5301        | 262.4016 | 32800 | 0.5973          | 11481152          |
| 0.4274        | 264.0    | 33000 | 0.6082          | 11550752          |
| 0.7707        | 265.6024 | 33200 | 0.6110          | 11620752          |
| 0.5863        | 267.2008 | 33400 | 0.6022          | 11690464          |
| 0.6638        | 268.8032 | 33600 | 0.6062          | 11761360          |
| 0.8022        | 270.4016 | 33800 | 0.6082          | 11831152          |
| 0.4962        | 272.0    | 34000 | 0.6052          | 11900768          |
| 0.7421        | 273.6024 | 34200 | 0.6155          | 11971616          |
| 0.8621        | 275.2008 | 34400 | 0.6042          | 12041104          |
| 0.4739        | 276.8032 | 34600 | 0.6042          | 12111712          |
| 0.661         | 278.4016 | 34800 | 0.6115          | 12181328          |
| 0.5588        | 280.0    | 35000 | 0.6040          | 12251088          |
| 0.8743        | 281.6024 | 35200 | 0.6042          | 12321616          |
| 0.5744        | 283.2008 | 35400 | 0.6042          | 12391184          |
| 0.6344        | 284.8032 | 35600 | 0.6042          | 12461088          |
| 0.7548        | 286.4016 | 35800 | 0.6042          | 12531520          |
| 1.0844        | 288.0    | 36000 | 0.6042          | 12600944          |
| 0.3644        | 289.6024 | 36200 | 0.6042          | 12670544          |
| 0.7256        | 291.2008 | 36400 | 0.6042          | 12741216          |
| 0.8211        | 292.8032 | 36600 | 0.6042          | 12811584          |
| 0.6064        | 294.4016 | 36800 | 0.6042          | 12881104          |
| 0.5569        | 296.0    | 37000 | 0.6042          | 12951648          |
| 0.5618        | 297.6024 | 37200 | 0.6042          | 13021600          |
| 0.6211        | 299.2008 | 37400 | 0.6042          | 13091888          |
| 0.5256        | 300.8032 | 37600 | 0.6042          | 13162128          |
| 1.1123        | 302.4016 | 37800 | 0.6042          | 13231552          |
| 0.7682        | 304.0    | 38000 | 0.6042          | 13302080          |
| 0.6204        | 305.6024 | 38200 | 0.6042          | 13371808          |
| 0.8488        | 307.2008 | 38400 | 0.6042          | 13441936          |
| 0.947         | 308.8032 | 38600 | 0.6042          | 13512304          |
| 0.8           | 310.4016 | 38800 | 0.6042          | 13582192          |
| 0.802         | 312.0    | 39000 | 0.6042          | 13652384          |
| 0.457         | 313.6024 | 39200 | 0.6042          | 13722224          |
| 0.6368        | 315.2008 | 39400 | 0.6042          | 13791728          |
| 0.5913        | 316.8032 | 39600 | 0.6042          | 13862560          |
| 0.6218        | 318.4016 | 39800 | 0.6042          | 13933264          |
| 0.6923        | 320.0    | 40000 | 0.6042          | 14002704          |
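
To locate the checkpoint with the lowest validation loss, the table above can be parsed directly from its Markdown form. The sketch below is one way to do this using only the standard library; it runs on a short excerpt of rows copied from the table, and the helper names `parse_rows` and `best_checkpoint` are illustrative. The row it selects is also the minimum over the full table.

```python
# A few rows copied verbatim from the training results table.
TABLE_EXCERPT = """\
| Training Loss | Epoch    | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:--------:|:-----:|:---------------:|:-----------------:|
| 0.9367        | 1.6024   | 200   | 0.6859          | 70144             |
| 0.6993        | 76.8032  | 9600  | 0.5956          | 3359520           |
| 0.7098        | 86.4016  | 10800 | 0.5983          | 3779456           |
| 0.5301        | 262.4016 | 32800 | 0.5973          | 11481152          |
| 0.6923        | 320.0    | 40000 | 0.6042          | 14002704          |
"""


def parse_rows(markdown_table):
    """Turn a Markdown pipe table into a list of dicts keyed by header name."""
    lines = [ln.strip() for ln in markdown_table.strip().splitlines()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    # lines[1] is the alignment row, so data starts at index 2.
    for ln in lines[2:]:
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def best_checkpoint(rows):
    """Return the row with the smallest validation loss."""
    return min(rows, key=lambda row: float(row["Validation Loss"]))


rows = parse_rows(TABLE_EXCERPT)
best = best_checkpoint(rows)
print(best["Step"], best["Validation Loss"])  # prints: 9600 0.5956
```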


### Framework versions

- PEFT 0.15.2.dev0
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1