train_copa_1753094179
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the copa dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1246
- Num Input Tokens Seen: 281856
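
The framework versions below list PEFT, so this checkpoint is presumably a PEFT adapter on top of the base model rather than a full set of fine-tuned weights. A minimal, hypothetical loading and inference sketch follows; the adapter path, prompt format, and generation settings are assumptions and are not documented in this card:

```python
# Hypothetical usage sketch: load the PEFT adapter on top of the base model.
# "train_copa_1753094179" is a placeholder for wherever this adapter is stored.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "train_copa_1753094179"  # placeholder path / repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Example COPA-style prompt; the exact prompt template used for fine-tuning
# is not specified in this card.
messages = [{
    "role": "user",
    "content": (
        "Premise: The man broke his toe. What was the cause?\n"
        "Choice 1: He got a hole in his sock.\n"
        "Choice 2: He dropped a hammer on his foot.\n"
        "Answer with 1 or 2."
    ),
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=8)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```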
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
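
A hypothetical reproduction of these settings with `transformers.TrainingArguments`; the actual training script, COPA preprocessing, and PEFT/LoRA configuration are not included in this card:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="train_copa_1753094179",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```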
Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.1596 | 0.5 | 45 | 0.1958 | 14016 |
| 0.2404 | 1.0 | 90 | 0.1377 | 28096 |
| 0.1623 | 1.5 | 135 | 0.1277 | 42144 |
| 0.1 | 2.0 | 180 | 0.1283 | 56128 |
| 0.0681 | 2.5 | 225 | 0.1267 | 70272 |
| 0.0289 | 3.0 | 270 | 0.1246 | 84352 |
| 0.0638 | 3.5 | 315 | 0.1314 | 98464 |
| 0.0061 | 4.0 | 360 | 0.1305 | 112576 |
| 0.1354 | 4.5 | 405 | 0.1356 | 126624 |
| 0.0018 | 5.0 | 450 | 0.1401 | 140832 |
| 0.0111 | 5.5 | 495 | 0.1353 | 154976 |
| 0.0039 | 6.0 | 540 | 0.1413 | 169056 |
| 0.1049 | 6.5 | 585 | 0.1374 | 183200 |
| 0.0106 | 7.0 | 630 | 0.1402 | 197344 |
| 0.018 | 7.5 | 675 | 0.1404 | 211392 |
| 0.0021 | 8.0 | 720 | 0.1440 | 225536 |
| 0.0019 | 8.5 | 765 | 0.1409 | 239680 |
| 0.0019 | 9.0 | 810 | 0.1421 | 253696 |
| 0.0005 | 9.5 | 855 | 0.1449 | 267840 |
| 0.0235 | 10.0 | 900 | 0.1471 | 281856 |
Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.7.1+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
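
A hypothetical requirements pin matching the versions above (the CUDA-specific PyTorch build, 2.7.1+cu126, is installed separately via the appropriate index URL):

```text
peft==0.15.2
transformers==4.51.3
torch==2.7.1
datasets==3.6.0
tokenizers==0.21.1
```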