# llama_3_3_20250903_2145
This model is a fine-tuned version of meta-llama/Llama-3.2-3B on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.3355
- Map@3: 0.9371
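
The Map@3 metric is not defined in the card; a reasonable reading is mean average precision at cutoff 3 over ranked predictions with a single correct label per example. The sketch below reflects that assumption and is not taken from the training code.

```python
def map_at_3(predictions, labels):
    """Mean average precision @ 3, assuming exactly one correct label per example.

    predictions: list of lists, each holding up to three predicted labels in ranked order.
    labels:      list of the single correct label per example.
    """
    total = 0.0
    for preds, label in zip(predictions, labels):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == label:
                total += 1.0 / rank  # score 1, 1/2, or 1/3 depending on the hit rank
                break
    return total / len(labels)


# A hit at rank 1 scores 1.0 and a hit at rank 2 scores 0.5, so this prints 0.75.
print(map_at_3([["A", "B", "C"], ["B", "A", "C"]], ["A", "A"]))
```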
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after the list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
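
For reference, the hyperparameters above map onto a `transformers.TrainingArguments` configuration roughly as follows. This is a hedged reconstruction: the `output_dir` value (and anything not listed above) is an assumption rather than part of the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_3_3_20250903_2145",  # assumed; not stated in the card
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 per device x 8 accumulation steps = total batch 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```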
### Training results

Training Loss | Epoch | Step | Validation Loss | Map@3 |
---|---|---|---|---|
17.9232 | 0.0523 | 20 | 1.4365 | 0.7168 |
10.0651 | 0.1046 | 40 | 1.1210 | 0.7636 |
9.1342 | 0.1569 | 60 | 1.0630 | 0.7616 |
8.7455 | 0.2092 | 80 | 1.0319 | 0.7732 |
8.0814 | 0.2615 | 100 | 0.9055 | 0.8084 |
7.333 | 0.3138 | 120 | 0.8242 | 0.8219 |
6.8603 | 0.3661 | 140 | 0.8413 | 0.8197 |
6.3616 | 0.4184 | 160 | 0.8386 | 0.8224 |
7.267 | 0.4707 | 180 | 0.8070 | 0.8276 |
5.946 | 0.5230 | 200 | 0.7488 | 0.8428 |
6.3872 | 0.5754 | 220 | 0.7623 | 0.8343 |
5.9969 | 0.6277 | 240 | 0.6821 | 0.8597 |
5.544 | 0.6800 | 260 | 0.6512 | 0.8564 |
4.8356 | 0.7323 | 280 | 0.6462 | 0.8709 |
5.6033 | 0.7846 | 300 | 0.5858 | 0.8815 |
4.4918 | 0.8369 | 320 | 0.5837 | 0.8849 |
4.9479 | 0.8892 | 340 | 0.5603 | 0.8880 |
4.5659 | 0.9415 | 360 | 0.5243 | 0.8932 |
4.3615 | 0.9938 | 380 | 0.5798 | 0.8881 |
4.3143 | 1.0445 | 400 | 0.4902 | 0.8994 |
3.6791 | 1.0968 | 420 | 0.5078 | 0.8991 |
3.5985 | 1.1491 | 440 | 0.4904 | 0.9047 |
3.5077 | 1.2014 | 460 | 0.4797 | 0.9075 |
3.843 | 1.2537 | 480 | 0.4635 | 0.9085 |
3.3767 | 1.3060 | 500 | 0.4548 | 0.9116 |
3.8554 | 1.3583 | 520 | 0.4823 | 0.9043 |
3.8529 | 1.4106 | 540 | 0.4927 | 0.9032 |
3.4666 | 1.4629 | 560 | 0.4424 | 0.9138 |
3.6173 | 1.5152 | 580 | 0.4326 | 0.9160 |
3.3832 | 1.5675 | 600 | 0.4243 | 0.9176 |
2.7451 | 1.6198 | 620 | 0.4521 | 0.9183 |
2.9097 | 1.6721 | 640 | 0.3975 | 0.9219 |
3.2222 | 1.7244 | 660 | 0.3934 | 0.9229 |
3.2087 | 1.7767 | 680 | 0.4234 | 0.9186 |
2.9231 | 1.8290 | 700 | 0.3970 | 0.9211 |
2.7208 | 1.8813 | 720 | 0.3943 | 0.9211 |
2.9979 | 1.9336 | 740 | 0.3821 | 0.9246 |
2.9678 | 1.9859 | 760 | 0.3680 | 0.9301 |
2.501 | 2.0366 | 780 | 0.3765 | 0.9271 |
2.202 | 2.0889 | 800 | 0.3723 | 0.9302 |
1.8267 | 2.1412 | 820 | 0.3923 | 0.9260 |
2.313 | 2.1935 | 840 | 0.3710 | 0.9307 |
2.0693 | 2.2458 | 860 | 0.3658 | 0.9299 |
2.0435 | 2.2981 | 880 | 0.3746 | 0.9307 |
1.9854 | 2.3504 | 900 | 0.4199 | 0.9277 |
2.0134 | 2.4027 | 920 | 0.3675 | 0.9324 |
1.7272 | 2.4551 | 940 | 0.3662 | 0.9314 |
1.8824 | 2.5074 | 960 | 0.3755 | 0.9309 |
1.8695 | 2.5597 | 980 | 0.3588 | 0.9340 |
1.9778 | 2.6120 | 1000 | 0.3511 | 0.9356 |
1.8434 | 2.6643 | 1020 | 0.3617 | 0.9341 |
1.7754 | 2.7166 | 1040 | 0.3491 | 0.9350 |
1.9125 | 2.7689 | 1060 | 0.3446 | 0.9350 |
1.728 | 2.8212 | 1080 | 0.3439 | 0.9367 |
1.9307 | 2.8735 | 1100 | 0.3379 | 0.9364 |
1.828 | 2.9258 | 1120 | 0.3362 | 0.9373 |
1.4855 | 2.9781 | 1140 | 0.3355 | 0.9371 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.56.0
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0
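
Since this repository ships a PEFT adapter rather than full model weights, loading it typically means resolving the base model and applying the adapter on top. A minimal sketch using `peft`'s Auto class is shown below; it assumes the adapter was trained for causal language modeling (if it was instead trained with a classification head for the Map@3 task, the analogous `AutoPeftModelForSequenceClassification` class would apply).

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Loads the base meta-llama/Llama-3.2-3B weights and applies this adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained("prl90777/llama_3_3_20250903_2145")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```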