11APRIL2025-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B on the generator dataset (per the model name, the MEDAL medical-abbreviation dataset). It achieves the following results on the evaluation set:

  • Loss: 2.2896
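
Assuming this loss is the standard per-token cross-entropy (in nats), it corresponds to an evaluation perplexity of roughly exp(2.2896) ≈ 9.9.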

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.15
  • num_epochs: 0.3
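
The following is a minimal sketch of how these hyperparameters could be passed to a Hugging Face TrainingArguments object (Transformers 4.51 argument names). The output directory, evaluation/logging cadence, and everything not listed above are assumptions; the original training script, dataset preprocessing, and PEFT/LoRA configuration are not documented in this card.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="11APRIL2025-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata",
    learning_rate=5e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,    # total effective train batch size of 8
    seed=42,
    optim="adamw_torch_fused",        # fused AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.15,
    num_train_epochs=0.3,
    eval_strategy="steps",            # assumption: the card reports validation loss every 100 steps
    eval_steps=100,
    logging_steps=100,
)
```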

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 2.7516 | 0.0069 | 100 | 2.6728 |
| 2.5893 | 0.0138 | 200 | 2.5248 |
| 2.494 | 0.0207 | 300 | 2.4700 |
| 2.4513 | 0.0277 | 400 | 2.4380 |
| 2.4368 | 0.0346 | 500 | 2.4128 |
| 2.399 | 0.0415 | 600 | 2.3939 |
| 2.3878 | 0.0484 | 700 | 2.3791 |
| 2.3604 | 0.0553 | 800 | 2.3660 |
| 2.3675 | 0.0622 | 900 | 2.3564 |
| 2.3596 | 0.0692 | 1000 | 2.3493 |
| 2.3551 | 0.0761 | 1100 | 2.3423 |
| 2.3361 | 0.0830 | 1200 | 2.3366 |
| 2.3218 | 0.0899 | 1300 | 2.3320 |
| 2.339 | 0.0968 | 1400 | 2.3269 |
| 2.324 | 0.1037 | 1500 | 2.3235 |
| 2.3243 | 0.1106 | 1600 | 2.3194 |
| 2.3158 | 0.1176 | 1700 | 2.3161 |
| 2.3034 | 0.1245 | 1800 | 2.3129 |
| 2.3075 | 0.1314 | 1900 | 2.3104 |
| 2.3189 | 0.1383 | 2000 | 2.3078 |
| 2.3021 | 0.1452 | 2100 | 2.3053 |
| 2.2934 | 0.1521 | 2200 | 2.3030 |
| 2.2965 | 0.1590 | 2300 | 2.3012 |
| 2.3036 | 0.1660 | 2400 | 2.2994 |
| 2.2876 | 0.1729 | 2500 | 2.2980 |
| 2.2904 | 0.1798 | 2600 | 2.2965 |
| 2.3025 | 0.1867 | 2700 | 2.2952 |
| 2.306 | 0.1936 | 2800 | 2.2940 |
| 2.2965 | 0.2005 | 2900 | 2.2931 |
| 2.2919 | 0.2075 | 3000 | 2.2921 |
| 2.2906 | 0.2144 | 3100 | 2.2915 |
| 2.3053 | 0.2213 | 3200 | 2.2909 |
| 2.2865 | 0.2282 | 3300 | 2.2905 |
| 2.2924 | 0.2351 | 3400 | 2.2902 |
| 2.2873 | 0.2420 | 3500 | 2.2900 |
| 2.2762 | 0.2489 | 3600 | 2.2898 |
| 2.2841 | 0.2559 | 3700 | 2.2897 |
| 2.2917 | 0.2628 | 3800 | 2.2897 |
| 2.2946 | 0.2697 | 3900 | 2.2897 |
| 2.2844 | 0.2766 | 4000 | 2.2897 |
| 2.2907 | 0.2835 | 4100 | 2.2896 |
| 2.2875 | 0.2904 | 4200 | 2.2896 |
| 2.2888 | 0.2974 | 4300 | 2.2896 |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.51.2
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
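
This repository holds a PEFT adapter rather than full model weights, so it is loaded on top of the meta-llama/Meta-Llama-3-8B base model. Below is a minimal loading sketch: the adapter repo id is the one listed for this model, while the dtype, device placement, and example prompt are assumptions (flash_attention_2 additionally requires the flash-attn package).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "frankmorales2020/11APRIL2025-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,               # assumption: bf16 for an 8B model
    attn_implementation="flash_attention_2",  # matches the model name; needs flash-attn installed
    device_map="auto",
)

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer("Example prompt:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```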