Llama3-8B-lora-r-32-generic-step-1200-lr-1e-5-labels_40.0-1

This model is a LoRA adapter (trained with PEFT) for meta-llama/Meta-Llama-3-8B, fine-tuned on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9038
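
The adapter can be loaded on top of the base checkpoint with PEFT. Below is a minimal loading sketch; it assumes access to the gated meta-llama/Meta-Llama-3-8B weights on the Hub and enough GPU memory for bf16 inference. The prompt is only a placeholder.

```python
# Minimal loading sketch (not the author's exact inference setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "Siqi-Hu/Llama3-8B-lora-r-32-generic-step-1200-lr-1e-5-labels_40.0-1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA adapter weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "The capital of France is"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```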

Model description

More information needed
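
The model name indicates a LoRA adapter of rank 32 trained with PEFT. The sketch below shows a LoraConfig consistent with that rank; lora_alpha, lora_dropout, and target_modules are illustrative assumptions rather than values recovered from the training run.

```python
# Hedged sketch of a LoRA configuration matching the rank in the model name.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    r=32,                          # rank, from "lora-r-32" in the model name
    lora_alpha=64,                 # assumption
    lora_dropout=0.05,             # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```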

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 1200
  • mixed_precision_training: Native AMP
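
A hedged sketch of how these hyperparameters map onto transformers.TrainingArguments is shown below; the output_dir, the evaluation/logging cadence, and the choice of fp16 for "Native AMP" are assumptions rather than settings taken from the original training script.

```python
# Hedged TrainingArguments sketch reproducing the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-lora-r-32-step-1200",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,   # 16 * 4 = 64 total train batch size
    max_steps=1200,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # "Native AMP" mixed precision (fp16 assumed)
    eval_strategy="steps",           # evaluation every 50 steps, per the results table
    eval_steps=50,
    logging_steps=50,
)
```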

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|---------------|---------|------|-----------------|
| 5.0258        | 1.0870  | 50   | 4.1291          |
| 3.728         | 2.1739  | 100  | 3.4327          |
| 3.2554        | 3.2609  | 150  | 3.1749          |
| 3.0389        | 4.3478  | 200  | 3.0403          |
| 2.8709        | 5.4348  | 250  | 2.9543          |
| 2.7513        | 6.5217  | 300  | 2.8973          |
| 2.6369        | 7.6087  | 350  | 2.8526          |
| 2.5456        | 8.6957  | 400  | 2.8278          |
| 2.4591        | 9.7826  | 450  | 2.8082          |
| 2.3865        | 10.8696 | 500  | 2.8015          |
| 2.3214        | 11.9565 | 550  | 2.8006          |
| 2.2617        | 13.0435 | 600  | 2.8072          |
| 2.203         | 14.1304 | 650  | 2.8272          |
| 2.1612        | 15.2174 | 700  | 2.8441          |
| 2.1271        | 16.3043 | 750  | 2.8511          |
| 2.075         | 17.3913 | 800  | 2.8676          |
| 2.0602        | 18.4783 | 850  | 2.8769          |
| 2.0296        | 19.5652 | 900  | 2.8869          |
| 2.0106        | 20.6522 | 950  | 2.8915          |
| 2.0026        | 21.7391 | 1000 | 2.8979          |
| 1.9941        | 22.8261 | 1050 | 2.9038          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.45.2
  • PyTorch 2.5.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.20.3