lemexp-task1-lemma_command_small-Llama-3.2-1B-ddp-8lr

This model is a PEFT adapter fine-tuned from meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6416
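
Because the repository contains a PEFT adapter rather than full model weights (see the framework versions below), loading it requires the base model plus the adapter. The following is a minimal sketch, not an official usage recipe: the repository id is taken from the model page, and the prompt is a placeholder since the expected input format is not documented here.

```python
# Minimal loading sketch (assumes access to the meta-llama/Llama-3.2-1B base weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B"
adapter_id = "yalhessi/lemexp-task1-lemma_command_small-Llama-3.2-1B-ddp-8lr"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Placeholder prompt; the training prompt format is not documented in this card.
inputs = tokenizer("lemma ", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```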

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
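
For reference, below is a minimal sketch of roughly equivalent transformers TrainingArguments. The output directory and any logging/evaluation cadence are placeholders rather than values from the original run; per-device batch sizes of 2 across 8 GPUs give the total batch size of 16.

```python
# Sketch only: approximate TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lemexp-task1-lemma_command_small-Llama-3.2-1B-ddp-8lr",  # placeholder
    learning_rate=8e-4,
    per_device_train_batch_size=2,  # x 8 devices = total train batch size 16
    per_device_eval_batch_size=2,   # x 8 devices = total eval batch size 16
    seed=42,
    num_train_epochs=12,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,  # "Native AMP" mixed-precision training
)
```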

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| No log | 0.2003 | 461 | 1.0440 |
| 1.128 | 0.4005 | 922 | 0.9841 |
| 1.0276 | 0.6008 | 1383 | 0.9573 |
| 0.9831 | 0.8010 | 1844 | 0.9383 |
| 0.9507 | 1.0013 | 2305 | 0.9198 |
| 0.9196 | 1.2016 | 2766 | 0.8958 |
| 0.8816 | 1.4018 | 3227 | 0.9029 |
| 0.8802 | 1.6021 | 3688 | 0.8764 |
| 0.8751 | 1.8023 | 4149 | 0.8647 |
| 0.865 | 2.0026 | 4610 | 0.8444 |
| 0.8264 | 2.2029 | 5071 | 0.8443 |
| 0.8188 | 2.4031 | 5532 | 0.8321 |
| 0.8188 | 2.6034 | 5993 | 0.8318 |
| 0.8145 | 2.8036 | 6454 | 0.8210 |
| 0.8142 | 3.0039 | 6915 | 0.8134 |
| 0.7998 | 3.2042 | 7376 | 0.8051 |
| 0.7759 | 3.4044 | 7837 | 0.8013 |
| 0.764 | 3.6047 | 8298 | 0.7852 |
| 0.764 | 3.8050 | 8759 | 0.7829 |
| 0.7626 | 4.0052 | 9220 | 0.7777 |
| 0.744 | 4.2055 | 9681 | 0.7865 |
| 0.7304 | 4.4057 | 10142 | 0.7671 |
| 0.7307 | 4.6060 | 10603 | 0.7595 |
| 0.7313 | 4.8063 | 11064 | 0.7644 |
| 0.7267 | 5.0065 | 11525 | 0.7642 |
| 0.7267 | 5.2068 | 11986 | 0.7450 |
| 0.6842 | 5.4070 | 12447 | 0.7489 |
| 0.6944 | 5.6073 | 12908 | 0.7354 |
| 0.6957 | 5.8076 | 13369 | 0.7272 |
| 0.6855 | 6.0078 | 13830 | 0.7318 |
| 0.6801 | 6.2081 | 14291 | 0.7291 |
| 0.6528 | 6.4083 | 14752 | 0.7155 |
| 0.6551 | 6.6086 | 15213 | 0.7167 |
| 0.6576 | 6.8089 | 15674 | 0.7161 |
| 0.6561 | 7.0091 | 16135 | 0.7076 |
| 0.6262 | 7.2094 | 16596 | 0.7111 |
| 0.627 | 7.4096 | 17057 | 0.7074 |
| 0.624 | 7.6099 | 17518 | 0.6987 |
| 0.624 | 7.8102 | 17979 | 0.6931 |
| 0.6178 | 8.0104 | 18440 | 0.6892 |
| 0.6116 | 8.2107 | 18901 | 0.6908 |
| 0.5815 | 8.4109 | 19362 | 0.6831 |
| 0.5887 | 8.6112 | 19823 | 0.6758 |
| 0.5823 | 8.8115 | 20284 | 0.6793 |
| 0.5885 | 9.0117 | 20745 | 0.6718 |
| 0.5636 | 9.2120 | 21206 | 0.6703 |
| 0.5485 | 9.4123 | 21667 | 0.6666 |
| 0.5569 | 9.6125 | 22128 | 0.6596 |
| 0.5534 | 9.8128 | 22589 | 0.6519 |
| 0.5537 | 10.0130 | 23050 | 0.6631 |
| 0.5146 | 10.2133 | 23511 | 0.6657 |
| 0.5146 | 10.4136 | 23972 | 0.6550 |
| 0.5212 | 10.6138 | 24433 | 0.6490 |
| 0.5179 | 10.8141 | 24894 | 0.6483 |
| 0.5234 | 11.0143 | 25355 | 0.6498 |
| 0.5 | 11.2146 | 25816 | 0.6494 |
| 0.4811 | 11.4149 | 26277 | 0.6499 |
| 0.4849 | 11.6151 | 26738 | 0.6448 |
| 0.497 | 11.8154 | 27199 | 0.6416 |
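
Assuming the reported values are mean per-token cross-entropy (the standard causal language-modeling loss), the final validation loss of 0.6416 corresponds to a perplexity of roughly exp(0.6416) ≈ 1.90.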

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
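
To approximate the training environment, the listed versions can be pinned, e.g. in a requirements file; note that the +cu124 build tag implies a CUDA 12.4 PyTorch wheel, which must come from the matching PyTorch wheel index.

```text
peft==0.14.0
transformers==4.47.0
torch==2.5.1
datasets==3.2.0
tokenizers==0.21.0
```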