---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- meng-lab/Llama-3.1-8B-Instruct-xsum
model-index:
- name: Llama-3.1-8B-Instruct-sft-5e-3-epoch-100-xsum
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/uva-llm/huggingface/runs/0cft2k89)

# Llama-3.1-8B-Instruct-sft-5e-3-epoch-100-xsum

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the meng-lab/Llama-3.1-8B-Instruct-xsum dataset.
It achieves the following results on the evaluation set:
- Loss: 6.7117
- Loss Layer 4 Head: 1.7377
- Loss Layer 8 Head: 1.4957
- Loss Layer 12 Head: 1.4384
- Loss Layer 16 Head: 0.9421
- Loss Layer 20 Head: 0.5804
- Loss Layer 24 Head: 0.3724
- Loss Layer 28 Head: 0.1958

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.005
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 100

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Loss Layer 4 Head | Loss Layer 8 Head | Loss Layer 12 Head | Loss Layer 16 Head | Loss Layer 20 Head | Loss Layer 24 Head | Loss Layer 28 Head |
|:-------------:|:-------:|:----:|:---------------:|:-----------------:|:-----------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
| 9.417         | 9.5522  | 200  | 10.6034         | 2.1800            | 2.1484            | 1.8370             | 1.5560             | 0.8850             | 0.7908             | 1.1904             |
| 7.0666        | 19.1045 | 400  | 8.3242          | 2.0259            | 1.8363            | 1.7901             | 1.0876             | 0.8469             | 0.4822             | 0.2917             |
| 6.5999        | 28.6567 | 600  | 7.8689          | 1.9122            | 1.7362            | 1.7044             | 1.0472             | 0.6722             | 0.4620             | 0.3698             |
| 5.8586        | 38.2090 | 800  | 7.5812          | 2.0916            | 1.5734            | 1.6211             | 1.0056             | 0.6192             | 0.4660             | 0.2400             |
| 5.4725        | 47.7612 | 1000 | 7.0153          | 1.8457            | 1.5162            | 1.4691             | 0.9794             | 0.6236             | 0.3980             | 0.2260             |
| 5.3026        | 57.3134 | 1200 | 7.0204          | 1.9164            | 1.5058            | 1.5172             | 0.9522             | 0.5897             | 0.3804             | 0.2035             |
| 4.9989        | 66.8657 | 1400 | 6.7446          | 1.7458            | 1.5005            | 1.4430             | 0.9468             | 0.5843             | 0.3757             | 0.1990             |
| 4.9163        | 76.4179 | 1600 | 6.7228          | 1.7406            | 1.4972            | 1.4401             | 0.9436             | 0.5816             | 0.3734             | 0.1968             |
| 4.9194        | 85.9701 | 1800 | 6.7132          | 1.7381            | 1.4960            | 1.4385             | 0.9424             | 0.5807             | 0.3726             | 0.1959             |
| 4.9063        | 95.5224 | 2000 | 6.7117          | 1.7377            | 1.4957            | 1.4384             | 0.9421             | 0.5804             | 0.3724             | 0.1958             |

### Framework versions

- Transformers 4.43.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.19.1
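
### Hyperparameter arithmetic (illustrative)

The batch-size totals listed under the training hyperparameters follow from the per-device values, and the cosine schedule with a 0.1 warmup ratio can be sketched in a few lines. The snippet below is a minimal sketch, not the training code: the helper names `effective_batch_size` and `lr_at_step` are hypothetical, the schedule assumes the common linear-warmup then half-cosine-decay form, and `TOTAL_STEPS` is a placeholder because this card does not state the run's total optimizer step count.

```python
import math


def effective_batch_size(per_device_batch_size, num_devices, gradient_accumulation_steps=1):
    """Samples contributing to one optimizer (or evaluation) step across all devices."""
    return per_device_batch_size * num_devices * gradient_accumulation_steps


def lr_at_step(step, total_steps, peak_lr=0.005, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then half-cosine decay to zero (assumed schedule form)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values taken from the hyperparameter list in this card.
total_train = effective_batch_size(1, 4, gradient_accumulation_steps=32)
total_eval = effective_batch_size(2, 4)  # no gradient accumulation at evaluation time

# Placeholder only: the actual total number of optimizer steps is not given in this card.
TOTAL_STEPS = 2000

print(total_train, total_eval)
for step in (0, 100, 200, 1100, 2000):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

With the listed values, the products reproduce `total_train_batch_size: 128` and `total_eval_batch_size: 8`. The learning-rate values printed depend on the placeholder `TOTAL_STEPS` and should not be read as the schedule actually logged for this run.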