Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

# llama-8B-training.yaml
# =========================
# Model Configuration
# =========================
base_model: meta-llama/Llama-3.1-8B-Instruct
load_in_4bit: true                                # Use 4-bit quantization (saves VRAM on smaller GPUs like A100 40GB or L4)
adapter: qlora
bnb_4bit_use_double_quant: true                   # recommended for stability
bnb_4bit_quant_type: nf4
bnb_4bit_compute_dtype: bfloat16                      # compute in bf16
trust_remote_code: true                           # Allow loading models with custom HF code
tokenizer_name: meta-llama/Llama-3.1-8B-Instruct
tokenizer_use_fast: true                          # Faster tokenization

# =========================
# Dataset Configuration
# =========================
datasets:
  - path: Ivoyant/attr-mappings-training-v2
    split: train
    type: chat_template
    chat_template: llama3                    # Use built-in Llama 3 chat template
    field_messages: conversations            # Column containing conversation array
    
    # Optional: Control which roles to train on (default: assistant only)
    roles_to_train: ["assistant"]
    
    # Optional: Control EOS token training
    train_on_eos: turn  # Options: "turn", "all", "last"

# val_set_size: 0.1

test_datasets:
  - path: Ivoyant/attr-mappings-training-v2
    split: validation
    type: chat_template
    chat_template: llama3                    # Use built-in Llama 3 chat template
    field_messages: conversations            # Column containing conversation array
    
    # Optional: Control which roles to train on (default: assistant only)
    roles_to_train: ["assistant"]
    
    # Optional: Control EOS token training
    train_on_eos: turn  # Options: "turn", "all", "last"

seed: 42  # Ensures reproducible splits

dataset_prepared_path: /workspace/data/prepared_dataset_v2

# =========================
# LoRA Configuration
# =========================
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
lora_fan_in_fan_out: false

# =========================
# Training Configuration
# =========================
micro_batch_size: 2
gradient_accumulation_steps: 8        # effective batch size 16 (2 micro x 8 accumulation) per optimizer step
learning_rate: 5e-5                   # standard LoRA LR
num_epochs: 8
lr_scheduler: cosine                   # smooth decay
warmup_steps: 100                     # Add warmup for stability
save_strategy: steps
save_steps: 500
# saves_per_epoch: 1
# evals_per_epoch: 1
eval_strategy: steps                       # Evaluate more frequently
eval_steps: 50
save_total_limit: 3                   # Keep more checkpoints for experimentation
bf16: true                            # A40 supports BF16
fp16: false                           # don't mix with bf16
optim: adamw_torch
gradient_checkpointing: true          # saves VRAM at cost of compute
max_grad_norm: 1.0
weight_decay: 0.01
dataloader_num_workers: 2

# =========================
# Sequence Configuration
# =========================
sequence_len: 768
sample_packing: true
pad_to_sequence_len: true

special_tokens:
  pad_token: "<|eot_id|>"
  eos_token: "<|eot_id|>"

# =========================
# Output & Logging Configuration
# =========================
output_dir: /workspace/data/outputs/lora-llama-8b-activity-mappings_v2
logging_steps: 50
use_tensorboard: true
logging_strategy: steps

# =========================
# Memory & Performance Optimization
# =========================
dataloader_pin_memory: true           # usually improves host-to-GPU transfer speed unless CPU RAM is tight
remove_unused_columns: true           # drop dataset columns the model's forward pass doesn't use

# Early stopping for efficiency
early_stopping_patience: 3
load_best_model_at_end: true
metric_for_best_model: eval_loss
greater_is_better: false
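
With `type: chat_template`, Axolotl renders each row through the built-in Llama 3 template and, per `roles_to_train`, computes loss only on assistant turns. A minimal sketch of the record shape the `conversations` column is expected to hold (field names follow Axolotl's default role/content mapping; the message content below is invented for illustration):

```python
# Hypothetical row for the `conversations` column consumed by
# `type: chat_template` with `field_messages: conversations`.
# The attribute-mapping text is invented for illustration only.
example_row = {
    "conversations": [
        {"role": "system", "content": "You map source attributes to the canonical schema."},
        {"role": "user", "content": "Map the source field `cust_fname`."},
        {"role": "assistant", "content": '{"target": "customer.first_name"}'},
    ]
}

# With roles_to_train: ["assistant"] and train_on_eos: turn, only the
# assistant turn's tokens (plus its closing <|eot_id|>) contribute to the loss.
```

Training with this config is launched through Axolotl's CLI, e.g. `accelerate launch -m axolotl.cli.train llama-8B-training.yaml`.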

/workspace/data/outputs/lora-llama-8b-activity-mappings_v2

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the Ivoyant/attr-mappings-training-v2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0405
  • Memory / max active (GiB): 7.8
  • Memory / max allocated (GiB): 7.8
  • Memory / device reserved (GiB): 9.25

Model description

More information needed

Intended uses & limitations

More information needed
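
As a usage reference, here is a minimal sketch of loading the adapter on top of the base model with PEFT for inference; the adapter repo id is taken from this card, and the prompt content is invented:

```python
# Minimal inference sketch: base model in bf16 + this LoRA adapter via PEFT.
# The prompt below is invented; adjust to the task's actual input format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "Ivoyant/attr-mappings-llama-3.1-8b-lora-r16-v2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Map the source field `cust_fname`."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```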

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1920

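The derived quantities above follow directly from the config; a quick sanity check (assuming a single GPU):

```python
# Reproduce the derived hyperparameters from the config values (1 GPU assumed).
micro_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 1
num_epochs = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
assert total_train_batch_size == 16  # matches the value reported above

# The results table below logs ~0.2082 epochs per 50 steps, i.e. ~240
# optimizer steps per epoch, so the scheduled run length is:
steps_per_epoch = 240
assert steps_per_epoch * num_epochs == 1920  # matches training_steps
```
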
Training results

| Training Loss | Epoch  | Step | Validation Loss | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|---------------|--------|------|-----------------|--------------|-----------------|----------------|
| No log        | 0      | 0    | 1.6090          | 7.48         | 7.48            | 9.71           |
| 1.2574        | 0.2082 | 50   | 0.5494          | 7.8          | 7.8             | 9.52           |
| 0.3452        | 0.4164 | 100  | 0.1971          | 7.8          | 7.8             | 9.25           |
| 0.1527        | 0.6247 | 150  | 0.1101          | 7.8          | 7.8             | 9.25           |
| 0.0954        | 0.8329 | 200  | 0.0801          | 7.8          | 7.8             | 9.25           |
| 0.0723        | 1.0375 | 250  | 0.0698          | 7.8          | 7.8             | 9.25           |
| 0.06          | 1.2457 | 300  | 0.0665          | 7.8          | 7.8             | 9.25           |
| 0.0529        | 1.4539 | 350  | 0.0555          | 7.8          | 7.8             | 9.25           |
| 0.0452        | 1.6622 | 400  | 0.0524          | 7.8          | 7.8             | 9.25           |
| 0.0456        | 1.8704 | 450  | 0.0470          | 7.8          | 7.8             | 9.25           |
| 0.0417        | 2.0750 | 500  | 0.0422          | 7.8          | 7.8             | 9.25           |
| 0.0283        | 2.2832 | 550  | 0.0418          | 7.8          | 7.8             | 9.25           |
| 0.0318        | 2.4914 | 600  | 0.0417          | 7.8          | 7.8             | 9.25           |
| 0.0328        | 2.6996 | 650  | 0.0426          | 7.8          | 7.8             | 9.25           |
| 0.0269        | 2.9079 | 700  | 0.0396          | 7.8          | 7.8             | 9.25           |
| 0.0247        | 3.1124 | 750  | 0.0413          | 7.8          | 7.8             | 9.25           |
| 0.0199        | 3.3207 | 800  | 0.0405          | 7.8          | 7.8             | 9.25           |
| 0.0207        | 3.5289 | 850  | 0.0405          | 7.8          | 7.8             | 9.25           |

Training ended at step 850 (epoch ≈ 3.53) rather than the scheduled 1920 steps, consistent with early_stopping_patience: 3: the three evaluations after the best validation loss (0.0396 at step 700) showed no further improvement.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.56.1
  • Pytorch 2.7.1+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1
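
To verify a local environment against these versions, a quick check (package names assumed to match their PyPI distributions):

```python
# Print installed versions of the packages listed above.
import importlib.metadata as md

for pkg in ("peft", "transformers", "torch", "datasets", "tokenizers"):
    print(f"{pkg}=={md.version(pkg)}")
```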