See axolotl config
axolotl version: 0.13.0.dev0
# llama-8B-training.yaml
# =========================
# Model Configuration
# =========================
base_model: meta-llama/Llama-3.1-8B-Instruct
load_in_4bit: true # Use 4-bit quantization (saves VRAM on smaller GPUs like A100 40GB or L4)
adapter: qlora
bnb_4bit_use_double_quant: true # recommended for stability
bnb_4bit_quant_type: nf4
bnb_4bit_compute_dtype: bfloat16 # compute in bf16
trust_remote_code: true # Allow loading models with custom HF code
tokenizer_name: meta-llama/Llama-3.1-8B-Instruct
tokenizer_use_fast: true # Faster tokenization
# =========================
# Dataset Configuration
# =========================
datasets:
  - path: Ivoyant/attr-mappings-training-v2
    split: train
    type: chat_template
    chat_template: llama3 # Use built-in Llama 3 chat template
    field_messages: conversations # Column containing conversation array
    # Optional: Control which roles to train on (default: assistant only)
    roles_to_train: ["assistant"]
    # Optional: Control EOS token training
    train_on_eos: turn # Options: "turn", "all", "last"
# val_set_size: 0.1
test_datasets:
  - path: Ivoyant/attr-mappings-training-v2
    split: validation
    type: chat_template
    chat_template: llama3 # Use built-in Llama 3 chat template
    field_messages: conversations # Column containing conversation array
    # Optional: Control which roles to train on (default: assistant only)
    roles_to_train: ["assistant"]
    # Optional: Control EOS token training
    train_on_eos: turn # Options: "turn", "all", "last"
seed: 42 # Ensures reproducible splits
dataset_prepared_path: /workspace/data/prepared_dataset_v2
# =========================
# LoRA Configuration
# =========================
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
lora_fan_in_fan_out: false
# =========================
# Training Configuration
# =========================
micro_batch_size: 2
gradient_accumulation_steps: 8 # effective batch size of 16 (micro_batch_size 2 × 8)
learning_rate: 5e-5 # standard LoRA LR
num_epochs: 8
lr_scheduler: cosine # smooth decay
warmup_steps: 100 # Add warmup for stability
save_strategy: steps
save_steps: 500
# saves_per_epoch: 1
# evals_per_epoch: 1
eval_strategy: steps # Evaluate more frequently
eval_steps: 50
save_total_limit: 3 # Keep more checkpoints for experimentation
bf16: true # A40 supports BF16
fp16: false # don't mix with bf16
optim: adamw_torch
gradient_checkpointing: true # saves VRAM at cost of compute
max_grad_norm: 1.0
weight_decay: 0.01
dataloader_num_workers: 2
# =========================
# Sequence Configuration
# =========================
sequence_len: 768
sample_packing: true
pad_to_sequence_len: true
special_tokens:
  pad_token: "<|eot_id|>"
  eos_token: "<|eot_id|>"
# =========================
# Output & Logging Configuration
# =========================
output_dir: /workspace/data/outputs/lora-llama-8b-activity-mappings_v2
logging_steps: 50
use_tensorboard: true
logging_strategy: steps
# =========================
# Memory & Performance Optimization
# =========================
dataloader_pin_memory: true # usually improves dataloader throughput unless CPU RAM is constrained
remove_unused_columns: true # drop dataset columns the model's forward pass does not use
# Early stopping for efficiency
early_stopping_patience: 3
load_best_model_at_end: true
metric_for_best_model: eval_loss
greater_is_better: false
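The quantization and adapter settings above map onto the Hugging Face transformers/peft APIs roughly as sketched below. This is a minimal illustration, not the code axolotl actually runs (a config like this is normally launched with the axolotl CLI, e.g. `axolotl train llama-8B-training.yaml`, with the exact entry point depending on the installed version); the model id, target modules, and LoRA hyperparameters are copied from the config, everything else is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization, mirroring the bnb_4bit_* settings in the config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", use_fast=True
)

# QLoRA adapter matching lora_r / lora_alpha / lora_dropout / lora_target_modules
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```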
workspace/data/outputs/lora-llama-8b-activity-mappings_v2
This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the Ivoyant/attr-mappings-training-v2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.0405
- Memory/max Active (GiB): 7.8
- Memory/max Allocated (GiB): 7.8
- Memory/device Reserved (GiB): 9.25
Model description
More information needed
Intended uses & limitations
More information needed
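As a usage sketch only (not supplied by the model authors), the LoRA adapter can be loaded on top of the base Instruct model with peft and queried through the Llama 3 chat template. The adapter repository id is taken from the model tree at the end of this card; the prompt content is a placeholder.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "Ivoyant/attr-mappings-llama-3.1-8b-lora-r16-v2"  # adapter repo (see model tree below)

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Placeholder conversation; real prompts depend on the attribute-mapping task format
messages = [{"role": "user", "content": "..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For deployment without peft at inference time, the adapter weights can also be folded into the base model via `merge_and_unload()` and saved with `save_pretrained`.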
Training and evaluation data
More information needed
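Going by the dataset settings in the config (`type: chat_template`, `field_messages: conversations`), each record is expected to provide a `conversations` list of role/content turns, with the training loss restricted to assistant turns. The record below is purely illustrative of that shape; the actual contents of Ivoyant/attr-mappings-training-v2 are not reproduced here.

```python
# Hypothetical record shape inferred from the chat_template settings above;
# the role/content key names follow axolotl's defaults and are an assumption.
example_record = {
    "conversations": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ]
}
```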
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1920
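For clarity: total_train_batch_size = train_batch_size × gradient_accumulation_steps × number of devices = 2 × 8 × 1 = 16, and the 1920 training steps correspond to roughly 240 optimizer steps per epoch over the 8 configured epochs (consistent with the step/epoch pairs in the table below).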
Training results
| Training Loss | Epoch | Step | Validation Loss | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.6090 | 7.48 | 7.48 | 9.71 |
| 1.2574 | 0.2082 | 50 | 0.5494 | 7.8 | 7.8 | 9.52 |
| 0.3452 | 0.4164 | 100 | 0.1971 | 7.8 | 7.8 | 9.25 |
| 0.1527 | 0.6247 | 150 | 0.1101 | 7.8 | 7.8 | 9.25 |
| 0.0954 | 0.8329 | 200 | 0.0801 | 7.8 | 7.8 | 9.25 |
| 0.0723 | 1.0375 | 250 | 0.0698 | 7.8 | 7.8 | 9.25 |
| 0.06 | 1.2457 | 300 | 0.0665 | 7.8 | 7.8 | 9.25 |
| 0.0529 | 1.4539 | 350 | 0.0555 | 7.8 | 7.8 | 9.25 |
| 0.0452 | 1.6622 | 400 | 0.0524 | 7.8 | 7.8 | 9.25 |
| 0.0456 | 1.8704 | 450 | 0.0470 | 7.8 | 7.8 | 9.25 |
| 0.0417 | 2.0750 | 500 | 0.0422 | 7.8 | 7.8 | 9.25 |
| 0.0283 | 2.2832 | 550 | 0.0418 | 7.8 | 7.8 | 9.25 |
| 0.0318 | 2.4914 | 600 | 0.0417 | 7.8 | 7.8 | 9.25 |
| 0.0328 | 2.6996 | 650 | 0.0426 | 7.8 | 7.8 | 9.25 |
| 0.0269 | 2.9079 | 700 | 0.0396 | 7.8 | 7.8 | 9.25 |
| 0.0247 | 3.1124 | 750 | 0.0413 | 7.8 | 7.8 | 9.25 |
| 0.0199 | 3.3207 | 800 | 0.0405 | 7.8 | 7.8 | 9.25 |
| 0.0207 | 3.5289 | 850 | 0.0405 | 7.8 | 7.8 | 9.25 |
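Logging stops at step 850 of the 1920 scheduled steps; with `early_stopping_patience: 3` and the best validation loss of 0.0396 reached at step 700, the three subsequent evaluations without improvement would trigger early stopping at that point.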
Framework versions
- PEFT 0.17.1
- Transformers 4.56.1
- PyTorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
Model tree for Ivoyant/attr-mappings-llama-3.1-8b-lora-r16-v2
- Base model: meta-llama/Llama-3.1-8B
- Finetuned: meta-llama/Llama-3.1-8B-Instruct