Llama for Finance

A finance-domain instruction-tuned Llama-3 model, fine-tuned with LoRA on the Finance-Instruct-500k dataset.

Model Details

  • Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Training: LoRA fine-tuning
  • Domain: Finance, Economics, Investment
  • Language: English
  • Context Length: 512 tokens (training max_length)
  • Training Data: Josephgflowers/Finance-Instruct-500k
  • Evaluation: Held-out test set + FinanceBench

Training Configuration

  • Quantization: 8-bit (see the load sketch after this list)
  • Batch Size: 2 per device
  • Gradient Accumulation Steps: 8
  • Learning Rate: 2e-4
  • Number of Epochs: 1
  • Evaluation Steps: 50
  • Save Steps: 100
  • Logging Steps: 25
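
No training script ships with this card, so the following is a minimal sketch of the 8-bit base-model load these settings imply, assuming bitsandbytes quantization via transformers; variable names are illustrative:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Load the base model with 8-bit weights to reduce memory during training
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
# Casts norms to fp32 and enables input grads, as required for k-bit LoRA training
base_model = prepare_model_for_kbit_training(base_model)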

LoRA Parameters

  • Target Modules:
    • Attention: q_proj, k_proj, v_proj, o_proj
    • MLP: gate_proj, up_proj, down_proj
  • Rank (r): 16
  • Alpha: 32
  • Dropout: 0.1
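
These parameters map directly onto a peft LoraConfig; a sketch assuming the standard peft API (the bias setting is an assumption, as the card does not state it):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,              # LoRA rank
    lora_alpha=32,     # scaling factor (alpha / r = 2.0)
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    bias="none",           # assumption: bias terms left untrained
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()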

Optimization Details

  • Precision: BF16 (if available) or FP16
  • Gradient Checkpointing: Enabled
  • Scheduler: Cosine with warmup (ratio: 0.03)
  • Weight Decay: 0.01
  • Max Gradient Norm: 1.0
  • Data Loading: 2 workers, pinned memory
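
Together with the Training Configuration above, these settings correspond roughly to the following transformers TrainingArguments (output directory illustrative; the effective batch size works out to 2 × 8 = 16 per device):

import torch
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama_for_finance",           # illustrative path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    max_grad_norm=1.0,
    bf16=torch.cuda.is_bf16_supported(),      # BF16 if available...
    fp16=not torch.cuda.is_bf16_supported(),  # ...otherwise FP16
    gradient_checkpointing=True,
    eval_strategy="steps",                    # "evaluation_strategy" on older transformers
    eval_steps=50,
    save_steps=100,
    logging_steps=25,
    dataloader_num_workers=2,
    dataloader_pin_memory=True,
)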

Usage

This repository contains only a LoRA adapter for Llama-3.1; you need access to the gated meta-llama/Meta-Llama-3.1-8B-Instruct base model to use it.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model in bf16; device_map="auto" places it on available GPUs
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "TimberGu/Llama_for_Finance")  # attach the adapter
tokenizer = AutoTokenizer.from_pretrained("TimberGu/Llama_for_Finance")
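
A quick generation example using the Llama-3 chat template (prompt and decoding settings are illustrative):

messages = [{"role": "user", "content": "Explain the difference between stocks and bonds."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens before decoding the answer
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))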

Evaluation Results

The model has been evaluated on:

  1. Held-out test set from Finance-Instruct-500k
  2. FinanceBench open-book QA benchmark

See test_results.json for detailed metrics including:

  • BLEU scores
  • ROUGE-1/2/L scores
  • Perplexity
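
For reference, BLEU and ROUGE scores of this kind are commonly computed with the Hugging Face evaluate library, and perplexity as the exponential of the mean cross-entropy loss over the held-out set; a sketch with placeholder strings (not the actual test data):

import evaluate

predictions = ["Bond yields rose sharply."]   # placeholder model outputs
references = ["Bond yields rose."]            # placeholder gold answers

bleu = evaluate.load("bleu").compute(predictions=predictions,
                                     references=[[r] for r in references])
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
print(bleu["bleu"], rouge["rouge1"], rouge["rouge2"], rouge["rougeL"])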

Limitations

  • Requires access to Meta's gated Llama-3 base model; make sure your hardware has enough memory to load the full 8B model
  • Performance may vary on non-financial topics
  • Should not be used as the sole basis for financial decisions
  • Training context length was limited to 512 tokens due to GPU memory constraints