meta-llama/Llama-3.2-3B Fine-tuned with GRIT and QLoRA
This model is a fine-tuned version of meta-llama/Llama-3.2-3B using the GRIT (Geometric Reprojection Instruction Tuning) algorithm and QLoRA on the databricks/databricks-dolly-15k dataset.
The base model is quantized to 4-bit (NF4) to enable efficient fine-tuning.
Training Details
GRIT Algorithm
- K-FAC Updates: Every 50 steps (adaptive) for second-order preconditioning.
- Neural Reprojection: Every 50 steps (adaptive) for rank optimization.
- Rank Adaptation: Enabled (Threshold: 0.9, Min Rank: 4).
- Optimized LoRA Modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
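The card does not define how the rank-adaptation threshold is applied. One plausible reading, suggested by the "Energy-0.9" suffix in the repository name, is that each adapter keeps the smallest rank whose leading singular values capture at least 90% of the spectral energy, never dropping below the minimum rank of 4. The sketch below illustrates that reading only. The function name `select_rank` and the definition of energy as the sum of squared singular values are assumptions, not taken from the GRIT code.

```python
def select_rank(singular_values, energy_threshold=0.9, min_rank=4):
    """Return the smallest rank k whose top-k squared singular values
    hold at least `energy_threshold` of the total, floored at `min_rank`.

    Hypothetical helper: "energy" as squared singular values is an assumption.
    """
    # Sort energies from largest to smallest so the top-k are the leading ones.
    energies = sorted((s * s for s in singular_values), reverse=True)
    floor = min(min_rank, len(energies))
    total = sum(energies)
    if total == 0.0:
        # Degenerate spectrum: nothing to rank, fall back to the floor.
        return floor

    k = len(energies)
    cumulative = 0.0
    for i, energy in enumerate(energies, start=1):
        cumulative += energy
        if cumulative / total >= energy_threshold:
            k = i
            break
    return max(k, floor)


if __name__ == "__main__":
    # A rank-16 adapter with one dominant direction and a flat tail.
    print(select_rank([10.0] + [1.0] * 15))
```

Under this reading, a spectrum with one dominant direction is pruned aggressively, while a flat spectrum keeps almost the full rank of 16.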
Fine-tuning Configuration
- Base Model: meta-llama/Llama-3.2-3B
- Quantization: 4-bit (NF4) with bf16 compute.
- LoRA Rank: 16
- LoRA Alpha: 32
- Batch Size: 8 (per device)
- Gradient Accumulation: 4 (Effective batch = 32)
- Learning Rate: 2.0e-05
- Precision: bf16 mixed precision
- Sequence Length: 1024 tokens
- Gradient Checkpointing: Enabled
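As a rough illustration, the settings above could be written with the `transformers` and `peft` configuration classes as follows. This is a sketch of an equivalent configuration, not the training script behind this model: the GRIT-specific parts (K-FAC, reprojection, rank adaptation) are not shown, and LoRA dropout is not stated in this card, so it is left at the library default.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the base model with bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the four attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Keys follow transformers.TrainingArguments field names; kept as a plain
# dict so the sketch does not depend on the available hardware.
training_kwargs = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2.0e-5,
    "bf16": True,
    "gradient_checkpointing": True,
}
MAX_SEQ_LEN = 1024  # sequence length in tokens

# Effective batch size on a single device: 8 * 4 = 32.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
```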
Performance Improvements
- Faster Convergence: K-FAC preconditioning aligns updates with curvature.
- Memory-Efficient: 4-bit quantization (QLoRA) and gradient checkpointing reduce memory use.
- Adaptive Rank: Dynamically prunes LoRA rank to improve parameter efficiency.
Training Metrics
- Total Steps: 423
- Final Loss: 0.43427316291394247
- Trainable Params: 9,175,040
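The trainable-parameter count can be reproduced from the LoRA settings. A rank-r adapter on a linear layer with `in_features` inputs and `out_features` outputs adds r × (in_features + out_features) parameters. The sketch below assumes the published Llama-3.2-3B dimensions, which are not stated in this card: hidden size 3072, 28 layers, and 8 key/value heads of dimension 128.

```python
# Assumed Llama-3.2-3B dimensions (from the base model's published config,
# not from this card).
HIDDEN = 3072
NUM_LAYERS = 28
NUM_KV_HEADS = 8
HEAD_DIM = 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 1024, smaller than HIDDEN due to grouped-query attention

RANK = 16  # LoRA rank from the configuration above

# (in_features, out_features) for each adapted projection.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Parameters in the two low-rank factors A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"{per_layer:,} per layer, {total:,} total")
# prints "327,680 per layer, 9,175,040 total"
```

The result matches the reported figure, which suggests the count reflects the initial rank of 16 on all four modules rather than the ranks after any pruning.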
Algorithm Details
- K-FAC Preconditioning (Natural Gradient) and Neural Reprojection, as per the GRIT method.
- Memory Efficient: Covariance matrices are kept on the CPU to reduce GPU memory load.
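For reference, the covariance matrices mentioned above are the two Kronecker factors of the standard K-FAC approximation. The formulation below is the generic one for a single linear layer with weight W, input activations a, and backpropagated output gradients g. The symbols and the damping term λ are notation chosen here, and how GRIT applies the factors to the LoRA matrices specifically is not described in this card.

```latex
% Kronecker factors estimated from minibatch statistics
A = \mathbb{E}\left[ a a^{\top} \right], \qquad
G = \mathbb{E}\left[ g g^{\top} \right]

% Fisher approximation for the layer
F \approx A \otimes G

% Preconditioned (approximate natural) gradient with damping \lambda
\tilde{\nabla}_{W} \mathcal{L}
  = \left( G + \lambda I \right)^{-1}
    \, \nabla_{W} \mathcal{L} \,
    \left( A + \lambda I \right)^{-1}
```

Because only the two small factors are inverted, never the full Fisher matrix, they can be stored and refreshed off the GPU at a fixed interval, such as the 50-step schedule listed under the GRIT settings.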
Results
In benchmark comparisons, GRIT has shown faster convergence and better stability than standard LoRA or fine-tuning, making it well-suited for efficient single-epoch training. The use of Unsloth further accelerates this process.
Citation
If you use this model, please cite the original GRIT paper and:
@misc{grit-lora-Llama-3.2-3B-databricks-dolly-15k,
title={meta-llama/Llama-3.2-3B Fine-tuned with GRIT on databricks/databricks-dolly-15k},
author={te4bag},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/te4bag/GRIT-Full-databricks-llama-3.2-3B-Energy-0.9}
}
License
This model inherits the Apache 2.0 license.