---
tags:
- llama
- alpaca
- grit
- qlora
- instruction-tuning
- fine-tuned
base_model: meta-llama/Llama-3.2-3B
library_name: peft
license: apache-2.0
datasets:
- google/boolq
language:
- en
pipeline_tag: text-generation
---

# meta-llama/Llama-3.2-3B Fine-tuned with GRIT and LoRA

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) using the **GRIT** (Geometric Reprojection Instruction Tuning) algorithm and **LoRA** on the [google/boolq dataset](https://huggingface.co/datasets/google/boolq).

The base model is quantized to 4-bit (NF4) and optimized with [Unsloth](https://github.com/unslothai/unsloth) to enable efficient fine-tuning.
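
## Usage

A minimal loading sketch is shown below. It assumes the adapter lives under the repository id taken from the citation section and that `transformers`, `peft`, and `bitsandbytes` are installed; the prompt format is illustrative, since the exact template used during training is not documented here.

```python
# Minimal loading sketch (adapter repo id taken from the citation URL; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B"
adapter_id = "D1zzYzz/GRIT-BOOLQ-QLORA-llama-3.2-3B-Energy-0.99"

# Match the card's quantization setup: 4-bit NF4 with bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Illustrative BoolQ-style prompt; the training template may differ.
prompt = "Passage: The aurora is caused by charged particles from the sun.\nQuestion: Is the aurora caused by the sun?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```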

## Training Details

### GRIT Algorithm

- **K-FAC Updates**: every 10 steps (adaptive) for second-order preconditioning.
- **Neural Reprojection**: every 20 steps (adaptive) for rank optimization.
- **Rank Adaptation**: enabled (energy threshold 0.99, minimum rank 4).
- **Optimized LoRA Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` (see the configuration sketch after this list).
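
The hyperparameters above can be collected into a single configuration block. GRIT does not ship as a standard library, so every field name below is purely illustrative:

```python
# Hypothetical GRIT hyperparameter block mirroring the settings listed above.
# GRIT has no standard public API; these names are illustrative only.
grit_config = {
    "kfac_update_freq": 10,       # K-FAC preconditioner refresh interval (adaptive)
    "reprojection_freq": 20,      # neural reprojection interval (adaptive)
    "rank_adaptation": True,      # prune LoRA rank during training
    "energy_threshold": 0.99,     # fraction of spectral energy to retain
    "min_rank": 4,                # never prune below this rank
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}
```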

### Fine-tuning Configuration

- **Base Model**: meta-llama/Llama-3.2-3B
- **Quantization**: 4-bit (NF4) with bf16 compute
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Batch Size**: 8 (per device)
- **Gradient Accumulation**: 4 (effective batch size 32)
- **Learning Rate**: 2.0e-05
- **Precision**: bf16 mixed precision
- **Sequence Length**: 1024 tokens
- **Gradient Checkpointing**: enabled (the full setup is sketched after this list)
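
A sketch of this setup expressed with standard `transformers`/`peft` configuration objects follows. The original run used Unsloth, whose wrapper API differs slightly, so treat this as an approximation rather than the exact training script; the output directory and the single-epoch assumption are not taken from a published script.

```python
# Approximate reconstruction of the training configuration described above.
# The original run used Unsloth; this sketch uses plain transformers/peft.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # 4-bit NF4 quantization
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,    # bf16 compute
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="grit-boolq-llama-3.2-3b",     # illustrative path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,            # effective batch size 32
    learning_rate=2e-5,
    bf16=True,
    gradient_checkpointing=True,
    num_train_epochs=1,                       # matches the single-epoch setup described below
)
# The 1024-token sequence length is applied at tokenization time
# (e.g. max_seq_length=1024 when using an SFT-style trainer).
```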

### Performance Improvements

- **Faster Convergence**: K-FAC preconditioning aligns updates with the loss curvature.
- **Adaptive Rank**: dynamically prunes the LoRA rank to improve parameter efficiency (one possible rank-selection rule is sketched below).
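
The card does not spell out the rank-adaptation rule. A common way to realize an energy threshold of 0.99 with a floor of 4 is to keep the smallest number of singular values whose cumulative squared magnitude covers 99% of the update's spectral energy; the sketch below illustrates that idea and is not the GRIT implementation.

```python
# Illustrative rank selection by spectral energy (not the verbatim GRIT code).
import torch

def select_rank(delta_w: torch.Tensor, energy_threshold: float = 0.99, min_rank: int = 4) -> int:
    """Smallest rank whose singular values capture `energy_threshold` of the
    squared spectral energy of the LoRA update, floored at `min_rank`."""
    s = torch.linalg.svdvals(delta_w)                  # singular values, descending
    energy = torch.cumsum(s**2, dim=0) / torch.sum(s**2)
    rank = int(torch.searchsorted(energy, energy_threshold).item()) + 1
    return max(rank, min_rank)

# Toy usage: the effective update of a rank-16 LoRA pair is B @ A.
B, A = torch.randn(512, 16), torch.randn(16, 512)
print(select_rank(B @ A))   # between 4 and 16
```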

## Training Metrics

- **Total Steps**: 295
- **Final Loss**: 0.3181
- **Trainable Parameters**: 24,313,856

At an effective batch size of 32, 295 steps corresponds to roughly one pass over BoolQ's ~9.4k-example training split, in line with the single-epoch setup noted under Results.

## Algorithm Details

- **K-FAC Preconditioning** (natural gradient) and **Neural Reprojection**, as described in the GRIT method (an illustrative preconditioning step is sketched below).
- **Memory Efficient**: covariance matrices are kept on the CPU to reduce GPU memory load.
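
GRIT's exact implementation is not included in this card. As a point of reference, a generic K-FAC-style natural-gradient step for one linear layer's gradient looks roughly like the following; the damping value and the way the covariance factors are estimated here are assumptions.

```python
# Generic K-FAC-style preconditioning for one linear layer (illustrative only).
import torch

def kfac_precondition(grad_w, inputs, grad_out, damping=1e-3):
    """Approximate natural gradient: with the Fisher approximated as a Kronecker
    product of S and A, preconditioning the weight gradient G amounts to
    S^{-1} @ G @ A^{-1}."""
    # A = E[a a^T] over layer inputs, S = E[g g^T] over output gradients.
    # (Per the card, GRIT keeps these factors on the CPU to save GPU memory.)
    A = inputs.T @ inputs / inputs.shape[0]
    S = grad_out.T @ grad_out / grad_out.shape[0]
    # Tikhonov damping keeps both factors well conditioned and invertible.
    A = A + damping * torch.eye(A.shape[0])
    S = S + damping * torch.eye(S.shape[0])
    return torch.linalg.solve(S, torch.linalg.solve(A, grad_w.T).T)

# Toy usage with a [out_features, in_features] weight gradient.
a = torch.randn(64, 128)      # layer inputs        (batch, in_features)
g = torch.randn(64, 32)       # output gradients    (batch, out_features)
G = g.T @ a                   # weight gradient     (out_features, in_features)
G_nat = kfac_precondition(G, a, g)
```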

## Results

In benchmark comparisons, GRIT has shown **faster convergence and better stability** than standard LoRA or full fine-tuning, making it well suited for efficient single-epoch training. The use of Unsloth further accelerates this process.

## Citation

If you use this model, please cite the original GRIT paper and:

```bibtex
@misc{grit-lora-Llama-3.2-3B-boolq,
  title     = {meta-llama/Llama-3.2-3B Fine-tuned with GRIT on google/boolq},
  author    = {D1zzYzz},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/D1zzYzz/GRIT-BOOLQ-QLORA-llama-3.2-3B-Energy-0.99}
}
```

## License

This model inherits the Apache 2.0 license.
|