---
tags:
- llama
- alpaca
- grit
- qlora
- instruction-tuning
- fine-tuned
base_model: meta-llama/Llama-3.2-3B
library_name: peft
license: apache-2.0
datasets:
- google/boolq
language:
- en
pipeline_tag: text-generation
---
# meta-llama/Llama-3.2-3B Fine-tuned with GRIT and LoRA
This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) using the **GRIT** (Geometric Reprojection Instruction Tuning) algorithm and **LoRA** on the [google/boolq dataset](https://huggingface.co/datasets/google/boolq).
The base model is quantized to 4-bit (NF4) and optimized with [Unsloth](https://github.com/unslothai/unsloth) to enable efficient fine-tuning.
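### How to Use
A minimal loading and inference sketch with 🤗 Transformers and PEFT, assuming the adapter repository ID from the citation section below; the BoolQ prompt format shown is illustrative, not necessarily the exact template used during fine-tuning.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B"
adapter_id = "D1zzYzz/GRIT-BOOLQ-QLORA-llama-3.2-3B-Energy-0.99"  # adapter repo (see citation below)

# Re-create the 4-bit NF4 quantization with bf16 compute used during training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# BoolQ-style passage + yes/no question (prompt format is an assumption).
prompt = (
    "Passage: The Amazon is the largest rainforest on Earth.\n"
    "Question: Is the Amazon the largest rainforest on Earth?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```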
## Training Details
### GRIT Algorithm
- **K-FAC Updates**: Every 10 steps (adaptive) for second-order preconditioning.
- **Neural Reprojection**: Every 20 steps (adaptive) for rank optimization (the update schedule is sketched after this list).
- **Rank Adaptation**: Enabled (Threshold: 0.99, Min Rank: 4).
- **Optimized LoRA Modules**: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
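GRIT is not a standard PEFT feature, so the sketch below only illustrates the interval-based schedule described above; `apply_kfac_preconditioning` and `reproject_lora_ranks` are hypothetical placeholders for the GRIT-specific steps, not a published API.

```python
# Hypothetical sketch of the GRIT update schedule (not a real library API).
KFAC_INTERVAL = 10          # second-order preconditioning interval (steps)
REPROJECTION_INTERVAL = 20  # rank-optimization interval (steps)
ENERGY_THRESHOLD = 0.99     # fraction of spectral energy to retain per module
MIN_RANK = 4

def apply_kfac_preconditioning(optimizer):
    """Placeholder: precondition LoRA gradients with K-FAC factors."""
    pass

def reproject_lora_ranks(model, threshold=ENERGY_THRESHOLD, min_rank=MIN_RANK):
    """Placeholder: shrink each LoRA rank to the smallest value that keeps
    `threshold` of the spectral energy, but never below `min_rank`."""
    pass

def training_step(model, optimizer, batch, step):
    loss = model(**batch).loss
    loss.backward()
    if step % KFAC_INTERVAL == 0:
        apply_kfac_preconditioning(optimizer)   # second-order preconditioning
    optimizer.step()
    optimizer.zero_grad()
    if step % REPROJECTION_INTERVAL == 0:
        reproject_lora_ranks(model)             # adaptive rank pruning
    return loss
```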
### Fine-tuning Configuration
- **Base Model**: meta-llama/Llama-3.2-3B
- **Quantization**: 4-bit (NF4) with bf16 compute (mirrored in the configuration sketch after this list).
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Batch Size**: 8 (per device)
- **Gradient Accumulation**: 4 (Effective batch = 32)
- **Learning Rate**: 2.0e-05
- **Precision**: bf16 mixed precision
- **Sequence Length**: 1024 tokens
- **Gradient Checkpointing**: Enabled
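For reference, a minimal sketch of how these hyperparameters map onto standard bitsandbytes / PEFT / Transformers configuration objects; values come from the list above, while anything not listed (output directory, warmup, scheduler) is omitted or assumed, and the 1024-token sequence length is applied at tokenization time rather than here.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization with bf16 compute, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA on all attention and MLP projections, rank 16, alpha 32.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Core optimization settings (effective batch = 8 * 4 = 32).
training_args = TrainingArguments(
    output_dir="grit-boolq-lora",      # assumed name
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    bf16=True,
    gradient_checkpointing=True,
    num_train_epochs=1,
)
```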
### Performance Improvements
- **Faster Convergence**: K-FAC preconditioning aligns updates with curvature.
- **Adaptive Rank**: Dynamically prunes the LoRA rank to improve parameter efficiency.
## Training Metrics
- **Total Steps**: 295
- **Final Loss**: 0.3181
- **Trainable Params**: 24,313,856
## Algorithm Details
- **K-FAC Preconditioning** (natural gradient) and **Neural Reprojection**, as described in the GRIT method; the preconditioning step is sketched below.
- **Memory Efficient**: Covariance matrices on CPU to reduce GPU load.
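As a rough sketch of the preconditioning step (standard K-FAC; GRIT's exact formulation may differ): for a layer with input activations $a$ and back-propagated output gradients $g$, the Fisher information is approximated by a Kronecker product of two small covariance matrices, so the natural-gradient update only needs two small inverses.

```latex
% Kronecker-factored Fisher approximation for one layer with weight W:
\[
  F \;\approx\; A \otimes G, \qquad
  A = \mathbb{E}\!\left[a\,a^{\top}\right], \quad
  G = \mathbb{E}\!\left[g\,g^{\top}\right]
\]
% Preconditioned (natural-gradient) update:
\[
  \Delta W \;\propto\; -\, G^{-1}\,\nabla_{W}\mathcal{L}\; A^{-1}
\]
```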
## Results
In benchmark comparisons, GRIT has shown **faster convergence and better stability** than standard LoRA or full fine-tuning, making it well-suited for efficient single-epoch training. The use of Unsloth further accelerates this process.
## Citation
If you use this model, please cite the original GRIT paper and:
```bibtex
@misc{grit-lora-Llama-3.2-3B-boolq,
  title={meta-llama/Llama-3.2-3B Fine-tuned with GRIT on google/boolq},
  author={D1zzYzz},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/D1zzYzz/GRIT-BOOLQ-QLORA-llama-3.2-3B-Energy-0.99}
}
```
## License
This model inherits the Apache 2.0 license.