---
tags:
- llama
- alpaca
- grit
- qlora
- instruction-tuning
- fine-tuned
base_model: meta-llama/Llama-3.2-3B
library_name: peft
license: apache-2.0
datasets:
- google/boolq
language:
- en
pipeline_tag: text-generation
---

# meta-llama/Llama-3.2-3B Fine-tuned with GRIT and LoRA

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) using the **GRIT** (Geometric Reprojection Instruction Tuning) algorithm and **LoRA** on the [google/boolq dataset](https://huggingface.co/datasets/google/boolq).

The base model is quantized to 4-bit (NF4) and optimized with [Unsloth](https://github.com/unslothai/unsloth) to enable efficient fine-tuning.
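
## 💻 Usage (illustrative)

The snippet below is a minimal sketch for loading this adapter on top of the 4-bit base model with 🤗 Transformers and PEFT. The BoolQ-style prompt format and the generation settings are assumptions for illustration, not the exact format used during training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B"
adapter_id = "D1zzYzz/GRIT-BOOLQ-QLORA-llama-3.2-3B-Energy-0.99"

# Load the base model in 4-bit NF4 with bf16 compute, mirroring the training setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the GRIT/LoRA adapter weights from this repository.
model = PeftModel.from_pretrained(base_model, adapter_id)

# BoolQ-style yes/no prompt (format is an assumption).
prompt = (
    "Passage: The Eiffel Tower is located in Paris, France.\n"
    "Question: Is the Eiffel Tower in France?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```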

## πŸš€ Training Details

### GRIT Algorithm
- **K-FAC Updates**: Every 10 steps (adaptive) for second-order preconditioning.
- **Neural Reprojection**: Every 20 steps (adaptive) for rank optimization.
- **Rank Adaptation**: Enabled (energy threshold: 0.99, minimum rank: 4); the overall update schedule is sketched after this list.
- **Optimized LoRA Modules**: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
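
For illustration, the adaptive cadence above can be pictured as the following training-loop sketch. `apply_kfac_preconditioning` and `reproject_lora_ranks` are hypothetical placeholders standing in for the GRIT implementation, not a published API.

```python
# Illustrative GRIT-style update schedule (placeholder helpers, not a real library).
KFAC_EVERY = 10          # K-FAC second-order preconditioning interval (steps)
REPROJECT_EVERY = 20     # neural reprojection / rank-adaptation interval (steps)
ENERGY_THRESHOLD = 0.99  # keep singular directions covering 99% of the energy
MIN_RANK = 4             # never prune a LoRA module below this rank

for step, batch in enumerate(dataloader, start=1):
    loss = model(**batch).loss
    loss.backward()

    if step % KFAC_EVERY == 0:
        # Precondition the LoRA gradients with K-FAC curvature estimates.
        apply_kfac_preconditioning(model)

    optimizer.step()
    optimizer.zero_grad()

    if step % REPROJECT_EVERY == 0:
        # Re-project LoRA factors and prune ranks whose energy falls below the threshold.
        reproject_lora_ranks(model, threshold=ENERGY_THRESHOLD, min_rank=MIN_RANK)
```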

### Fine-tuning Configuration
- **Base Model**: meta-llama/Llama-3.2-3B
- **Quantization**: 4-bit (NF4) with bf16 compute.
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Batch Size**: 8 (per device)
- **Gradient Accumulation**: 4 (Effective batch = 32)
- **Learning Rate**: 2.0e-05
- **Precision**: bf16 mixed precision
- **Sequence Length**: 1024 tokens
- **Gradient Checkpointing**: Enabled (a comparable setup is sketched below)
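
As a reference point, a comparable (non-GRIT) QLoRA setup with these hyperparameters could be assembled with 🤗 Transformers and PEFT as sketched below; anything not listed above (e.g. `lora_dropout`, the output directory, logging cadence) is an assumption.

```python
# Minimal QLoRA configuration sketch; GRIT-specific logic is omitted.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)
model.gradient_checkpointing_enable()

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumption: not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="grit-boolq-llama-3.2-3b",  # hypothetical output path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,          # effective batch size 32
    learning_rate=2e-5,
    bf16=True,
    num_train_epochs=1,                     # single-epoch training, as in the Results section
    logging_steps=10,                       # assumption
)
# The 1024-token sequence length is applied at tokenization / data-collation time (not shown).
```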

### Performance Improvements
- βœ… **Faster Convergence**: K-FAC preconditioning aligns updates with curvature.
- βœ… **Adaptive Rank**: Dynamically prunes LoRA rank to improve parameter efficiency.

## πŸ“Š Training Metrics
- **Total Steps**: 295
- **Final Loss**: 0.3181
- **Trainable Params**: 24,313,856

## πŸ“ Algorithm Details
- **K-FAC Preconditioning** (natural gradient) and **Neural Reprojection**, as described by the GRIT method.
- **Memory Efficient**: Covariance matrices are kept on CPU to reduce GPU memory load (illustrated in the sketch below).
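
A rough sketch of a K-FAC-style preconditioning step with CPU-resident covariance factors is shown below. Function and variable names are illustrative, and the damping value is an assumption; this is not the repository's actual code.

```python
import torch

def precondition_lora_grad(grad_gpu: torch.Tensor,
                           A_cpu: torch.Tensor,
                           G_cpu: torch.Tensor,
                           damping: float = 1e-3) -> torch.Tensor:
    """Apply a K-FAC-style natural-gradient update to one LoRA weight gradient.

    A_cpu: running covariance of layer inputs   (in_dim  x in_dim), float32 on CPU.
    G_cpu: running covariance of output grads   (out_dim x out_dim), float32 on CPU.
    """
    grad = grad_gpu.detach().to("cpu", dtype=torch.float32)
    # Damped inverses of the two Kronecker factors of the Fisher approximation.
    A_inv = torch.linalg.inv(A_cpu + damping * torch.eye(A_cpu.shape[0]))
    G_inv = torch.linalg.inv(G_cpu + damping * torch.eye(G_cpu.shape[0]))
    # Natural gradient ≈ G^{-1} · grad · A^{-1}
    preconditioned = G_inv @ grad @ A_inv
    return preconditioned.to(grad_gpu.device, dtype=grad_gpu.dtype)
```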

## πŸ† Results
In benchmark comparisons, GRIT has shown **faster convergence and better stability** than standard LoRA or full fine-tuning, making it well-suited for efficient single-epoch training. The use of Unsloth further accelerates this process.

## πŸ“ Citation
If you use this model, please cite the original GRIT paper and:
```bibtex
@misc{grit-lora-Llama-3.2-3B-boolq,
  title={meta-llama/Llama-3.2-3B Fine-tuned with GRIT on google/boolq},
  author={D1zzYzz},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/D1zzYzz/GRIT-BOOLQ-QLORA-llama-3.2-3B-Energy-0.99}
}
```

## βš–οΈ License
The LoRA adapter weights in this repository are released under the Apache 2.0 license; use of the base model remains subject to the terms of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B).