---
tags:
- llama
- alpaca
- grit
- qlora
- instruction-tuning
- fine-tuned
base_model: meta-llama/Llama-3.2-3B
library_name: peft
license: apache-2.0
datasets:
- google/boolq
language:
- en
pipeline_tag: text-generation
---

# meta-llama/Llama-3.2-3B Fine-tuned with GRIT and LoRA

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) using the **GRIT** (Geometric Reprojection Instruction Tuning) algorithm and **LoRA** on the [google/boolq dataset](https://huggingface.co/datasets/google/boolq).

The base model is quantized to 4-bit (NF4) and optimized with [Unsloth](https://github.com/unslothai/unsloth) to enable efficient fine-tuning.
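
## Usage

A minimal loading sketch is shown below. It assumes the adapter lives under the repository id taken from the citation section and that `transformers`, `peft`, and `bitsandbytes` are installed; the prompt format is illustrative, since the exact template used during training is not documented here.

```python
# Minimal loading sketch (adapter repo id taken from the citation URL; adjust as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B"
adapter_id = "D1zzYzz/GRIT-BOOLQ-QLORA-llama-3.2-3B-Energy-0.99"

# Match the card's quantization setup: 4-bit NF4 with bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Illustrative BoolQ-style prompt; the training template may differ.
prompt = "Passage: The aurora is caused by charged particles from the sun.\nQuestion: Is the aurora caused by the sun?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```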

## Training Details

### GRIT Algorithm

- **K-FAC Updates**: every 10 steps (adaptive) for second-order preconditioning.
- **Neural Reprojection**: every 20 steps (adaptive) for rank optimization.
- **Rank Adaptation**: enabled (energy threshold 0.99, minimum rank 4).
- **Optimized LoRA Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` (see the configuration sketch after this list).
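
The hyperparameters above can be collected into a single configuration block. GRIT does not ship as a standard library, so every field name below is purely illustrative:

```python
# Hypothetical GRIT hyperparameter block mirroring the settings listed above.
# GRIT has no standard public API; these names are illustrative only.
grit_config = {
    "kfac_update_freq": 10,       # K-FAC preconditioner refresh interval (adaptive)
    "reprojection_freq": 20,      # neural reprojection interval (adaptive)
    "rank_adaptation": True,      # prune LoRA rank during training
    "energy_threshold": 0.99,     # fraction of spectral energy to retain
    "min_rank": 4,                # never prune below this rank
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}
```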

### Fine-tuning Configuration

- **Base Model**: meta-llama/Llama-3.2-3B
- **Quantization**: 4-bit (NF4) with bf16 compute
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Batch Size**: 8 (per device)
- **Gradient Accumulation**: 4 (effective batch size 32)
- **Learning Rate**: 2.0e-05
- **Precision**: bf16 mixed precision
- **Sequence Length**: 1024 tokens
- **Gradient Checkpointing**: enabled (the full setup is sketched after this list)
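
A sketch of this setup expressed with standard `transformers`/`peft` configuration objects follows. The original run used Unsloth, whose wrapper API differs slightly, so treat this as an approximation rather than the exact training script; the output directory and the single-epoch assumption are not taken from a published script.

```python
# Approximate reconstruction of the training configuration described above.
# The original run used Unsloth; this sketch uses plain transformers/peft.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # 4-bit NF4 quantization
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,    # bf16 compute
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="grit-boolq-llama-3.2-3b",     # illustrative path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,            # effective batch size 32
    learning_rate=2e-5,
    bf16=True,
    gradient_checkpointing=True,
    num_train_epochs=1,                       # matches the single-epoch setup described below
)
# The 1024-token sequence length is applied at tokenization time
# (e.g. max_seq_length=1024 when using an SFT-style trainer).
```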

### Performance Improvements

- **Faster Convergence**: K-FAC preconditioning aligns updates with the loss curvature.
- **Adaptive Rank**: dynamically prunes the LoRA rank to improve parameter efficiency (one possible rank-selection rule is sketched below).
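
The card does not spell out the rank-adaptation rule. A common way to realize an energy threshold of 0.99 with a floor of 4 is to keep the smallest number of singular values whose cumulative squared magnitude covers 99% of the update's spectral energy; the sketch below illustrates that idea and is not the GRIT implementation.

```python
# Illustrative rank selection by spectral energy (not the verbatim GRIT code).
import torch

def select_rank(delta_w: torch.Tensor, energy_threshold: float = 0.99, min_rank: int = 4) -> int:
    """Smallest rank whose singular values capture `energy_threshold` of the
    squared spectral energy of the LoRA update, floored at `min_rank`."""
    s = torch.linalg.svdvals(delta_w)                  # singular values, descending
    energy = torch.cumsum(s**2, dim=0) / torch.sum(s**2)
    rank = int(torch.searchsorted(energy, energy_threshold).item()) + 1
    return max(rank, min_rank)

# Toy usage: the effective update of a rank-16 LoRA pair is B @ A.
B, A = torch.randn(512, 16), torch.randn(16, 512)
print(select_rank(B @ A))   # between 4 and 16
```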

## Training Metrics

- **Total Steps**: 295
- **Final Loss**: 0.3181
- **Trainable Parameters**: 24,313,856

At an effective batch size of 32, 295 steps corresponds to roughly one pass over BoolQ's ~9.4k-example training split, in line with the single-epoch setup noted under Results.

## Algorithm Details

- **K-FAC Preconditioning** (natural gradient) and **Neural Reprojection**, as described in the GRIT method (an illustrative preconditioning step is sketched below).
- **Memory Efficient**: covariance matrices are kept on the CPU to reduce GPU memory load.
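
GRIT's exact implementation is not included in this card. As a point of reference, a generic K-FAC-style natural-gradient step for one linear layer's gradient looks roughly like the following; the damping value and the way the covariance factors are estimated here are assumptions.

```python
# Generic K-FAC-style preconditioning for one linear layer (illustrative only).
import torch

def kfac_precondition(grad_w, inputs, grad_out, damping=1e-3):
    """Approximate natural gradient: with the Fisher approximated as a Kronecker
    product of S and A, preconditioning the weight gradient G amounts to
    S^{-1} @ G @ A^{-1}."""
    # A = E[a a^T] over layer inputs, S = E[g g^T] over output gradients.
    # (Per the card, GRIT keeps these factors on the CPU to save GPU memory.)
    A = inputs.T @ inputs / inputs.shape[0]
    S = grad_out.T @ grad_out / grad_out.shape[0]
    # Tikhonov damping keeps both factors well conditioned and invertible.
    A = A + damping * torch.eye(A.shape[0])
    S = S + damping * torch.eye(S.shape[0])
    return torch.linalg.solve(S, torch.linalg.solve(A, grad_w.T).T)

# Toy usage with a [out_features, in_features] weight gradient.
a = torch.randn(64, 128)      # layer inputs        (batch, in_features)
g = torch.randn(64, 32)       # output gradients    (batch, out_features)
G = g.T @ a                   # weight gradient     (out_features, in_features)
G_nat = kfac_precondition(G, a, g)
```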

## Results

In benchmark comparisons, GRIT has shown **faster convergence and better stability** than standard LoRA or full fine-tuning, making it well suited for efficient single-epoch training. The use of Unsloth further accelerates this process.

## Citation

If you use this model, please cite the original GRIT paper and:

```bibtex
@misc{grit-lora-Llama-3.2-3B-boolq,
  title     = {meta-llama/Llama-3.2-3B Fine-tuned with GRIT on google/boolq},
  author    = {D1zzYzz},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/D1zzYzz/GRIT-BOOLQ-QLORA-llama-3.2-3B-Energy-0.99}
}
```

## License

This model inherits the Apache 2.0 license.
|