meta-llama/Llama-3.2-3B Fine-tuned with GRIT and QLoRA

This model is a fine-tuned version of meta-llama/Llama-3.2-3B, trained with the GRIT (Geometric Reprojection Instruction Tuning) algorithm and QLoRA on the nyu-mll/glue dataset (the QNLI task, according to the repository name).

The base model is quantized to 4-bit (NF4) to enable efficient fine-tuning.

🚀 Training Details

GRIT Algorithm

K-FAC Updates: Every 100 steps (adaptive) for second-order preconditioning.
Neural Reprojection: Every 100 steps (adaptive) for rank optimization.
Rank Adaptation: Enabled (Threshold: 0.9, Min Rank: 4).
Optimized LoRA Modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
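
As an illustration of the rank adaptation settings above, the sketch below picks the smallest rank whose cumulative spectral energy reaches the threshold, with a floor at the minimum rank. It assumes that "Threshold: 0.9" refers to the fraction of energy (sum of squared singular values) to retain, which the "Energy-0.9" suffix in the repository name suggests but this card does not state. The function name `select_rank` and its parameters are hypothetical and are not taken from the GRIT code.

```python
def select_rank(singular_values, energy_threshold=0.9, min_rank=4):
    """Return the smallest rank k that keeps `energy_threshold` of the energy.

    Hypothetical helper: "energy" is taken to be the sum of squared singular
    values, which is an assumption about how GRIT applies its threshold.
    """
    # Sort energies from largest to smallest so the leading directions come first.
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0:
        # Degenerate spectrum: fall back to the floor, capped by what is available.
        return min(min_rank, len(energies))

    cumulative = 0.0
    for k, energy in enumerate(energies, start=1):
        cumulative += energy
        if cumulative / total >= energy_threshold:
            # Never go below min_rank, never above the available rank.
            return min(max(k, min_rank), len(energies))
    return len(energies)


# One dominant direction: the threshold is met at k=1, so the floor of 4 applies.
print(select_rank([10.0] + [1.0] * 7))   # 4
# Flat spectrum at rank 16: 15 of 16 directions are needed to reach 90%.
print(select_rank([1.0] * 16))           # 15
```

With a flat spectrum almost nothing is pruned, while a sharply decaying spectrum collapses to the minimum rank, which is the behaviour the "Adaptive Rank" description implies.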

Fine-tuning Configuration

Base Model: meta-llama/Llama-3.2-3B
Quantization: 4-bit (NF4) with bf16 compute.
LoRA Rank: 16
LoRA Alpha: 32
Batch Size: 16 (per device)
Gradient Accumulation: 4 (Effective batch = 64)
Learning Rate: 2.0e-05
Precision: bf16 mixed precision
Sequence Length: 256 tokens
Gradient Checkpointing: Enabled
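
The quantization and LoRA settings above can be expressed with the standard transformers and peft APIs roughly as follows. This is a configuration sketch only: it sets up plain QLoRA and does not include the GRIT-specific K-FAC or reprojection steps. The sequence-classification head with `num_labels=2` is an assumption based on the QNLI task in the repository name, and loading the base model requires access to the gated meta-llama repository and a GPU with bitsandbytes support.

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with bf16 compute, as listed in the configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Assumption: a binary classification head (QNLI has two labels).
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
    num_labels=2,
)

# Prepares the quantized model for training and enables gradient checkpointing.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

# LoRA rank 16, alpha 32, on the four attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type=TaskType.SEQ_CLS,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The remaining hyperparameters (per-device batch size 16, gradient accumulation 4, learning rate 2.0e-05, sequence length 256) would be passed to whichever training loop is used.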

Performance Improvements

✅ Faster Convergence: K-FAC preconditioning aligns updates with curvature.
✅ Memory-Efficient: 4-bit quantization (QLoRA) and gradient checkpointing used.
✅ Adaptive Rank: Dynamically prunes LoRA rank to improve parameter efficiency.

📊 Training Metrics

Total Steps: 1637
Final Loss: N/A
Trainable Params: 9,181,184
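
Both reported numbers can be reproduced with simple arithmetic, sketched below. The layer dimensions (hidden size 3072, key/value projection width 1024, 28 layers) come from the published Llama-3.2-3B configuration and the QNLI training-set size (104,743 examples) from the GLUE dataset; neither is stated in this card, so treat them as assumptions. The sketch also assumes a single device, a bias-free two-label classification head, and one epoch of training.

```python
import math

# Assumed Llama-3.2-3B dimensions (not stated in this card).
HIDDEN = 3072        # model hidden size; q_proj and o_proj are HIDDEN x HIDDEN
KV_DIM = 1024        # k_proj and v_proj output width (grouped-query attention)
NUM_LAYERS = 28
RANK = 16            # LoRA rank from the configuration
NUM_LABELS = 2       # assumed binary head for QNLI
QNLI_TRAIN_EXAMPLES = 104_743  # assumed GLUE QNLI training-set size


def lora_params(in_features, out_features, rank):
    """Parameters in one LoRA pair: A is rank x in, B is out x rank."""
    return rank * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN, HIDDEN, RANK)    # q_proj
    + lora_params(HIDDEN, KV_DIM, RANK)  # k_proj
    + lora_params(HIDDEN, KV_DIM, RANK)  # v_proj
    + lora_params(HIDDEN, HIDDEN, RANK)  # o_proj
)
adapter_params = per_layer * NUM_LAYERS
head_params = NUM_LABELS * HIDDEN        # classification head, no bias
trainable_params = adapter_params + head_params

effective_batch = 16 * 4                 # per-device batch x gradient accumulation
steps_per_epoch = math.ceil(QNLI_TRAIN_EXAMPLES / effective_batch)

print(trainable_params)   # 9181184
print(steps_per_epoch)    # 1637
```

Under these assumptions the parameter count corresponds to the initial rank of 16 on every targeted module, before any rank pruning, and the step count is consistent with a single pass over the QNLI training split.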

📝 Algorithm Details

K-FAC preconditioning (a natural-gradient approximation) and neural reprojection are applied as described in the GRIT method.
Memory efficiency: the K-FAC covariance matrices are kept on the CPU to reduce GPU load.
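
For reference, the textbook form of K-FAC for a single linear layer is given below. This is the standard formulation, not necessarily the exact variant GRIT uses, and this card does not describe how GRIT restricts it to the LoRA factors. The symbols are introduced here for illustration: $a$ is the layer input, $g$ the back-propagated gradient at the layer output, $W$ the weight matrix, $\eta$ the learning rate, and $\lambda$ a damping constant.

```latex
% Covariance (Kronecker) factors estimated from mini-batches
A = \mathbb{E}\left[ a a^{\top} \right], \qquad
G = \mathbb{E}\left[ g g^{\top} \right]

% Kronecker-factored approximation of the layer's Fisher information matrix
F \approx A \otimes G

% Preconditioned (approximate natural-gradient) update with damping
\Delta W = -\eta \, (G + \lambda I)^{-1} \, \nabla_{W} \mathcal{L} \, (A + \lambda I)^{-1}
```

Because the update needs only the inverses of the two small factors $A$ and $G$ rather than the full Fisher matrix, the factors can be stored and inverted off the GPU and refreshed periodically, which matches the "every 100 steps" schedule and CPU-resident covariance matrices described in this card.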

🏆 Results

In benchmark comparisons, GRIT has shown faster convergence and better stability than standard LoRA or fine-tuning, making it well-suited for efficient single-epoch training. The use of Unsloth further accelerates this process.

📝 Citation

If you use this model, please cite the original GRIT paper and:

@misc{grit-lora-Llama-3.2-3B-glue,
  title={meta-llama/Llama-3.2-3B Fine-tuned with GRIT on nyu-mll/glue},
  author={te4bag},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/te4bag/GRIT-Full-GLUE-QNLI-llama-3.2-3B-Energy-0.9}
}

⚖️ License

This model inherits the Apache 2.0 license.
