Model Card for LoRI-S_nlu_llama3_rank_64

This model is part of LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation.

Abstract: Low-Rank Adaptation (LoRA) has emerged as a popular parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), yet it still incurs notable overhead and suffers from parameter interference in multi-task scenarios. We propose LoRA with Reduced Interference (LoRI), a simple yet effective approach that freezes the projection matrices $A$ as random projections and sparsifies the matrices $B$ using task-specific masks. This design substantially reduces the number of trainable parameters while maintaining strong task performance. Moreover, LoRI minimizes cross-task interference in adapter merging by leveraging the orthogonality between adapter subspaces, and supports continual learning by using sparsity to mitigate catastrophic forgetting. Extensive experiments across natural language understanding, mathematical reasoning, code generation, and safety alignment tasks demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. Code is available in the project's GitHub repository.

Key Highlights:

  • Reduced Trainable Parameters: LoRI substantially reduces the number of trainable parameters (up to 95% fewer than standard LoRA) while maintaining strong task performance.
  • Minimized Cross-Task Interference: By leveraging the orthogonality between adapter subspaces, LoRI minimizes interference when merging adapters.
  • Continual Learning Support: LoRI uses sparsity to mitigate catastrophic forgetting, supporting effective continual learning.

Model Details

Model Description

LoRI-S_nlu_llama3_rank_64 is a specific adapter for meta-llama/Meta-Llama-3-8B fine-tuned for Natural Language Understanding (NLU) tasks using the LoRI (LoRA with Reduced Interference) method. LoRI is a parameter-efficient fine-tuning (PEFT) approach that freezes the LoRA projection matrices A as random projections and sparsifies the matrices B using task-specific masks. This design drastically reduces the number of trainable parameters while maintaining robust task performance. This model instance is the LoRI-S (sparse) variant, trained with an adapter rank of 64 and 90% sparsity applied to B.
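To make the design concrete, below is a minimal PyTorch sketch of a LoRI-style linear layer. It is an illustration only, not the project's implementation: the layer wrapper, the scaling of A, and the random placeholder mask are assumptions (in LoRI the mask is extracted from a trained LoRI-D adapter).

import torch
import torch.nn as nn

class LoRILinear(nn.Module):
    """Sketch of a LoRI-adapted linear layer: y = W x + (M * B) A x."""

    def __init__(self, base: nn.Linear, rank: int = 64, sparsity: float = 0.9):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # base weights stay frozen

        in_f, out_f = base.in_features, base.out_features
        # A: frozen random projection (never trained)
        self.register_buffer("A", torch.randn(rank, in_f) / rank ** 0.5)
        # B: the only trainable matrix, initialized to zero as in LoRA
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # M: fixed binary mask over B; random here purely for illustration
        self.register_buffer("M", (torch.rand(out_f, rank) > sparsity).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.A.T @ (self.M * self.B).T

# Usage: wrap an existing linear layer and run a forward pass
layer = LoRILinear(nn.Linear(4096, 4096), rank=64, sparsity=0.9)
print(layer(torch.randn(2, 4096)).shape)  # torch.Size([2, 4096])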

  • Developed by: Juzheng Zhang, Jiacheng You, Ashwinee Panda, Tom Goldstein
  • Model type: Low-Rank Adaptation (LoRA) with Reduced Interference (LoRI) adapter
  • Language(s) (NLP): English
  • License: Apache 2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B

Model Sources

  • Paper: LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation (arXiv:2504.07448)
  • Repository: the LoRI GitHub repository referenced in the paper

Uses

Direct Use

LoRI is intended for parameter-efficient fine-tuning (PEFT) of Large Language Models (LLMs), particularly for single-task performance, multi-task scenarios (adapter merging), and continual learning. This specific adapter (LoRI-S_nlu_llama3_rank_64) is optimized for Natural Language Understanding (NLU) tasks.

Downstream Use

LoRI can be used to efficiently fine-tune LLMs for various tasks, including:

  • Natural Language Understanding (NLU)
  • Mathematical Reasoning
  • Code Generation
  • Safety Alignment

It is designed to outperform full fine-tuning and other PEFT methods while being highly parameter-efficient. Its reduced interference property makes it suitable for scenarios involving adapter merging and continual learning across different tasks.
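As a hedged sketch of adapter merging using generic PEFT tooling: the second adapter ID below is a placeholder following the collection's naming scheme, and the equal merging weights are illustrative; the paper's own merging configuration and scripts are in the GitHub repository.

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Load the NLU adapter from this card under an explicit adapter name
model = PeftModel.from_pretrained(
    base, "tomg-group-umd/LoRI-S_nlu_llama3_rank_64", adapter_name="nlu"
)

# Load a second LoRI adapter (placeholder ID) for another task
model.load_adapter("tomg-group-umd/LoRI-S_code_llama3_rank_64", adapter_name="code")

# Combine the adapters into a single merged adapter and activate it
model.add_weighted_adapter(
    adapters=["nlu", "code"],
    weights=[0.5, 0.5],
    adapter_name="nlu_code",
    combination_type="linear",
)
model.set_adapter("nlu_code")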

Out-of-Scope Use

The model should not be used for any illegal or unethical purposes. Users should be aware that the base model's limitations and biases may still be present. As a language model adapter, it should not be used in safety-critical applications without thorough additional testing and validation.

Bias, Risks, and Limitations

The inherent biases, risks, and limitations of the base model (meta-llama/Meta-Llama-3-8B) apply to this adapter. Additionally, while LoRI aims to reduce cross-task interference, complete elimination of such interference may not be guaranteed across all possible task combinations. The paper focuses on specific benchmarks and tasks; performance on unaddressed tasks or distributions might vary.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Detailed evaluation on specific deployment scenarios and for diverse user groups is recommended to ensure responsible and fair usage.

How to Get Started with the Model

Pretrained LoRI adapters are available via the HuggingFace collection and can be loaded as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model and tokenizer
base_model_id = "meta-llama/Meta-Llama-3-8B"
# This model card is for tomg-group-umd/LoRI-S_nlu_llama3_rank_64
lori_adapter_id = "tomg-group-umd/LoRI-S_nlu_llama3_rank_64" 

# Load the base model with appropriate dtype and device mapping
# Adjust torch_dtype (e.g., torch.float16) as per your hardware/model requirements
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16, 
    low_cpu_mem_usage=True,
    device_map="auto" # Automatically distribute model across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the LoRI adapter and attach it to the base model
model = PeftModel.from_pretrained(base_model, lori_adapter_id)

# Optional: Merge the adapter weights into the base model for a single consolidated model
# This makes the model a standard Transformers model, removing the PEFT wrapper.
# model = model.merge_and_unload()

# Example inference
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text using the model with the loaded adapter
outputs = model.generate(
    **inputs,
    max_new_tokens=50, # Maximum number of new tokens to generate
    temperature=0.7,    # Sampling temperature
    do_sample=True,     # Enable sampling
    eos_token_id=tokenizer.eos_token_id, # Stop generation at end-of-sequence token
)

# Decode the generated tokens, skipping the input prompt
generated_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(f"Prompt: {prompt}
Generated: {generated_text}")

Training Details

Training Data

LoRI adapters are trained on task-specific datasets. This model was trained on NLU datasets; the exact dataset mixture is documented in the paper and the GitHub repository. Other tasks supported by LoRI include:

  • Code generation: CodeAlpaca
  • Mathematical reasoning: GSM8K
  • Safety alignment: Saferpaca

Training Procedure

LoRI employs a two-stage training procedure as outlined in the paper and GitHub repository:

  1. LoRI-D (Dense) training: An initial phase where the projection matrices A are frozen as random projections, and matrices B are trained.
  2. LoRI-S (Sparse) training: Sparse masks are extracted from the trained LoRI-D models, and training continues with LoRI-S at a specified sparsity level (e.g., 90%).

The training is implemented using Fully Sharded Data Parallel (FSDP) and is designed for execution in a multi-GPU environment.
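The mask-extraction step of stage 2 can be sketched as follows. This is a simplified, hedged illustration that assumes magnitude-based selection over a trained B matrix; the exact extraction procedure and scripts are in the GitHub repository.

import torch

def extract_sparse_mask(B: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Keep only the largest-magnitude entries of a trained LoRI-D matrix B."""
    k = int(B.numel() * (1.0 - sparsity))               # number of entries to keep
    threshold = B.abs().flatten().topk(k).values.min()  # magnitude of the k-th largest entry
    return (B.abs() >= threshold).float()               # fixed binary mask for LoRI-S training

# Example: sparsify a rank-64 B matrix at 90% sparsity
B = torch.randn(4096, 64)
mask = extract_sparse_mask(B, sparsity=0.9)
print(mask.mean())  # roughly 0.10 of entries retained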

Training Hyperparameters

  • Adapter ranks: 32 and 64 (this model is rank 64).
  • Sparsity (LoRI-S): 90%.
  • Specific training scripts and hyperparameters for various tasks are available in the LoRI GitHub repository.

Evaluation

Testing Data, Factors & Metrics

Evaluation was conducted across a wide range of tasks, including natural language understanding, mathematical reasoning, code generation, and safety alignment. For code generation performance, HumanEval was used as a benchmark, evaluated with the bigcode-evaluation-harness.

Results

Extensive experiments demonstrate that LoRI outperforms full fine-tuning and existing PEFT methods, while using up to 95% fewer trainable parameters than standard LoRA. In multi-task experiments, LoRI enables effective adapter merging and continual learning with reduced cross-task interference. For detailed quantitative results and specific metrics, please refer to the original paper.

Technical Specifications

Model Architecture and Objective

LoRI modifies the standard LoRA architecture by freezing the projection matrices A as random projections and by sparsifying the matrices B using task-specific masks. This design aims to substantially reduce trainable parameters and minimize cross-task interference during adapter merging and continual learning, while maintaining strong task performance.
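In symbols, for a frozen pretrained weight $W_0$ and input $x$, the adapted output is $h = W_0 x + (M \odot B) A x$, where $A \in \mathbb{R}^{r \times d_{in}}$ is a frozen random projection, $B \in \mathbb{R}^{d_{out} \times r}$ is trainable, $M$ is the fixed task-specific binary mask, and $\odot$ denotes elementwise multiplication (this notation is a reconstruction from the description above, not copied from the paper). As a back-of-envelope check with square weights ($d_{in} = d_{out} = d$): standard LoRA trains $r(d_{in} + d_{out}) = 2rd$ parameters per adapted weight, while LoRI at 90% sparsity trains only the unmasked entries of $B$, about $0.1\,rd$, i.e. roughly 5% of LoRA's count, consistent with the reported figure of up to 95% fewer trainable parameters.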

Compute Infrastructure

Hardware

Training and evaluation are designed for multi-GPU environments, leveraging techniques like Fully Sharded Data Parallel (FSDP).

Software

The implementation relies on PyTorch and the PEFT library, along with other dependencies specified in the project's requirements.txt.

  • PEFT version: 0.12.0

Citation

If you use LoRI in your work, please cite:

@article{zhang2025lori,
  title={LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation},
  author={Zhang, Juzheng and You, Jiacheng and Panda, Ashwinee and Goldstein, Tom},
  journal={arXiv preprint arXiv:2504.07448},
  year={2025}
}

More Information

Feel free to reach out to the authors listed in the paper or refer to the project's GitHub repository if you have any questions.
