Gemma-2B LoRA Adapter - Fine-tuned on Alpaca Dataset

This repository contains a LoRA (Low-Rank Adaptation) adapter for google/gemma-2b-it fine-tuned on the Alpaca dataset using QLoRA (Quantized Low-Rank Adaptation).

📋 Model Details

  • Base Model: google/gemma-2b-it
  • Adapter Type: LoRA (Low-Rank Adaptation)
  • Fine-tuning Method: QLoRA (4-bit quantization)
  • Dataset: tatsu-lab/alpaca (1000 samples used)
  • Training Epochs: 1
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • LoRA Dropout: 0.1
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, embed_tokens, lm_head
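
For reference, the hyperparameters above correspond roughly to the following PEFT LoraConfig. This is a minimal sketch reconstructed from the listed values; the exact training script is not included in this repository.

from peft import LoraConfig

# Approximate LoRA configuration for this adapter, reconstructed from the
# hyperparameters listed above (the original training script may differ).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)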

🚀 Usage

Loading the Adapter with PEFT

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Base model name
base_model_name = "google/gemma-2b-it"
adapter_model_name = "arumpuri/gemma-2b-alpaca-qlora-lora-adapter"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Configure quantization (optional, for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,  # Remove this line if you don't want quantization
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_model_name)

def generate_response(prompt, max_new_tokens=256):
    # Gemma chat format: user turn followed by the start of the model turn
    formatted_prompt = f"<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n"
    
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    # Decode only the newly generated tokens, skipping the prompt
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

# Example usage
prompt = "Explain machine learning in simple terms"
response = generate_response(prompt)
print(response)

Alternative: Using AutoPeftModelForCausalLM (Simpler)

import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load model and tokenizer
model = AutoPeftModelForCausalLM.from_pretrained(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("arumpuri/gemma-2b-alpaca-qlora-lora-adapter")

# Use the same generate_response function as above
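
If you also want the memory savings of 4-bit loading with this simpler path, AutoPeftModelForCausalLM forwards extra keyword arguments to the underlying base model, so a BitsAndBytesConfig can typically be passed as well. A sketch, assuming a recent PEFT/transformers version:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig

# 4-bit variant of the simpler loading path; the quantization kwargs are
# forwarded to the base model's from_pretrained call.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoPeftModelForCausalLM.from_pretrained(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter",
    quantization_config=bnb_config,
    device_map="auto",
)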

🔧 Training Configuration

  • Learning Rate: 0.0002
  • Per Device Train Batch Size: 1
  • Gradient Accumulation Steps: 8
  • Effective Batch Size: 8
  • Max Sequence Length: 512
  • Optimizer: paged_adamw_8bit
  • Warmup Ratio: 0.03
  • Max Training Steps: 100
  • Weight Decay: 0.001
  • FP16: True
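
Translated into transformers.TrainingArguments, the settings above look roughly like this. It is a sketch reconstructed from the listed values; the max sequence length of 512 is usually passed to the trainer (e.g. trl's SFTTrainer) rather than to TrainingArguments, and the output path below is hypothetical.

from transformers import TrainingArguments

# Approximate training arguments reconstructed from the values listed above.
training_args = TrainingArguments(
    output_dir="./gemma-2b-alpaca-qlora",  # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,         # effective batch size of 8
    max_steps=100,
    warmup_ratio=0.03,
    weight_decay=0.001,
    optim="paged_adamw_8bit",
    fp16=True,
    gradient_checkpointing=True,
)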

💾 Adapter Size

This LoRA adapter is significantly smaller than a full model:

  • Adapter size: roughly 50-100 MB, versus ~5 GB for the full model
  • Memory efficient: can be loaded on top of the 4-bit quantized base model
  • Fast to download and load
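
You can check the actual adapter payload on the Hub before downloading; a small sketch using huggingface_hub (the size figures above are approximate):

from huggingface_hub import HfApi

# List the adapter files and their sizes without downloading the repo.
api = HfApi()
info = api.model_info(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter", files_metadata=True
)
for f in info.siblings:
    print(f.rfilename, f"{(f.size or 0) / 1e6:.1f} MB")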

🎯 Performance

This adapter was trained on the Google Colab free tier with the following optimizations:

  • 4-bit quantization of base model
  • LoRA rank of 16 for memory efficiency
  • Gradient checkpointing enabled
  • Optimized batch size and sequence length
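
For readers curious how these optimizations are wired up, this is the standard PEFT recipe for QLoRA training, sketched here under the assumption that base_model is the 4-bit model loaded as in the Usage section and lora_config is the configuration sketched under Model Details. It is not the exact script used for this adapter.

from peft import get_peft_model, prepare_model_for_kbit_training

# Standard QLoRA preparation: enable gradient checkpointing, make the 4-bit
# base model training-friendly, then attach the LoRA adapter configuration.
base_model.gradient_checkpointing_enable()
base_model = prepare_model_for_kbit_training(base_model)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()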

⚠️ Limitations

  • This model inherits the limitations of the base Gemma model
  • May generate biased or inappropriate content
  • Trained on a subset of the Alpaca dataset (1000 samples)
  • Use responsibly and implement appropriate safety measures in production

🤝 How to Merge (Optional)

If you need the full merged model and have sufficient memory:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model (without quantization for merging)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "arumpuri/gemma-2b-alpaca-qlora-lora-adapter")

# Merge and save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")

# Save tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
tokenizer.save_pretrained("./merged_model")
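
The merged checkpoint can then be loaded like any regular transformers model, without PEFT:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model as a plain transformers checkpoint (no PEFT needed).
merged = AutoModelForCausalLM.from_pretrained(
    "./merged_model", torch_dtype=torch.float16, device_map="auto"
)
merged_tokenizer = AutoTokenizer.from_pretrained("./merged_model")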
