Gemma-2B LoRA Adapter - Fine-tuned on Alpaca Dataset

This repository contains a LoRA (Low-Rank Adaptation) adapter for google/gemma-2b-it fine-tuned on the Alpaca dataset using QLoRA (Quantized Low-Rank Adaptation).

📋 Model Details

  • Base Model: google/gemma-2b-it
  • Adapter Type: LoRA (Low-Rank Adaptation)
  • Fine-tuning Method: QLoRA (4-bit quantization)
  • Dataset: tatsu-lab/alpaca (1000 samples used)
  • Training Epochs: 1
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • LoRA Dropout: 0.1
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, embed_tokens, lm_head
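
For reference, the hyperparameters above correspond roughly to the following PEFT LoraConfig. This is a minimal sketch reconstructed from the listed values; the exact training script is not included in this repository.

from peft import LoraConfig

# Approximate LoRA configuration for this adapter, reconstructed from the
# hyperparameters listed above (the original training script may differ).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)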

🚀 Usage

Loading the Adapter with PEFT

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Base model name
base_model_name = "google/gemma-2b-it"
adapter_model_name = "arumpuri/gemma-2b-alpaca-qlora-lora-adapter"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Configure quantization (optional, for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,  # Remove this line if you don't want quantization
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_model_name)

def generate_response(prompt, max_new_tokens=256):
    # Gemma chat format: user turn followed by the start of the model turn
    formatted_prompt = f"<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n"
    
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
    
    # Decode only the newly generated tokens, skipping the prompt
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

# Example usage
prompt = "Explain machine learning in simple terms"
response = generate_response(prompt)
print(response)

Alternative: Using AutoPeftModelForCausalLM (Simpler)

import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load model and tokenizer
model = AutoPeftModelForCausalLM.from_pretrained(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("arumpuri/gemma-2b-alpaca-qlora-lora-adapter")

# Use the same generate_response function as above
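
If you also want the memory savings of 4-bit loading with this simpler path, AutoPeftModelForCausalLM forwards extra keyword arguments to the underlying base model, so a BitsAndBytesConfig can typically be passed as well. A sketch, assuming a recent PEFT/transformers version:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig

# 4-bit variant of the simpler loading path; the quantization kwargs are
# forwarded to the base model's from_pretrained call.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoPeftModelForCausalLM.from_pretrained(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter",
    quantization_config=bnb_config,
    device_map="auto",
)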

🔧 Training Configuration

  • Learning Rate: 0.0002
  • Per Device Train Batch Size: 1
  • Gradient Accumulation Steps: 8
  • Effective Batch Size: 8
  • Max Sequence Length: 512
  • Optimizer: paged_adamw_8bit
  • Warmup Ratio: 0.03
  • Max Training Steps: 100
  • Weight Decay: 0.001
  • FP16: True
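
Translated into transformers.TrainingArguments, the settings above look roughly like this. It is a sketch reconstructed from the listed values; the max sequence length of 512 is usually passed to the trainer (e.g. trl's SFTTrainer) rather than to TrainingArguments, and the output path below is hypothetical.

from transformers import TrainingArguments

# Approximate training arguments reconstructed from the values listed above.
training_args = TrainingArguments(
    output_dir="./gemma-2b-alpaca-qlora",  # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,         # effective batch size of 8
    max_steps=100,
    warmup_ratio=0.03,
    weight_decay=0.001,
    optim="paged_adamw_8bit",
    fp16=True,
    gradient_checkpointing=True,
)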

💾 Adapter Size

This LoRA adapter is significantly smaller than a full model:

  • Adapter size: roughly 50-100 MB, versus ~5 GB for the full model
  • Memory efficient: can be loaded on top of the 4-bit quantized base model
  • Fast to download and load
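
You can check the actual adapter payload on the Hub before downloading; a small sketch using huggingface_hub (the size figures above are approximate):

from huggingface_hub import HfApi

# List the adapter files and their sizes without downloading the repo.
api = HfApi()
info = api.model_info(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter", files_metadata=True
)
for f in info.siblings:
    print(f.rfilename, f"{(f.size or 0) / 1e6:.1f} MB")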

🎯 Performance

This adapter was trained on the Google Colab free tier with the following optimizations:

  • 4-bit quantization of base model
  • LoRA rank of 16 for memory efficiency
  • Gradient checkpointing enabled
  • Optimized batch size and sequence length
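
For readers curious how these optimizations are wired up, this is the standard PEFT recipe for QLoRA training, sketched here under the assumption that base_model is the 4-bit model loaded as in the Usage section and lora_config is the configuration sketched under Model Details. It is not the exact script used for this adapter.

from peft import get_peft_model, prepare_model_for_kbit_training

# Standard QLoRA preparation: enable gradient checkpointing, make the 4-bit
# base model training-friendly, then attach the LoRA adapter configuration.
base_model.gradient_checkpointing_enable()
base_model = prepare_model_for_kbit_training(base_model)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()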

⚠️ Limitations

  • This model inherits the limitations of the base Gemma model
  • May generate biased or inappropriate content
  • Trained on a subset of the Alpaca dataset (1000 samples)
  • Use responsibly and implement appropriate safety measures in production

🤝 How to Merge (Optional)

If you need the full merged model and have sufficient memory:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model (without quantization for merging)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "arumpuri/gemma-2b-alpaca-qlora-lora-adapter")

# Merge and save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")

# Save tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
tokenizer.save_pretrained("./merged_model")
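
The merged checkpoint can then be loaded like any regular transformers model, without PEFT:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model as a plain transformers checkpoint (no PEFT needed).
merged = AutoModelForCausalLM.from_pretrained(
    "./merged_model", torch_dtype=torch.float16, device_map="auto"
)
merged_tokenizer = AutoTokenizer.from_pretrained("./merged_model")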
