# Gemma-2B LoRA Adapter - Fine-tuned on Alpaca Dataset
This repository contains a LoRA (Low-Rank Adaptation) adapter for google/gemma-2b-it fine-tuned on the Alpaca dataset using QLoRA (Quantized Low-Rank Adaptation).
## Model Details
- Base Model: google/gemma-2b-it
- Adapter Type: LoRA (Low-Rank Adaptation)
- Fine-tuning Method: QLoRA (4-bit quantization)
- Dataset: tatsu-lab/alpaca (1000 samples used)
- Training Epochs: 1
- LoRA Rank: 16
- LoRA Alpha: 32
- LoRA Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, embed_tokens, lm_head
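
For reference, here is a minimal sketch of a PEFT `LoraConfig` matching the hyperparameters listed above. This is an illustration only, not the exact configuration used for training:

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the listed hyperparameters (illustrative).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
)
```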
## Usage

### Loading the Adapter with PEFT
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Base model and adapter names
base_model_name = "google/gemma-2b-it"
adapter_model_name = "arumpuri/gemma-2b-alpaca-qlora-lora-adapter"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Configure quantization (optional, for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,  # Remove this line if you don't want quantization
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_model_name)


def generate_response(prompt, max_new_tokens=256):
    # Format the prompt with Gemma's chat-template tokens
    formatted_prompt = f"<start_of_turn>user\n{prompt}<end_of_turn>\n<start_of_turn>model\n"
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens so the prompt is not echoed back
    generated = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()


# Example usage
prompt = "Explain machine learning in simple terms"
response = generate_response(prompt)
print(response)
```
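
If your tokenizer version ships Gemma's chat template, you can also let the tokenizer build the prompt instead of formatting it by hand. A short sketch, assuming `model` and `tokenizer` are loaded as above:

```python
# Build the prompt from the tokenizer's built-in chat template
messages = [{"role": "user", "content": "Explain machine learning in simple terms"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```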
### Alternative: Using AutoPeftModelForCausalLM (Simpler)
```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load the adapter together with its base model in one call
model = AutoPeftModelForCausalLM.from_pretrained(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("arumpuri/gemma-2b-alpaca-qlora-lora-adapter")
```

Use the same `generate_response` function as above.
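
If you also want 4-bit loading with this shortcut, passing a `BitsAndBytesConfig` should work, since extra keyword arguments are forwarded to the base model's `from_pretrained`. A sketch; verify against your installed `peft` version:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

# Quantization kwargs are forwarded to the underlying base-model load (sketch)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoPeftModelForCausalLM.from_pretrained(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter",
    quantization_config=bnb_config,
    device_map="auto",
)
```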
## Training Configuration
- Learning Rate: 0.0002
- Per Device Train Batch Size: 1
- Gradient Accumulation Steps: 8
- Effective Batch Size: 8
- Max Sequence Length: 512
- Optimizer: paged_adamw_8bit
- Warmup Ratio: 0.03
- Max Training Steps: 100
- Weight Decay: 0.001
- FP16: True
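
The training script itself is not included in this repo; the sketch below shows how these settings would map onto `transformers.TrainingArguments` under a standard QLoRA/SFT recipe. The output path is a placeholder, and the 512-token max sequence length would be handled by the trainer or tokenizer:

```python
from transformers import TrainingArguments

# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="./gemma-2b-alpaca-qlora",  # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,         # effective batch size = 1 * 8 = 8
    learning_rate=2e-4,
    max_steps=100,
    warmup_ratio=0.03,
    weight_decay=0.001,
    optim="paged_adamw_8bit",
    fp16=True,
    gradient_checkpointing=True,
    logging_steps=10,
)
```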
## Adapter Size
This LoRA adapter is significantly smaller than a full model:
- Adapter size: ~50-100 MB (vs. ~5 GB for the full model)
- Memory efficient: can be loaded on top of a quantized base model
- Fast download: quick to download and load
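
To check the actual file sizes before downloading, you can query the Hub API. A small sketch using `huggingface_hub`; the listed files and sizes are whatever is currently on the Hub:

```python
from huggingface_hub import HfApi

# List the adapter files and their sizes without downloading the repo
info = HfApi().model_info(
    "arumpuri/gemma-2b-alpaca-qlora-lora-adapter", files_metadata=True
)
for f in info.siblings:
    size_mb = (f.size or 0) / 1e6
    print(f"{f.rfilename}: {size_mb:.1f} MB")
```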
## Performance
This adapter was trained on Google Colab Free Tier with the following optimizations:
- 4-bit quantization of base model
- LoRA rank of 16 for memory efficiency
- Gradient checkpointing enabled
- Optimized batch size and sequence length
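
As a rough illustration of how these optimizations fit together, the sketch below assumes `base_model` is the 4-bit base model loaded as in the Usage section and `lora_config` is the configuration sketched under Model Details; it is not the exact training script:

```python
from peft import get_peft_model, prepare_model_for_kbit_training

# Assumes: base_model = 4-bit google/gemma-2b-it (see Usage),
#          lora_config = LoraConfig sketched under Model Details.
base_model.gradient_checkpointing_enable()        # trade compute for activation memory
base_model = prepare_model_for_kbit_training(base_model)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()           # only the rank-16 adapter weights train
```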
## Limitations
- This model inherits the limitations of the base Gemma model
- May generate biased or inappropriate content
- Trained on only a 1,000-sample subset of the Alpaca dataset
- Use responsibly and implement appropriate safety measures in production
## How to Merge (Optional)
If you need the full merged model and have sufficient memory:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model (without quantization for merging)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "arumpuri/gemma-2b-alpaca-qlora-lora-adapter")

# Merge the LoRA weights into the base model and save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")

# Save tokenizer alongside the merged weights
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
tokenizer.save_pretrained("./merged_model")
```
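
Once merged, the result behaves like a regular transformers checkpoint, so it can be reloaded without `peft` (or optionally published to your own Hub repo; the repo name below is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the merged checkpoint as a plain transformers model (no peft required)
model = AutoModelForCausalLM.from_pretrained("./merged_model", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("./merged_model")

# Optionally publish it; "your-username/..." is a placeholder, not an existing repo
# model.push_to_hub("your-username/gemma-2b-alpaca-merged")
# tokenizer.push_to_hub("your-username/gemma-2b-alpaca-merged")
```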
## Related
- Base Model: google/gemma-2b-it
- Dataset: tatsu-lab/alpaca
- Training Framework: PEFT