---
tags:
- paligemma
- lora
- adapter
- visual-question-answering
- image-to-text
base_model: google/paligemma2-3b-mix-224
widget:
- text: "<image>\nQuestion: What is in this image?\nAnswer:"
---

# paligemma2-3b-lora-vqa-d1000-r24

This is a LoRA adapter for PaliGemma 2 3B (mix-224), fine-tuned on visual question answering (VQA).

## Usage

```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel
from PIL import Image
import torch

# Base model and adapter IDs
base_model_id = "google/paligemma2-3b-mix-224"
adapter_id = "yu3733/paligemma2-3b-lora-vqa-d1000-r24"

# Load processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Load base model (PaliGemma has its own conditional-generation class;
# AutoModelForCausalLM will not load it)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Inference; replace with your own image
image = Image.open("example.jpg")
prompt = "<image>\nQuestion: What is in this image?\nAnswer:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)

# Decode only the newly generated tokens, not the echoed prompt
input_len = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][input_len:], skip_special_tokens=True))
```

For deployment without a PEFT dependency, the adapter weights can be folded into the base model with `model.merge_and_unload()` before saving.

## Training Details

- Base Model: google/paligemma2-3b-mix-224
- Training Data: VizWiz VQA Dataset
- LoRA Rank: 24
- Training Framework: PEFT + Transformers

A hedged sketch of a matching PEFT configuration appears at the end of this card.

## License

Same as the base model (see google/paligemma2-3b-mix-224).
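
## Training Configuration (Sketch)

This card pins only the LoRA rank (24) and the framework (PEFT + Transformers); the remaining hyperparameters were not published. The snippet below is a minimal sketch of a `LoraConfig` consistent with this card: `lora_alpha`, `lora_dropout`, and `target_modules` are illustrative assumptions, not the actual training settings.

```python
from transformers import PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model
import torch

# Only r=24 comes from this card; every other value is an assumed,
# commonly used default rather than the actual training configuration.
lora_config = LoraConfig(
    r=24,                        # LoRA rank stated in Training Details
    lora_alpha=48,               # assumption: alpha = 2 * r is a common choice
    lora_dropout=0.05,           # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma2-3b-mix-224",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only LoRA weights are trainable
```

From here, training would proceed with a standard Transformers `Trainer` loop over VizWiz question–answer pairs formatted with the prompt template shown in Usage.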