---
tags:
- paligemma
- lora
- adapter
- visual-question-answering
- image-to-text
base_model: google/paligemma2-3b-mix-224
widget:
- text: "<image>\nQuestion: What is in this image?\nAnswer:"
---
|
|
|
# paligemma2-3b-lora-vqa-d1000-r24
|
|
|
This is a LoRA adapter (rank 24) for google/paligemma2-3b-mix-224, fine-tuned for visual question answering on the VizWiz VQA dataset.
|
|
|
## Usage
|
|
|
```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel
from PIL import Image
import torch

# Base model and adapter IDs
base_model_id = "google/paligemma2-3b-mix-224"
adapter_id = "yu3733/paligemma2-3b-lora-vqa-d1000-r24"

# Load the processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Load the base model; PaliGemma is a vision-language model, so it uses
# the conditional-generation class rather than AutoModelForCausalLM
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Inference
image = Image.open("example.jpg")  # replace with your image
prompt = "<image>\nQuestion: What is in this image?\nAnswer:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)

# Decode only the newly generated tokens; the raw output echoes the prompt
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```
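
If you want to deploy without a PEFT dependency, you can fold the adapter weights into the base model. This sketch continues from the snippet above; `merge_and_unload()` is a standard PEFT method, and the output directory is just a placeholder.

```python
# Optional: merge the LoRA deltas into the base weights so the result is a
# plain PaliGemmaForConditionalGeneration with no PEFT dependency.
merged = model.merge_and_unload()

# Placeholder output directory for the merged checkpoint.
merged.save_pretrained("./paligemma2-vqa-merged")
processor.save_pretrained("./paligemma2-vqa-merged")
```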
|
|
|
## Training Details
|
|
|
- Base Model: google/paligemma2-3b-mix-224
- Training Data: VizWiz VQA dataset
- LoRA Rank: 24 (a configuration sketch follows below)
- Training Framework: PEFT + Transformers
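
For reference, a PEFT setup along these lines would reproduce the rank used here. Only `r=24` comes from this card; the alpha, dropout, and target modules below are illustrative assumptions, not the recorded training values.

```python
from transformers import PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model
import torch

# Load the base model to wrap with LoRA
base = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma2-3b-mix-224",
    torch_dtype=torch.float16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=24,                                  # LoRA rank (from this card)
    lora_alpha=48,                         # assumption: often set to 2*r
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows the small trainable fraction
```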
|
|
|
## License
|
|
|
Same as the base model (see google/paligemma2-3b-mix-224).
|
|