---
tags:
- paligemma
- lora
- adapter
- visual-question-answering
- image-to-text
base_model: google/paligemma2-3b-mix-224
widget:
- text: "<image>\nQuestion: What is in this image?\nAnswer:"
---
|
|
|
# paligemma2-3b-lora-vqa-d1000-r24
|
|
|
This is a LoRA adapter (rank 24) for google/paligemma2-3b-mix-224, fine-tuned for visual question answering on the VizWiz VQA dataset.
|
|
|
## Usage
|
|
|
```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel
from PIL import Image
import torch

# Base model and adapter IDs
base_model_id = "google/paligemma2-3b-mix-224"
adapter_id = "yu3733/paligemma2-3b-lora-vqa-d1000-r24"

# Load the processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Load the base model; PaliGemma is a vision-language model, so it uses
# the conditional-generation class rather than AutoModelForCausalLM
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Inference
image = Image.open("example.jpg")  # replace with your image
prompt = "<image>\nQuestion: What is in this image?\nAnswer:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)

# Decode only the newly generated tokens; the raw output echoes the prompt
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```
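
If you want to deploy without a PEFT dependency, you can fold the adapter weights into the base model. This sketch continues from the snippet above; `merge_and_unload()` is a standard PEFT method, and the output directory is just a placeholder.

```python
# Optional: merge the LoRA deltas into the base weights so the result is a
# plain PaliGemmaForConditionalGeneration with no PEFT dependency.
merged = model.merge_and_unload()

# Placeholder output directory for the merged checkpoint.
merged.save_pretrained("./paligemma2-vqa-merged")
processor.save_pretrained("./paligemma2-vqa-merged")
```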
|
|
|
## Training Details
|
|
|
- Base Model: google/paligemma2-3b-mix-224
- Training Data: VizWiz VQA dataset
- LoRA Rank: 24 (a configuration sketch follows below)
- Training Framework: PEFT + Transformers
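
For reference, a PEFT setup along these lines would reproduce the rank used here. Only `r=24` comes from this card; the alpha, dropout, and target modules below are illustrative assumptions, not the recorded training values.

```python
from transformers import PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model
import torch

# Load the base model to wrap with LoRA
base = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma2-3b-mix-224",
    torch_dtype=torch.float16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=24,                                  # LoRA rank (from this card)
    lora_alpha=48,                         # assumption: often set to 2*r
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows the small trainable fraction
```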
|
|
|
## License
|
|
|
Same as the base model (see google/paligemma2-3b-mix-224).
|
|