# pixtral_aerial_VQA_adapter

## Overview
This repository contains a fine-tuned LoRA adapter for the Pixtral-12B model, optimized specifically for aerial imagery analysis and visual question answering. The adapter enables detailed processing of aerial footage with a focus on construction site surveying, structural assessment, and environmental monitoring.
## Model Details

- Type: LoRA Adapter
- Total Parameters: 6,225,920
- Memory Usage: 23.75 MB (see the verification sketch after this list)
- Precision: torch.float32
- Layer Types:
  - lora_A: 40
  - lora_B: 40
- Base Model: mistralai/Pixtral-12B-2409
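
As a quick sanity check, the parameter count and memory figures above can be reproduced by summing the tensors in the adapter checkpoint. The sketch below assumes the weights are stored under the default PEFT filename `adapter_model.safetensors`; adjust the filename if the repository uses a different layout.

```python
# Minimal sketch: count adapter parameters and estimate their float32 footprint.
# Assumes the default PEFT weight file name "adapter_model.safetensors".
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download("takara-ai/pixtral_aerial_VQA_adapter", "adapter_model.safetensors")

total_params = 0
with safe_open(path, framework="pt") as f:
    for name in f.keys():
        total_params += f.get_tensor(name).numel()

print(total_params)                  # expected: 6,225,920
print(total_params * 4 / 1024 ** 2)  # expected: ~23.75 MB at 4 bytes per float32 parameter
```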
## Capabilities
The adapter enhances Pixtral's ability to:
- Identify and describe construction elements in aerial imagery
- Detect structural issues in buildings and infrastructure
- Analyze progress in construction projects
- Monitor environmental changes and flooding events
- Process high-resolution aerial imagery with improved detail recognition
## Intended Use

- Primary intended use: processing aerial footage of construction sites for structural and construction surveying.
- Also applicable to other detailed VQA tasks involving aerial footage.
- Suitable for disaster response and assessment applications, particularly flood monitoring.
## Training Data

- Datasets:
  - FloodNet Track 2 dataset
  - Subset of the FGVC Aircraft dataset
  - Custom dataset of 10 image-caption pairs created using Pixtral
## Training Procedure

- Training method: LoRA (Low-Rank Adaptation); an illustrative configuration sketch follows this list
- Base model: Ertugrul/Pixtral-12B-Captioner-Relaxed
- Training hardware: Nebius-hosted NVIDIA H100 machine
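
The exact LoRA hyperparameters are not published in this card. The sketch below shows how an adapter of this shape could be set up with `peft`; the rank, scaling, dropout, and target modules are illustrative assumptions (the 40 `lora_A`/`lora_B` pairs listed above are consistent with a single target module across 40 decoder layers), not the configuration actually used.

```python
# Illustrative only: r, lora_alpha, lora_dropout, and target_modules are assumptions,
# not the published training configuration for this adapter.
import torch
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = LlavaForConditionalGeneration.from_pretrained(
    "Ertugrul/Pixtral-12B-Captioner-Relaxed",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                        # assumed rank
    lora_alpha=32,               # assumed scaling factor
    lora_dropout=0.05,           # assumed dropout
    target_modules=["q_proj"],   # assumed; one module across 40 layers yields 40 A/B pairs
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # prints the trainable parameter summary before training
```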
## Usage Example

The snippet below loads the base model and attaches this LoRA adapter with `peft`. It assumes a recent `transformers` release with Pixtral support and the `peft` package installed.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import PeftModel

# Load the base model and processor, then attach the LoRA adapter
base_model_id = "Ertugrul/Pixtral-12B-Captioner-Relaxed"
adapter_id = "takara-ai/pixtral_aerial_VQA_adapter"

processor = AutoProcessor.from_pretrained(base_model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Load the aerial image and build a chat-style prompt that includes the image
image = Image.open("path_to_aerial_image.jpg")
question = "Describe the construction progress visible in this aerial image."
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# Generate response
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
)
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
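
If you want a standalone checkpoint rather than a base-plus-adapter pair, the LoRA weights can optionally be merged into the base model. The snippet below continues from `model` and `processor` in the example above; the output directory name is just an example.

```python
# Optional: fold the LoRA weights into the base model for adapter-free inference.
# "pixtral_aerial_vqa_merged" is an example output path.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("pixtral_aerial_vqa_merged")
processor.save_pretrained("pixtral_aerial_vqa_merged")
```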
## Citation

```bibtex
@misc{rahnemoonfar2020floodnet,
      title={FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding},
      author={Maryam Rahnemoonfar and Tashnim Chowdhury and Argho Sarkar and Debvrat Varshney and Masoud Yari and Robin Murphy},
      year={2020},
      eprint={2012.02951},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      doi={10.48550/arXiv.2012.02951}
}
```
For research inquiries and press, please reach out to [email protected]
Transforming humanity.