Tags: Image-Text-to-Text · Transformers · Safetensors · lora · Inference Endpoints
From the Frontier Research Team at **Takara.ai**, we present a specialized LoRA adapter for aerial imagery analysis and visual question answering.

pixtral_aerial_VQA_adapter

Overview

This repository contains a fine-tuned LoRA adapter for the Pixtral-12B model, optimized specifically for aerial imagery analysis and visual question answering. The adapter enables detailed processing of aerial footage with a focus on construction site surveying, structural assessment, and environmental monitoring.

Model Details

  • Type: LoRA Adapter
  • Total Parameters: 6,225,920
  • Memory Usage: 23.75 MB
  • Precision: torch.float32
  • Layer Types:
    • lora_A: 40
    • lora_B: 40
  • Base Model: mistralai/Pixtral-12B-2409
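
These figures can be checked directly against the published adapter weights. The sketch below is one way to do so with `huggingface_hub` and `safetensors`; it assumes the weights are stored under the standard PEFT filename `adapter_model.safetensors`.

# Count adapter parameters and LoRA layer types from the safetensors file.
# Assumes the standard PEFT weight filename; adjust if the repo uses another name.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download("takara-ai/pixtral_aerial_VQA_adapter", "adapter_model.safetensors")

total, lora_a, lora_b = 0, 0, 0
with safe_open(path, framework="pt") as f:
    for name in f.keys():
        total += f.get_tensor(name).numel()
        if "lora_A" in name:
            lora_a += 1
        if "lora_B" in name:
            lora_b += 1

print(f"parameters={total:,}  lora_A layers={lora_a}  lora_B layers={lora_b}")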

Capabilities

The adapter enhances Pixtral's ability to:

  • Identify and describe construction elements in aerial imagery
  • Detect structural issues in buildings and infrastructure
  • Analyze progress in construction projects
  • Monitor environmental changes and flooding events
  • Process high-resolution aerial imagery with improved detail recognition

Intended Use

  • Primary intended uses: Processing aerial footage of construction sites for structural and construction surveying.
  • Can also be applied to other detailed VQA use cases involving aerial footage.
  • Suitable for disaster response and assessment applications, particularly flood monitoring.

Training Data

See the FloodNet citation below for the aerial imagery dataset associated with this adapter.

Training Procedure

  • Training method: LoRA (Low-Rank Adaptation)
  • Base model: Ertugrul/Pixtral-12B-Captioner-Relaxed
  • Training hardware: Nebius-hosted NVIDIA H100 machine
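
The LoRA hyperparameters (rank, alpha, target modules, dropout) are not listed here, so the values in the sketch below are placeholders rather than the settings actually used; it only illustrates how an adapter of this kind is typically configured with `peft`.

# Hypothetical LoRA setup with peft; r, lora_alpha, target_modules and
# lora_dropout are assumed values, not the ones used to train this adapter.
import torch
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = LlavaForConditionalGeneration.from_pretrained(
    "Ertugrul/Pixtral-12B-Captioner-Relaxed",
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameters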

Usage Example

from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

# Base model the adapter was trained on (see Training Procedure) and the adapter repo
base_model_id = "Ertugrul/Pixtral-12B-Captioner-Relaxed"
adapter_id = "takara-ai/pixtral_aerial_VQA_adapter"

# Load the processor and base model, then attach the LoRA adapter
processor = AutoProcessor.from_pretrained(base_model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Load the image and build a Pixtral-style instruction prompt with an image placeholder
image = Image.open("path_to_aerial_image.jpg")
prompt = "<s>[INST]Describe the construction progress visible in this aerial image.\n[IMG][/INST]"

# Generate response
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7
)
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
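
To run without the peft wrapper at inference time, the LoRA weights can be merged into the base model (standard peft functionality; the output directory below is just an example):

# Fold the LoRA weights into the base model and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("pixtral-aerial-vqa-merged")      # example output path
processor.save_pretrained("pixtral-aerial-vqa-merged")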

Citation

@misc{rahnemoonfar2020floodnet,
 title={FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding},
 author={Maryam Rahnemoonfar and Tashnim Chowdhury and Argho Sarkar and Debvrat Varshney and Masoud Yari and Robin Murphy},
 year={2020},
 eprint={2012.02951},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 doi={10.48550/arXiv.2012.02951}
}

For research inquiries and press, please reach out to [email protected].

Transforming humanity.
