Tags: Image-Text-to-Text · Transformers · Safetensors · lora · Inference Endpoints
From the Frontier Research Team at **Takara.ai**, we present a specialized LoRA adapter for aerial imagery analysis and visual question answering.

pixtral_aerial_VQA_adapter

Overview

This repository contains a fine-tuned LoRA adapter for the Pixtral-12B model, optimized specifically for aerial imagery analysis and visual question answering. The adapter enables detailed processing of aerial footage with a focus on construction site surveying, structural assessment, and environmental monitoring.

Model Details

  • Type: LoRA Adapter
  • Total Parameters: 6,225,920
  • Memory Usage: 23.75 MB
  • Precision: torch.float32
  • Layer Types:
    • lora_A: 40
    • lora_B: 40
  • Base Model: mistralai/Pixtral-12B-2409
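
These figures can be checked directly against the published adapter weights. The sketch below is one way to do so with `huggingface_hub` and `safetensors`; it assumes the weights are stored under the standard PEFT filename `adapter_model.safetensors`.

# Count adapter parameters and LoRA layer types from the safetensors file.
# Assumes the standard PEFT weight filename; adjust if the repo uses another name.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download("takara-ai/pixtral_aerial_VQA_adapter", "adapter_model.safetensors")

total, lora_a, lora_b = 0, 0, 0
with safe_open(path, framework="pt") as f:
    for name in f.keys():
        total += f.get_tensor(name).numel()
        if "lora_A" in name:
            lora_a += 1
        if "lora_B" in name:
            lora_b += 1

print(f"parameters={total:,}  lora_A layers={lora_a}  lora_B layers={lora_b}")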

Capabilities

The adapter enhances Pixtral's ability to:

  • Identify and describe construction elements in aerial imagery
  • Detect structural issues in buildings and infrastructure
  • Analyze progress in construction projects
  • Monitor environmental changes and flooding events
  • Process high-resolution aerial imagery with improved detail recognition

Intended Use

  • Primary intended uses: Processing aerial footage of construction sites for structural and construction surveying.
  • Can also be applied to other detailed VQA use cases involving aerial footage.
  • Suitable for disaster response and assessment applications, particularly flood monitoring.

Training Data

See the FloodNet citation below for the aerial imagery dataset associated with this adapter.

Training Procedure

  • Training method: LoRA (Low-Rank Adaptation)
  • Base model: Ertugrul/Pixtral-12B-Captioner-Relaxed
  • Training hardware: Nebius-hosted NVIDIA H100 machine
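
The LoRA hyperparameters (rank, alpha, target modules, dropout) are not listed here, so the values in the sketch below are placeholders rather than the settings actually used; it only illustrates how an adapter of this kind is typically configured with `peft`.

# Hypothetical LoRA setup with peft; r, lora_alpha, target_modules and
# lora_dropout are assumed values, not the ones used to train this adapter.
import torch
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = LlavaForConditionalGeneration.from_pretrained(
    "Ertugrul/Pixtral-12B-Captioner-Relaxed",
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameters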

Usage Example

from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import PeftModel
import torch
from PIL import Image

# Base model the adapter was trained on (see Training Procedure) and the adapter repo
base_model_id = "Ertugrul/Pixtral-12B-Captioner-Relaxed"
adapter_id = "takara-ai/pixtral_aerial_VQA_adapter"

# Load the processor and base model, then attach the LoRA adapter
processor = AutoProcessor.from_pretrained(base_model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Load the image and build a Pixtral-style instruction prompt with an image placeholder
image = Image.open("path_to_aerial_image.jpg")
prompt = "<s>[INST]Describe the construction progress visible in this aerial image.\n[IMG][/INST]"

# Generate response
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7
)
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
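
To run without the peft wrapper at inference time, the LoRA weights can be merged into the base model (standard peft functionality; the output directory below is just an example):

# Fold the LoRA weights into the base model and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("pixtral-aerial-vqa-merged")      # example output path
processor.save_pretrained("pixtral-aerial-vqa-merged")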

Citation

@misc{rahnemoonfar2020floodnet,
 title={FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding},
 author={Maryam Rahnemoonfar and Tashnim Chowdhury and Argho Sarkar and Debvrat Varshney and Masoud Yari and Robin Murphy},
 year={2020},
 eprint={2012.02951},
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 doi={10.48550/arXiv.2012.02951}
}

For research inquiries and press, please reach out to [email protected].

Transforming humanity.
