Please note: this model card covers the whole repository. For the individual model cards, open the model folders (multimodal, unimodal-image, etc.) and read the READMEs there.
Model Card for sams-tom/multimodal-auv-bathy-bnn-classifier
This repository hosts a collection of Bayesian Neural Network (BNN) models for classifying Autonomous Underwater Vehicle (AUV) sensor data. It includes unimodal classifiers for image, bathymetry, and side-scan sonar (SSS) data, as well as a multimodal classifier that fuses information from all three sensor types. Alongside class predictions, the models provide uncertainty estimates, which help identify predictions that need review in variable marine environments.
Model Details
Model Description
This Hugging Face repository (sams-tom/multimodal-auv-bathy-bnn-classifier) serves as a central hub for various BNN-based classifiers for marine habitat and object recognition using AUV sensor data. The collection includes:
Unimodal Image Classifier: A BNN built on a ResNet50Custom architecture, specialized for optical images from AUVs.
Unimodal Bathymetry Classifier: A BNN built on a ResNet50Custom architecture, specialized for bathymetric data from AUVs.
Unimodal Side-Scan Sonar (SSS) Classifier: A BNN built on a ResNet50Custom architecture, specialized for side-scan sonar data from AUVs.
Multimodal Classifier: A BNN (MultiModalModel) that fuses features from image, bathymetry, and SSS data using attention mechanisms.
All models leverage the Bayesian Neural Network paradigm to quantify prediction uncertainty, providing a more robust and informative output compared to traditional deterministic models. The underlying backbone for feature extraction in both unimodal and multimodal models is ResNet50, adapted for specific input channel requirements and converted to a BNN architecture.
Model metadata
Developed by: Sams-Tom
Shared by: Sams-Tom
Model type: Collection of Bayesian Neural Network (BNN) Classifiers (Computer Vision, Sensor Fusion)
Language(s) (NLP): N/A (These are computer vision/sensor fusion models)
License: MIT
Finetuned from model: ResNet50 (pre-trained on ImageNet-1K, used for feature extraction)
Model Sources
Repository: https://huggingface.co/sams-tom/multimodal-auv-bathy-bnn-classifier
Paper: In development
Uses
Direct Use
The models in this repository are intended for direct use in various marine applications requiring classification of AUV sensor data. This includes:
Automated marine habitat mapping.
Identification of seabed features and objects.
Environmental monitoring and assessment.
Scientific research involving underwater surveys.
The uncertainty quantification provided by the BNNs can aid in:
Highlighting regions where predictions are less confident, indicating a need for further investigation or human review.
Informing adaptive sampling strategies for AUVs, directing them to areas of high uncertainty.
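The review-flagging idea above can be sketched in plain Python. This is a minimal illustration, not the repository's own code: it assumes you have already collected softmax outputs from several stochastic forward passes of a BNN, and the function names and the review threshold are hypothetical choices made for this example.

```python
import math

NUM_CLASSES = 7  # the card states that all models predict 7 classes


def mean_probabilities(mc_samples):
    """Average per-class probabilities over several stochastic forward passes."""
    n = len(mc_samples)
    return [sum(sample[c] for sample in mc_samples) / n
            for c in range(len(mc_samples[0]))]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def needs_review(mc_samples, threshold):
    """Flag a prediction whose predictive entropy exceeds the threshold."""
    return predictive_entropy(mean_probabilities(mc_samples)) > threshold


# Hypothetical threshold: half of the maximum possible entropy, ln(7).
REVIEW_THRESHOLD = 0.5 * math.log(NUM_CLASSES)

# Toy inputs: five passes that agree on class 0, and five uniform passes.
confident = [[0.94] + [0.01] * 6 for _ in range(5)]
uncertain = [[1.0 / NUM_CLASSES] * NUM_CLASSES for _ in range(5)]

print(needs_review(confident, REVIEW_THRESHOLD))  # False
print(needs_review(uncertain, REVIEW_THRESHOLD))  # True
```

Predictive entropy is only one possible uncertainty score; the threshold would need tuning on validation data from the target environment.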
Downstream Use
These models can serve as critical components within larger autonomous marine systems for:
Real-time decision-making and mission planning for AUVs.
Integration into advanced navigation and localization systems.
Supporting long-term environmental monitoring programs through automated data analysis pipelines.
Recommendations
Rigorous Validation: Users should perform extensive validation on their specific target environments and tasks to ensure the models meet their performance and reliability requirements.
Contextual Use of Uncertainty: Integrate the provided uncertainty estimates into a broader decision-making framework, using them to inform, rather than solely dictate, actions.
Data Augmentation & Diversity: Future work and users training similar models should prioritize diverse data collection and robust augmentation strategies to enhance generalization.
Community Engagement: Engage with the repository's discussions for support, reporting issues, or contributing improvements.
How to Get Started with the Models
This repository contains multiple models, each residing in its own subfolder. To get started, you will need torch, transformers, huggingface_hub, and bayesian-torch installed: pip install torch transformers huggingface_hub bayesian-torch
You can load any of the models with the transformers AutoModel.from_pretrained() method by specifying the subfolder argument. You must pass trust_remote_code=True because the models rely on custom Python classes defined in model_definitions.py at the repository root.
To load a specific model:
from transformers import AutoModel
import torch
import json
from huggingface_hub import hf_hub_download

repo_id = "sams-tom/multimodal-auv-bathy-bnn-classifier"

# Example: Loading the Multimodal BNN
multimodal_model_subfolder = "multimodal-bnn"
# Load BNN prior parameters for consistency (they are stored with the model).
# Hub filenames use forward slashes, so build the path with an f-string
# rather than os.path.join, which would insert backslashes on Windows.
multimodal_bnn_params_path = hf_hub_download(repo_id=repo_id, filename=f"{multimodal_model_subfolder}/bnn_params.json")
with open(multimodal_bnn_params_path, "r") as f:
    multimodal_const_bnn_prior_parameters = json.load(f)
print(f"Multimodal BNN Prior Params: {multimodal_const_bnn_prior_parameters}")
multimodal_model = AutoModel.from_pretrained(repo_id, subfolder=multimodal_model_subfolder, trust_remote_code=True)
multimodal_model.eval()  # switch to evaluation mode

# Example: Loading the Unimodal Image BNN
image_model_subfolder = "unimodal-image-bnn"
image_bnn_params_path = hf_hub_download(repo_id=repo_id, filename=f"{image_model_subfolder}/bnn_params.json")
with open(image_bnn_params_path, "r") as f:
    image_const_bnn_prior_parameters = json.load(f)
print(f"Image BNN Prior Params: {image_const_bnn_prior_parameters}")
image_model = AutoModel.from_pretrained(repo_id, subfolder=image_model_subfolder, trust_remote_code=True)
image_model.eval()  # switch to evaluation mode

# (Repeat for 'unimodal-bathy-bnn' and 'unimodal-sss-bnn' following the same pattern)
# For inference, refer to the individual model cards for specific input requirements.
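To avoid repeating the loading pattern for each subfolder, the four models can be driven from a small lookup table. The sketch below is one possible arrangement: the subfolder names come from this card, while the dictionary keys and the helper names (`bnn_params_filename`, `load_model`) are hypothetical and not part of the repository.

```python
REPO_ID = "sams-tom/multimodal-auv-bathy-bnn-classifier"

# Subfolder names as listed in this card; the short keys are illustrative.
SUBFOLDERS = {
    "multimodal": "multimodal-bnn",
    "image": "unimodal-image-bnn",
    "bathymetry": "unimodal-bathy-bnn",
    "sss": "unimodal-sss-bnn",
}


def bnn_params_filename(subfolder):
    """Return the Hub path of the prior-parameter file inside a subfolder."""
    return f"{subfolder}/bnn_params.json"


def load_model(name):
    """Load one model by short name and put it in evaluation mode."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        REPO_ID, subfolder=SUBFOLDERS[name], trust_remote_code=True
    )
    model.eval()
    return model
```

For example, `load_model("sss")` would follow the same pattern as the two explicit examples above, applied to the side-scan sonar subfolder.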
Training Details
Training Data
All models were trained on a custom, internal dataset, identified as "2506 bayes results new labelling," comprising three modalities of AUV sensor data: image, bathymetry, and side-scan sonar.
Modalities: Image (3 channels), Bathymetry (3 channels), Side-Scan Sonar (1 channel).
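A quick shape check before inference can catch channel mismatches between modalities. The sketch below uses the channel counts stated above and the 224x224 size that the training procedure gives as an example. The channel-first (channels, height, width) layout and the helper name are assumptions; the individual model cards remain the reference for exact input requirements.

```python
# Channel counts from this card; keys are illustrative short names.
EXPECTED_CHANNELS = {"image": 3, "bathymetry": 3, "sss": 1}
# Assumed spatial size, taken from the 224x224 example in the training procedure.
EXPECTED_SIZE = (224, 224)


def validate_input_shape(modality, shape):
    """Return the expected (C, H, W) shape, or raise ValueError on a mismatch."""
    if modality not in EXPECTED_CHANNELS:
        raise ValueError(f"unknown modality: {modality!r}")
    expected = (EXPECTED_CHANNELS[modality],) + EXPECTED_SIZE
    if tuple(shape) != expected:
        raise ValueError(
            f"{modality}: expected shape {expected}, got {tuple(shape)}"
        )
    return expected


print(validate_input_shape("sss", (1, 224, 224)))
```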
Training Procedure
The training process involved:
Preprocessing of sensor data to appropriate dimensions and formats for ResNet50 backbones (e.g., 224x224 pixels).
Use of ResNet50 (ImageNet pre-trained) as feature extractors for unimodal models and as components within the multimodal model.
Conversion of deterministic models to Bayesian Neural Networks using the bayesian-torch library, with Reparameterization-type Bayesian layers and the following prior parameters: {"prior_mu": 0.0, "prior_sigma": 1.0, "posterior_mu_init": 0.0, "posterior_rho_init": -3.0, "type": "Reparameterization", "moped_enable": True, "moped_delta": 0.1}.
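To give a feel for what these prior parameters control, here is a plain-Python sketch of the reparameterization trick for a single scalar weight. It assumes the usual softplus mapping from rho to a standard deviation and a MOPED-style initialisation in which the posterior standard deviation is moped_delta times the magnitude of the pretrained weight. That is my understanding of bayesian-torch's behaviour, not something verified against the training code. The helper names are hypothetical.

```python
import math
import random

# Prior parameters as listed in this card.
PRIOR_PARAMS = {
    "prior_mu": 0.0,
    "prior_sigma": 1.0,
    "posterior_mu_init": 0.0,
    "posterior_rho_init": -3.0,
    "type": "Reparameterization",
    "moped_enable": True,
    "moped_delta": 0.1,
}


def softplus(rho):
    """Map an unconstrained rho to a positive standard deviation."""
    return math.log1p(math.exp(rho))


def sample_weight(mu, rho, rng):
    """Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + softplus(rho) * eps


def moped_rho(pretrained_weight, delta):
    """Assumed MOPED init: choose rho so that sigma = delta * |pretrained weight|."""
    return math.log(math.expm1(delta * abs(pretrained_weight)) + 1e-20)


rng = random.Random(0)
# Default initial posterior standard deviation for rho = -3.0 (about 0.0486).
print(softplus(PRIOR_PARAMS["posterior_rho_init"]))
# With MOPED, a pretrained weight of 0.5 gets sigma = 0.1 * 0.5 = 0.05.
rho = moped_rho(0.5, PRIOR_PARAMS["moped_delta"])
print(softplus(rho), sample_weight(0.5, rho, rng))
```

Under these assumptions, enabling MOPED means the pretrained ResNet50 weights act as posterior means, so sampled weights stay close to the deterministic ones at the start of training.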
Training Hyperparameters
Training regime: fp32.
Optimizer: Adam
Learning Rate: Varies by model
Batch Size: 12
Number of Epochs: 30
Loss Function: CrossEntropyLoss
Hardware Used for Training: 3x NVIDIA RTX A6000
Model Architecture and Objective
The repository contains ResNet50Custom models (used unimodally and as feature extractors) and a MultiModalModel. The objective for all models is multi-class classification (7 classes) of AUV sensor data. The MultiModalModel specifically employs additive attention to fuse features from the distinct modalities. All models are converted to BNNs to enable uncertainty quantification.
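The additive-attention fusion can be illustrated with a small, dependency-free sketch. It is a generic additive (Bahdanau-style) attention over per-modality feature vectors, not the MultiModalModel implementation: the function name, the parameter shapes, and the toy numbers are assumptions for illustration only.

```python
import math


def additive_attention_fuse(features, W, b, v):
    """Fuse M feature vectors (each of length D) into one vector of length D.

    score_i = v . tanh(W f_i + b), weights = softmax(scores),
    fused = sum_i weights_i * f_i. W is H x D; b and v have length H.
    """
    scores = []
    for f in features:
        hidden = [
            math.tanh(sum(w_ij * f_j for w_ij, f_j in zip(row, f)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * h_i for v_i, h_i in zip(v, hidden)))
    # Numerically stable softmax over the modality scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


# Toy 4-dimensional features standing in for image, bathymetry, and SSS embeddings.
image_feat = [0.2, -0.1, 0.5, 0.3]
bathy_feat = [0.0, 0.4, -0.2, 0.1]
sss_feat = [0.6, 0.1, 0.0, -0.3]
W = [[0.1, -0.2, 0.3, 0.0], [0.0, 0.5, -0.1, 0.2]]  # H = 2, D = 4
b = [0.0, 0.1]
v = [1.0, -1.0]

fused, weights = additive_attention_fuse([image_feat, bathy_feat, sss_feat], W, b, v)
print(weights, fused)
```

In a trained model, W, b, and v would be learned, so the attention weights indicate how much each modality contributes to a given prediction.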
Compute Infrastructure
Hardware: 3x NVIDIA RTX A6000
Software: Python, PyTorch, Hugging Face transformers and huggingface_hub, bayesian-torch library.
Citation
To be confirmed.
Model Card Authors
Name: Tom Morgan
Organisation: Scottish Association for Marine Science (UHI)
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/tom-morgan-8a73b129b/
Model tree for sams-tom/multimodal-auv-bathy-bnn-classifier
Base model
microsoft/resnet-50