Introduction

This repository contains eight WideResNet-101-2 models trained by the Dal (Dalhousie University) team for the FathomNet 2025 competition; predictions from these models achieved 3rd place. The models were trained with distinct random seeds and are intended to be used as an ensemble. Each model's folder contains its checkpoint file (weights), its predictions on the competition test dataset, and recorded training information. The overall process is an iterative self-training pipeline, of which these models are the 21st iteration.

Intended Use

The purpose of these models is to classify underwater imagery spanning the 79 leaf nodes of the FathomNet 2025 competition hierarchy. Each model in the ensemble possesses 100 classification heads, all of which make predictions on the data. Confidence is then calculated from the predicted probability distribution across these 100 heads, in an effort to capture epistemic uncertainty. The ensemble prediction set is generated by taking the mode of predictions across the eight component models, with ties broken by average confidence.
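As a hedged illustration of this voting rule, the sketch below takes the mode of the eight per-model predictions and breaks ties by mean confidence; the function name and example labels are hypothetical, not taken from our code.

```python
from collections import defaultdict

def ensemble_vote(predictions, confidences):
    """predictions: eight class labels, one per model.
    confidences: the eight matching per-model confidence scores."""
    votes = defaultdict(list)
    for label, conf in zip(predictions, confidences):
        votes[label].append(conf)
    # Rank candidate labels by vote count (the mode), then by mean
    # confidence, which implements the tie-break described above.
    return max(votes.items(),
               key=lambda kv: (len(kv[1]), sum(kv[1]) / len(kv[1])))[0]

# Illustrative labels: "shrimp" and "crab" tie at three votes each, so the
# class with the higher average confidence ("crab" here) wins.
print(ensemble_vote(
    ["shrimp", "crab", "shrimp", "crab", "shrimp", "crab", "squid", "squid"],
    [0.9, 0.8, 0.7, 0.95, 0.6, 0.85, 0.5, 0.4],
))
```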

Further details on these models will be provided along with our GitHub code link when our report is finalized.

Factors

Two main strategies proved effective in our experimentation. We used a modified cross-entropy loss weighted by hierarchical distance, and combined it with a self-training process in which later training iterations learned from confident pseudo-labels on the test data produced by earlier generations of models.
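One plausible reading of such a loss, shown as a hedged sketch below, scales each sample's cross-entropy by the expected tree distance of the predicted distribution from the ground-truth class; our exact formulation may differ and will be detailed in the report.

```python
import torch
import torch.nn.functional as F

def hierarchical_ce_loss(logits, targets, dist_matrix):
    """logits: (batch, num_classes); targets: (batch,) class indices;
    dist_matrix: (num_classes, num_classes) float tensor of tree-hop
    distances between classes (zero on the diagonal)."""
    probs = F.softmax(logits, dim=1)
    # Expected hierarchical distance from the ground-truth class under
    # the predicted distribution (zero when all mass is on the target).
    expected_dist = (probs * dist_matrix[targets]).sum(dim=1)
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Penalize confident predictions that land far away in the tree.
    return (ce * (1.0 + expected_dist)).mean()
```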

Metrics

While we employed accuracy internally, the evaluated metric is hierarchical distance (based on the number of hops from the ground-truth annotation in a hierarchical tree). We implemented and used both in our experimentation. Ensemble iteration 21 attained a public distance score of 2.27 (competition public leaderboard) and a private distance score of 1.83 (competition evaluation leaderboard).
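For concreteness, a minimal sketch of the hop-count metric is below, assuming the taxonomy is stored as a child-to-parent mapping; the node names are illustrative, not drawn from the competition taxonomy.

```python
def hops(node_a, node_b, parent):
    """Edges between two nodes in a taxonomy tree, where `parent`
    maps each node to its parent and the root maps to None."""
    def path_to_root(n):
        path = []
        while n is not None:
            path.append(n)
            n = parent[n]
        return path
    path_a, path_b = path_to_root(node_a), path_to_root(node_b)
    common = set(path_a) & set(path_b)
    # Hops from each node up to their lowest common ancestor.
    return (next(i for i, n in enumerate(path_a) if n in common)
            + next(i for i, n in enumerate(path_b) if n in common))

# Illustrative taxonomy fragment: two hops between sibling leaves.
parent = {"Shrimp": "Crustacea", "Crab": "Crustacea",
          "Crustacea": "Animalia", "Animalia": None}
print(hops("Shrimp", "Crab", parent))  # 2
```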

Training and Evaluation Data

We employed both of the metrics above for tuning hyperparameters, validating on a random split taken from the training subset of the data (typically about 20% of the training data). Once the optimal hyperparameters were determined, we trained on the full training dataset to produce test predictions for submission. Over successive self-training iterations, increasingly confident pseudo-labelled test samples were incrementally added to the training dataset for later generations of models. Self-training performed in this fashion does not require ground-truth test annotations and may be applied to any downstream dataset of interest.
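A minimal sketch of one such self-training round is below, under the assumption that pseudo-labels are admitted by a confidence threshold; the threshold value and function names are illustrative, not our exact settings.

```python
CONFIDENCE_THRESHOLD = 0.9  # illustrative value, not the setting we used

def next_training_set(train_set, test_images, ensemble_predict):
    """Add confidently pseudo-labelled test samples to the training data.
    `ensemble_predict` returns (label, confidence) for an image."""
    augmented = list(train_set)
    for image in test_images:
        label, confidence = ensemble_predict(image)
        if confidence >= CONFIDENCE_THRESHOLD:
            augmented.append((image, label))  # pseudo-labelled sample
    return augmented
```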

Deployment

The models may be loaded and examined with PyTorch, and were implemented in a fairly standard way using the library. We recommend resizing inputs to 112 px, as this is what we used to train these models. We also recommend the standard ImageNet normalization values, since these models were initialized from Torchvision's ImageNet pre-trained weights.
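A minimal loading sketch follows. The preprocessing constants are the standard torchvision ImageNet values; the checkpoint file name is a placeholder, and the exact architecture definition (including the 100 classification heads) will ship with the code release.

```python
import torch
from torchvision import transforms

# Recommended preprocessing: 112 px inputs with the standard ImageNet
# normalization constants used by torchvision's pre-trained weights.
preprocess = transforms.Compose([
    transforms.Resize((112, 112)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Placeholder path: load one of the per-model checkpoint files and inspect
# its keys; the exact layout of the state dict may differ.
checkpoint = torch.load("checkpoint.pt", map_location="cpu")
print(list(checkpoint)[:10])
```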

Additional information and code will be released in the near future along with updates to this model card.
