|
--- |
|
license: apache-2.0 |
|
--- |
|
|
|
# SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing |
|
|
|
|
|
These are the official weights for "SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing", a self-supervised learning framework tailored for satellite imagery. SatDINO builds upon the **[DINO](https://github.com/facebookresearch/dino)** framework and adapts it to the unique characteristics of remote sensing data.
|
|
|
[ **[Paper](https://arxiv.org/abs/2508.21402v1)** ], [ **[GitHub](https://github.com/strakaj/SatDINO)** ] |
|
|
|
|
|
## Pretrained models |
|
|
|
The models are pretrained on the RGB variant of the fMoW dataset and evaluated across multiple standard remote sensing benchmarks. |
|
|
|
| arch | patch size | params. (M) | GFLOPs | linear probe (%) | hugging face | weights | weights-finetune |
|
|-----------|------------|---------|--------|--------|---------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------| |
|
| ViT-S | 16 | 21.59 | 8.54 | 72.75 | [strakajk/satdino-vit_small-16](https://huggingface.co/strakajk/satdino-vit_small-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16-finetune.pth) | |
|
| ViT-S | 8 | 21.37 | 33.56 | 73.53 | [strakajk/satdino-vit_small-8](https://huggingface.co/strakajk/satdino-vit_small-8) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8-finetune.pth) | |
|
| ViT-B | 16 | 85.65 | 33.90 | 73.52 | [strakajk/satdino-vit_base-16](https://huggingface.co/strakajk/satdino-vit_base-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16-finetune.pth) | |
|
|
|
|
|
### Create from HF |
|
You can load a model via Hugging Face or build it from the official **[GitHub](https://github.com/strakaj/SatDINO)** repository.
|
|
|
```python |
|
import torch |
|
from transformers import AutoModel |
|
|
|
model = AutoModel.from_pretrained("strakajk/satdino-vit_base-16", trust_remote_code=True) |
|
model.eval() |
|
|
|
# run inference on a dummy image

x = torch.randn(1, 3, 224, 224)

y = model(x)  # embedding of shape torch.Size([1, 768]) for ViT-B
|
``` |
|
|
|
|
|
## Results |
|
| Dataset | **SatDINO<sub>8</sub>** | **SatDINO<sub>16</sub>** | **Scale-MAE** | **SatMAE** | |
|
|-----------|-----------------|--------------------|---------------|------------| |
|
| EuroSAT | **87.72** | 85.96 | 85.42 | 81.43 | |
|
| RESISC45 | **85.29** | 82.32 | 79.96 | 65.96 | |
|
| UC Merced | **94.82** | 93.21 | 84.58 | 78.45 | |
|
| WHU-RS19 | **98.18** | 97.82 | 89.32 | 86.41 | |
|
| RS-C11 | **96.91** | 96.61 | 93.03 | 83.96 | |
|
| SIRI-WHU | **91.82** | 87.19 | 84.84 | 77.76 | |
|
|
|
Average kNN classification accuracy across multiple scales (12.5%, 25%, 50%, and 100%). |
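A kNN evaluation of this kind can be sketched as a cosine-similarity majority vote over frozen embeddings. The feature dimension (384, the ViT-S width) and the synthetic clusters below are illustrative stand-ins for actual SatDINO features, not the paper's evaluation pipeline:

```python
import torch

def knn_classify(train_feats, train_labels, test_feats, k=20):
    """Cosine-similarity kNN over L2-normalized embeddings (majority vote)."""
    train = torch.nn.functional.normalize(train_feats, dim=1)
    test = torch.nn.functional.normalize(test_feats, dim=1)
    sims = test @ train.T                      # (n_test, n_train) cosine similarities
    _, idx = sims.topk(k, dim=1)               # indices of the k nearest neighbors
    neighbor_labels = train_labels[idx]        # (n_test, k) labels of those neighbors
    return neighbor_labels.mode(dim=1).values  # majority vote per test sample

# toy example: two well-separated classes in a 384-dim feature space
torch.manual_seed(0)
train_feats = torch.cat([torch.randn(50, 384) + 5.0, torch.randn(50, 384) - 5.0])
train_labels = torch.cat([torch.zeros(50, dtype=torch.long), torch.ones(50, dtype=torch.long)])
test_feats = torch.cat([torch.randn(10, 384) + 5.0, torch.randn(10, 384) - 5.0])
pred = knn_classify(train_feats, train_labels, test_feats, k=5)
```

In practice the embeddings would come from the frozen SatDINO backbone run over the benchmark images.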
|
|
|
--- |
|
|
|
| **Dataset** | **ViT-Small<sub>16</sub>** | **ViT-Small<sub>8</sub>** | **ViT-Base<sub>16</sub>** |
|
|-------------|------------------|---------------|---------------| |
|
| EuroSAT | 98.69 | 98.76 | **98.83** | |
|
| RESISC45 | 95.68 | 95.16 | **96.05** | |
|
| UC Merced | 98.33 | **98.81** | 98.57 | |
|
| WHU-RS19 | **98.54** | 98.06 | 97.57 | |
|
| RS-C11 | **98.01** | 96.81 | 96.02 | |
|
| SIRI-WHU | **98.54** | 97.08 | 97.08 | |
|
|
|
SatDINO fine-tuning classification accuracy. |
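As a rough illustration of what separates frozen-feature evaluation from fine-tuning, here is a minimal sketch that trains only a linear classification head on frozen embeddings; the synthetic 384-dim features stand in for SatDINO outputs, and in full fine-tuning the backbone parameters would be unfrozen and updated as well:

```python
import torch

# synthetic stand-ins for frozen backbone embeddings of two classes
torch.manual_seed(0)
feats = torch.cat([torch.randn(100, 384) + 2.0, torch.randn(100, 384) - 2.0])
labels = torch.cat([torch.zeros(100, dtype=torch.long), torch.ones(100, dtype=torch.long)])

head = torch.nn.Linear(384, 2)  # the only trainable module here
opt = torch.optim.SGD(head.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

for _ in range(100):            # short training loop over the head only
    opt.zero_grad()
    loss_fn(head(feats), labels).backward()
    opt.step()

acc = (head(feats).argmax(dim=1) == labels).float().mean().item()
```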
|
|
|
--- |
|
|
|
| **Model** | **Backbone** | **Potsdam 224<sup>2</sup>** | **Potsdam 512<sup>2</sup>** | **Vaihingen 224<sup>2</sup>** | **Vaihingen 512<sup>2</sup>** | **LoveDA 224<sup>2</sup>** | **LoveDA 512<sup>2</sup>** | |
|
|-----------|------------------|---------------------|---------------------|-----------------------|-----------------------|--------------------|--------------------| |
|
| SatMAE | ViT-Large | 67.88 | 70.39 | 64.81 | 69.13 | 46.28 | 52.28 |
|
| Scale-MAE | ViT-Large | 69.74 | **72.21** | 67.97 | **71.65** | **49.37** | **53.70** | |
|
| SatDINO | ViT-Small<sub>16</sub> | 67.93 | 71.80 | 63.38 | 68.32 | 44.77 | 49.65 | |
|
| SatDINO | ViT-Small<sub>8</sub> | **70.71** | 71.45 | **68.69** | 67.71 | 47.53 | 50.20 | |
|
| SatDINO | ViT-Base | 67.65 | 71.63 | 64.85 | 69.37 | 44.25 | 50.08 | |
|
|
|
Semantic segmentation performance across multiple datasets and image scales. All results are reported in terms of mean Intersection over Union (mIoU). |
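For reference, mIoU over label maps can be computed with a short routine like the one below; the masks and class count are toy values, not the evaluation code used in the paper:

```python
import torch

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = (pred_c | target_c).sum().item()
        if union == 0:          # class absent from both maps, skip it
            continue
        inter = (pred_c & target_c).sum().item()
        ious.append(inter / union)
    return sum(ious) / len(ious)

# toy 4x4 prediction and ground-truth maps with two classes
pred = torch.tensor([[0, 0, 1, 1]] * 4)
target = torch.tensor([[0, 1, 1, 1]] * 4)
miou = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2 = 7/12
```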
|
|
|
|
|
## License |
|
This repository is released under the Apache 2.0 license as found in the LICENSE file. |
|
|
|
|
|
## Citation |
|
If you find this repository useful, please consider citing it: |
|
``` |
|
@misc{straka2025satdinodeepdiveselfsupervised, |
|
title={SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing}, |
|
author={Jakub Straka and Ivan Gruber}, |
|
year={2025}, |
|
eprint={2508.21402}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV}, |
|
url={https://arxiv.org/abs/2508.21402}, |
|
} |
|
``` |