---
license: apache-2.0
---
# SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing
These are the official weights for "SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing" — a self-supervised learning framework tailored for satellite imagery. SatDINO builds upon the **[DINO](https://github.com/facebookresearch/dino)** framework and adapts it to the unique characteristics of remote sensing data.
[ **[Paper](https://arxiv.org/abs/2508.21402v1)** ], [ **[GitHub](https://github.com/strakaj/SatDINO)** ]
## Pretrained models
The models are pretrained on the RGB variant of the fMoW dataset and evaluated across multiple standard remote sensing benchmarks. The `linear` column reports linear-probing accuracy on fMoW.
| arch | patch size | params (M) | GFLOPs | linear | Hugging Face | weights | weights (fine-tuned) |
|-----------|------------|---------|--------|--------|---------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
| ViT-S | 16 | 21.59 | 8.54 | 72.75 | [strakajk/satdino-vit_small-16](https://huggingface.co/strakajk/satdino-vit_small-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16-finetune.pth) |
| ViT-S | 8 | 21.37 | 33.56 | 73.53 | [strakajk/satdino-vit_small-8](https://huggingface.co/strakajk/satdino-vit_small-8) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8-finetune.pth) |
| ViT-B | 16 | 85.65 | 33.90 | 73.52 | [strakajk/satdino-vit_base-16](https://huggingface.co/strakajk/satdino-vit_base-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16-finetune.pth) |
### Create from HF
You can create a model using Hugging Face or from the official **[GitHub](https://github.com/strakaj/SatDINO)** repository.
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("strakajk/satdino-vit_base-16", trust_remote_code=True)
model.eval()

# predict: the model maps a (B, 3, 224, 224) batch to (B, 768) CLS features
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    y = model(x)  # out: torch.Size([1, 768])
```
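For real images, the random tensor above would be replaced by a preprocessed input. A minimal sketch of such preprocessing in plain PyTorch is shown below; the 224×224 input size matches the snippet above, while the ImageNet normalization statistics are an assumption (DINO-style models are commonly trained with them — check the official repository for the exact transform).

```python
import torch

# Assumed ImageNet normalization statistics (not confirmed by this card)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(image: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Resize a (3, H, W) float image in [0, 1] and normalize it."""
    batch = image.unsqueeze(0)  # interpolate expects a (1, 3, H, W) batch
    resized = torch.nn.functional.interpolate(
        batch, size=(size, size), mode="bilinear", align_corners=False
    )
    return (resized - IMAGENET_MEAN) / IMAGENET_STD

x = preprocess(torch.rand(3, 300, 400))
print(x.shape)  # torch.Size([1, 3, 224, 224])
```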
## Results
| Dataset | **SatDINO<sub>8</sub>** | **SatDINO<sub>16</sub>** | **Scale-MAE** | **SatMAE** |
|-----------|-----------------|--------------------|---------------|------------|
| EuroSAT | **87.72** | 85.96 | 85.42 | 81.43 |
| RESISC45 | **85.29** | 82.32 | 79.96 | 65.96 |
| UC Merced | **94.82** | 93.21 | 84.58 | 78.45 |
| WHU-RS19 | **98.18** | 97.82 | 89.32 | 86.41 |
| RS-C11 | **96.91** | 96.61 | 93.03 | 83.96 |
| SIRI-WHU | **91.82** | 87.19 | 84.84 | 77.76 |
Average kNN classification accuracy across multiple scales (12.5%, 25%, 50%, and 100%).
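The kNN protocol behind the table can be sketched as follows: extract frozen features for train and test images, then classify each test sample by a cosine-similarity vote among its nearest training neighbors. This is an illustrative sketch only — random vectors stand in for SatDINO embeddings, and `k` is a hypothetical choice, not the value used in the paper.

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=20):
    """Classify test features by majority vote among k cosine-nearest train features."""
    # L2-normalize so that a dot product equals cosine similarity
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                      # (n_test, n_train) similarities
    nn_idx = np.argsort(-sims, axis=1)[:, :k]  # indices of top-k neighbors
    preds = [np.argmax(np.bincount(train_labels[row])) for row in nn_idx]
    return np.array(preds)

# Toy data: 100 random 768-d "embeddings" with 5 classes
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(100, 768))
train_labels = rng.integers(0, 5, size=100)
preds = knn_predict(train_feats, train_labels, train_feats[:10], k=1)
print((preds == train_labels[:10]).mean())  # 1.0 — each point is its own nearest neighbor
```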
---
| **Dataset** | **Small<sub>16</sub>** | **Small<sub>8</sub>** | **Base** |
|-------------|------------------|---------------|---------------|
| EuroSAT | 98.69 | 98.76 | **98.83** |
| RESISC45 | 95.68 | 95.16 | **96.05** |
| UC Merced | 98.33 | **98.81** | 98.57 |
| WHU-RS19 | **98.54** | 98.06 | 97.57 |
| RS-C11 | **98.01** | 96.81 | 96.02 |
| SIRI-WHU | **98.54** | 97.08 | 97.08 |
SatDINO fine-tuning classification accuracy.
---
| **Model** | **Backbone** | **Potsdam 224<sup>2</sup>** | **Potsdam 512<sup>2</sup>** | **Vaihingen 224<sup>2</sup>** | **Vaihingen 512<sup>2</sup>** | **LoveDA 224<sup>2</sup>** | **LoveDA 512<sup>2</sup>** |
|-----------|------------------|---------------------|---------------------|-----------------------|-----------------------|--------------------|--------------------|
| SatMAE | ViT-Large | 67.88 | 70.39 | 64.81 | 69.13 | 46.28 | 52.28 |
| Scale-MAE | ViT-Large | 69.74 | **72.21** | 67.97 | **71.65** | **49.37** | **53.70** |
| SatDINO | ViT-Small<sub>16</sub> | 67.93 | 71.80 | 63.38 | 68.32 | 44.77 | 49.65 |
| SatDINO | ViT-Small<sub>8</sub> | **70.71** | 71.45 | **68.69** | 67.71 | 47.53 | 50.20 |
| SatDINO | ViT-Base | 67.65 | 71.63 | 64.85 | 69.37 | 44.25 | 50.08 |
Semantic segmentation performance across multiple datasets and image scales. All results are reported in terms of mean Intersection over Union (mIoU).
## License
This repository is released under the Apache 2.0 license as found in the LICENSE file.
## Citation
If you find this repository useful, please consider citing it:
```bibtex
@misc{straka2025satdinodeepdiveselfsupervised,
      title={SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing},
      author={Jakub Straka and Ivan Gruber},
      year={2025},
      eprint={2508.21402},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.21402},
}
```