|
--- |
|
license: apache-2.0 |
|
--- |
|
|
|
# SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing |
|
|
|
|
|
These are the official weights for "SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing", a self-supervised learning framework tailored for satellite imagery. SatDINO builds upon the **[DINO](https://github.com/facebookresearch/dino)** framework and adapts it to the unique characteristics of remote sensing data.
|
|
|
[ **[Paper](https://arxiv.org/abs/2508.21402v1)** ], [ **[GitHub](https://github.com/strakaj/SatDINO)** ] |
|
|
|
|
|
## Pretrained models |
|
|
|
The models are pretrained on the RGB variant of the fMoW dataset and evaluated across multiple standard remote sensing benchmarks. |
|
|
|
| arch | patch size | params. (M) | GFLOPs | linear probe (%) | hugging face | weights | weights-finetune |
|
|-----------|------------|---------|--------|--------|---------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------| |
|
| ViT-S | 16 | 21.59 | 8.54 | 72.75 | [strakajk/satdino-vit_small-16](https://huggingface.co/strakajk/satdino-vit_small-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16-finetune.pth) | |
|
| ViT-S | 8 | 21.37 | 33.56 | 73.53 | [strakajk/satdino-vit_small-8](https://huggingface.co/strakajk/satdino-vit_small-8) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8-finetune.pth) | |
|
| ViT-B | 16 | 85.65 | 33.90 | 73.52 | [strakajk/satdino-vit_base-16](https://huggingface.co/strakajk/satdino-vit_base-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16-finetune.pth) | |
|
|
|
|
|
### Create from HF |
|
You can load a model via Hugging Face or build it from the official **[GitHub](https://github.com/strakaj/SatDINO)** repository.
|
|
|
```python |
|
import torch |
|
from transformers import AutoModel |
|
|
|
model = AutoModel.from_pretrained("strakajk/satdino-vit_base-16", trust_remote_code=True) |
|
model.eval() |
|
|
|
# run inference on a dummy image

x = torch.randn(1, 3, 224, 224)

y = model(x)  # embedding of shape torch.Size([1, 768]) for ViT-B
|
``` |
|
|
|
|
|
## Results |
|
| Dataset | **SatDINO<sub>8</sub>** | **SatDINO<sub>16</sub>** | **Scale-MAE** | **SatMAE** | |
|
|-----------|-----------------|--------------------|---------------|------------| |
|
| EuroSAT | **87.72** | 85.96 | 85.42 | 81.43 | |
|
| RESISC45 | **85.29** | 82.32 | 79.96 | 65.96 | |
|
| UC Merced | **94.82** | 93.21 | 84.58 | 78.45 | |
|
| WHU-RS19 | **98.18** | 97.82 | 89.32 | 86.41 | |
|
| RS-C11 | **96.91** | 96.61 | 93.03 | 83.96 | |
|
| SIRI-WHU | **91.82** | 87.19 | 84.84 | 77.76 | |
|
|
|
Average kNN classification accuracy across multiple scales (12.5%, 25%, 50%, and 100%). |
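A kNN evaluation of this kind can be sketched as a cosine-similarity majority vote over frozen embeddings. The feature dimension (384, the ViT-S width) and the synthetic clusters below are illustrative stand-ins for actual SatDINO features, not the paper's evaluation pipeline:

```python
import torch

def knn_classify(train_feats, train_labels, test_feats, k=20):
    """Cosine-similarity kNN over L2-normalized embeddings (majority vote)."""
    train = torch.nn.functional.normalize(train_feats, dim=1)
    test = torch.nn.functional.normalize(test_feats, dim=1)
    sims = test @ train.T                      # (n_test, n_train) cosine similarities
    _, idx = sims.topk(k, dim=1)               # indices of the k nearest neighbors
    neighbor_labels = train_labels[idx]        # (n_test, k) labels of those neighbors
    return neighbor_labels.mode(dim=1).values  # majority vote per test sample

# toy example: two well-separated classes in a 384-dim feature space
torch.manual_seed(0)
train_feats = torch.cat([torch.randn(50, 384) + 5.0, torch.randn(50, 384) - 5.0])
train_labels = torch.cat([torch.zeros(50, dtype=torch.long), torch.ones(50, dtype=torch.long)])
test_feats = torch.cat([torch.randn(10, 384) + 5.0, torch.randn(10, 384) - 5.0])
pred = knn_classify(train_feats, train_labels, test_feats, k=5)
```

In practice the embeddings would come from the frozen SatDINO backbone run over the benchmark images.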
|
|
|
--- |
|
|
|
| **Dataset** | **ViT-Small<sub>16</sub>** | **ViT-Small<sub>8</sub>** | **ViT-Base<sub>16</sub>** |
|
|-------------|------------------|---------------|---------------| |
|
| EuroSAT | 98.69 | 98.76 | **98.83** | |
|
| RESISC45 | 95.68 | 95.16 | **96.05** | |
|
| UC Merced | 98.33 | **98.81** | 98.57 | |
|
| WHU-RS19 | **98.54** | 98.06 | 97.57 | |
|
| RS-C11 | **98.01** | 96.81 | 96.02 | |
|
| SIRI-WHU | **98.54** | 97.08 | 97.08 | |
|
|
|
SatDINO fine-tuning classification accuracy. |
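As a rough illustration of what separates frozen-feature evaluation from fine-tuning, here is a minimal sketch that trains only a linear classification head on frozen embeddings; the synthetic 384-dim features stand in for SatDINO outputs, and in full fine-tuning the backbone parameters would be unfrozen and updated as well:

```python
import torch

# synthetic stand-ins for frozen backbone embeddings of two classes
torch.manual_seed(0)
feats = torch.cat([torch.randn(100, 384) + 2.0, torch.randn(100, 384) - 2.0])
labels = torch.cat([torch.zeros(100, dtype=torch.long), torch.ones(100, dtype=torch.long)])

head = torch.nn.Linear(384, 2)  # the only trainable module here
opt = torch.optim.SGD(head.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

for _ in range(100):            # short training loop over the head only
    opt.zero_grad()
    loss_fn(head(feats), labels).backward()
    opt.step()

acc = (head(feats).argmax(dim=1) == labels).float().mean().item()
```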
|
|
|
--- |
|
|
|
| **Model** | **Backbone** | **Potsdam 224<sup>2</sup>** | **Potsdam 512<sup>2</sup>** | **Vaihingen 224<sup>2</sup>** | **Vaihingen 512<sup>2</sup>** | **LoveDA 224<sup>2</sup>** | **LoveDA 512<sup>2</sup>** | |
|
|-----------|------------------|---------------------|---------------------|-----------------------|-----------------------|--------------------|--------------------| |
|
| SatMAE | ViT-Large | 67.88 | 70.39 | 64.81 | 69.13 | 46.28 | 52.28 |
|
| Scale-MAE | ViT-Large | 69.74 | **72.21** | 67.97 | **71.65** | **49.37** | **53.70** | |
|
| SatDINO | ViT-Small<sub>16</sub> | 67.93 | 71.80 | 63.38 | 68.32 | 44.77 | 49.65 | |
|
| SatDINO | ViT-Small<sub>8</sub> | **70.71** | 71.45 | **68.69** | 67.71 | 47.53 | 50.20 | |
|
| SatDINO | ViT-Base | 67.65 | 71.63 | 64.85 | 69.37 | 44.25 | 50.08 | |
|
|
|
Semantic segmentation performance across multiple datasets and image scales. All results are reported in terms of mean Intersection over Union (mIoU). |
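For reference, mIoU over label maps can be computed with a short routine like the one below; the masks and class count are toy values, not the evaluation code used in the paper:

```python
import torch

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = (pred_c | target_c).sum().item()
        if union == 0:          # class absent from both maps, skip it
            continue
        inter = (pred_c & target_c).sum().item()
        ious.append(inter / union)
    return sum(ious) / len(ious)

# toy 4x4 prediction and ground-truth maps with two classes
pred = torch.tensor([[0, 0, 1, 1]] * 4)
target = torch.tensor([[0, 1, 1, 1]] * 4)
miou = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2 = 7/12
```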
|
|
|
|
|
## License |
|
This repository is released under the Apache 2.0 license as found in the LICENSE file. |
|
|
|
|
|
## Citation |
|
If you find this repository useful, please consider citing it: |
|
``` |
|
@misc{straka2025satdinodeepdiveselfsupervised, |
|
title={SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing}, |
|
author={Jakub Straka and Ivan Gruber}, |
|
year={2025}, |
|
eprint={2508.21402}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV}, |
|
url={https://arxiv.org/abs/2508.21402}, |
|
} |
|
``` |