Model Card for OceanSAR-1

Model Details

Model Description

OceanSAR-1 is the first foundation model in the OceanSAR family, specifically designed for Synthetic Aperture Radar (SAR) imagery analysis, with a focus on ocean observation. The model is trained using a novel dynamic dataset pruning strategy that enhances training efficiency and feature quality.

Developed by: Thomas Kerdreux, Alexandre Tuel @ Galeio
Deployed by: Antoine Audras @ Galeio
Model type: Vision Foundation Model (ResNet50/ViT variants)
License: Apache License 2.0
Training data: Sentinel-1 Wave Mode (WV) SAR images (2015-2024)
Training regime: DINO self-supervised learning with dynamic dataset pruning

Uses

Direct Use

The model is intended to be used as a feature extractor for SAR image analysis, particularly for ocean observation tasks. It can be used for:

Feature extraction from SAR images
Transfer learning for downstream tasks

Downstream Use

The model has been validated on three downstream tasks:

TenGeoP Classification: Classification of 10 geophysical phenomena in SAR images
Significant Wave Height Estimation: Regression task for ocean wave height prediction
Wind Speed Prediction: Regression task for surface wind speed estimation
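A downstream head can be trained on top of frozen OceanSAR-1 features. The sketch below shows a minimal linear probe for a 10-class task such as TenGeoP. It assumes the features were already extracted with the backbone; random tensors stand in for them here. The names `features`, `labels`, and `probe`, the optimizer, and the learning rate are illustrative choices, not the authors' training setup.

```python
import torch
from torch import nn

torch.manual_seed(0)

feature_dim = 2048   # pooled feature size stated for the ResNet50 variant
num_classes = 10     # TenGeoP distinguishes 10 geophysical phenomena

# Hypothetical stand-ins for features extracted by the frozen backbone
# and for their class labels.
features = torch.randn(64, feature_dim)
labels = torch.randint(0, num_classes, (64,))

# The linear layer is the only trainable module in a linear probe.
probe = nn.Linear(feature_dim, num_classes)
optimizer = torch.optim.SGD(probe.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(probe(features), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

with torch.no_grad():
    logits = probe(features)
```

For the two regression tasks, the same pattern would apply with `nn.Linear(feature_dim, 1)` and a regression loss such as `nn.MSELoss()`.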

How to Use

import torch
from transformers import AutoModel

# Load the pretrained model
model = AutoModel.from_pretrained("galeio-research/OceanSAR-1")

# Prepare your SAR image (should be single-channel VV polarization)
# Here using random data as example
dummy_image = torch.randn(1, 1, 256, 256)  # (B, C, H, W)

# Extract features
with torch.no_grad():
    outputs = model(dummy_image)
    features = outputs.pooler_output  # Shape: (1, 2048) for ResNet50

Training Details

Training Data

Dataset: Sentinel-1 Wave Mode (WV) SAR images
Time period: 2015-2024
Size: ~12 million images
Preprocessing:
- Spatial downsampling to 50 m resolution
- Dynamic dataset pruning for diversity and balancedness
- Exclusion of validation images from the training set

Dynamic Dataset Pruning

The model uses a novel dynamic dataset pruning strategy that:

Maximizes dataset diversity and balancedness
Reduces computational costs
Improves model performance on downstream tasks
Works without requiring a pre-existing feature extractor
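This card does not describe the pruning algorithm itself. As a generic illustration of what diversity-driven pruning can look like, the sketch below uses greedy farthest-point (k-center) selection on feature vectors. This is a swapped-in, well-known technique and not the authors' method. The function name `farthest_point_subset` and the synthetic data are hypothetical.

```python
import numpy as np

def farthest_point_subset(features: np.ndarray, keep: int, seed: int = 0) -> np.ndarray:
    """Greedy k-center selection: return indices of `keep` mutually distant rows."""
    n = features.shape[0]
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and the number of samples")
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]
    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < keep:
        nxt = int(np.argmax(min_dist))  # sample farthest from the current subset
        selected.append(nxt)
        dist_to_new = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return np.array(selected)

# Synthetic, imbalanced data: 950 near-duplicate samples and 50 scattered ones.
rng = np.random.default_rng(0)
common = rng.normal(0.0, 0.1, size=(950, 8))
rare = rng.normal(0.0, 5.0, size=(50, 8))
features = np.vstack([common, rare])

kept = farthest_point_subset(features, keep=100)
rare_kept = int(np.sum(kept >= 950))  # rows 950..999 are the scattered samples
print(f"kept {len(kept)} samples, {rare_kept} of them from the rare group")
```

In this toy setting the retained subset over-represents the scattered samples relative to the original data, which is the kind of rebalancing a diversity criterion aims for. A dynamic variant would recompute the features during training, but how OceanSAR-1 does this is not specified here.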

Evaluation

Results

The model achieves state-of-the-art performance on three downstream tasks (linear probing):

TenGeoP Classification:
- ResNet50: 75.5% accuracy
- ViT-S/16: 78.6% accuracy
- ViT-S/8: 82.1% accuracy
- ViT-B/8: 83.6% accuracy
Significant Wave Height Estimation:
- RMSE: 0.63-0.72 m (depending on architecture)
Wind Speed Prediction:
- RMSE: 1.37-1.43 m/s (depending on architecture)
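For reference, the two metrics reported above can be computed as in the following sketch. The helper names `accuracy` and `rmse` and the toy tensors are illustrative and do not reproduce the evaluation code behind the reported numbers.

```python
import torch

def accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class matches the label."""
    return (logits.argmax(dim=1) == labels).float().mean().item()

def rmse(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Root-mean-square error, in the same unit as the target (m or m/s)."""
    return torch.sqrt(torch.mean((pred - target) ** 2)).item()

# Toy classification example: 3 samples, 2 classes.
demo_logits = torch.tensor([[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
demo_labels = torch.tensor([0, 1, 1])
demo_accuracy = accuracy(demo_logits, demo_labels)

# Toy regression example, e.g. wave heights in meters.
demo_pred = torch.tensor([1.0, 2.0, 3.0])
demo_target = torch.tensor([1.0, 2.0, 5.0])
demo_rmse = rmse(demo_pred, demo_target)
```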

For commercial deployments or to access optimized model variants for specific operational needs, feel free to reach out to discuss licensing and support options.

Technical Specifications

Hardware Requirements

GPU with at least 8GB VRAM recommended

Dependencies

PyTorch >= 1.8.0
Transformers >= 4.30.0
torchvision >= 0.9.0

Input Specifications

Input size: 256x256 pixels
Single channel (VV polarization)
Normalized pixel values
SAR images from Sentinel-1 Wave Mode
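The specifications above can be turned into a small preparation step. The sketch below converts a 2-D VV tile into a `(1, 1, 256, 256)` tensor. The card does not state which normalization was used, so per-image standardization is an assumption here, and `prepare_sar_tile` is a hypothetical helper name. Area averaging via `adaptive_avg_pool2d` is one possible way to downsample and is only appropriate for tiles at least 256 pixels on each side.

```python
import torch
import torch.nn.functional as F

def prepare_sar_tile(tile: torch.Tensor, size: int = 256, eps: float = 1e-6) -> torch.Tensor:
    """Turn a 2-D VV tile of shape (H, W) into a (1, 1, size, size) float tensor."""
    if tile.ndim != 2:
        raise ValueError("expected a single-channel 2-D tile of shape (H, W)")
    x = tile.float()[None, None]                # add batch and channel axes
    x = F.adaptive_avg_pool2d(x, (size, size))  # area-average down to size x size
    # Assumption: per-image standardization; the card does not specify the scheme.
    return (x - x.mean()) / (x.std() + eps)

# Hypothetical stand-in for a real Sentinel-1 WV tile.
tile = torch.rand(512, 512)
batch = prepare_sar_tile(tile)
```

The resulting `batch` has the layout expected by the model call shown in How to Use, namely batch, channel, height, width.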

Citation

BibTeX:

@article{kerdreux2025efficientselfsupervisedlearningearth,
  title={Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation},
  author={Kerdreux, Thomas and Tuel, Alexandre and Febvre, Quentin and Mouche, Alexis and Chapron, Bertrand},
  journal={arXiv preprint arXiv:2504.06962},
  year={2025},
  eprint={2504.06962},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.06962},
}

Acknowledgements

This work was granted access to the HPC resources of IDRIS and TGCC under the allocation 2025-[A0171015666] made by GENCI.
