EuroSAT Satellite Image Classifier using Swin Transformer

πŸ“‹ Model Description

This model is a fine-tuned version of Microsoft's Swin Transformer (microsoft/swin-base-patch4-window7-224) adapted for satellite image classification. It has been trained on the EuroSAT-SAR dataset to classify European land use and land cover patterns from Synthetic Aperture Radar (SAR) satellite imagery.

The Swin Transformer architecture brings vision transformers to satellite image analysis, offering hierarchical feature representations and efficient windowed attention that are well suited to remote sensing applications.

🎯 Intended Use

Primary Use Cases

  • Land Use Classification: Automated classification of satellite imagery for urban planning and environmental monitoring
  • Remote Sensing Applications: Analysis of European landscapes for agricultural and environmental research
  • Geospatial Analysis: Supporting GIS applications with automated land cover mapping
  • Research: Academic and commercial research in computer vision and remote sensing

Out-of-Scope Uses

  • Real-time critical decision making without human oversight
  • Classification of non-European landscapes (model may not generalize well)
  • High-stakes applications without proper validation
  • Processing of non-SAR satellite imagery types

πŸ“Š Model Details

Architecture

  • Base Model: microsoft/swin-base-patch4-window7-224
  • Model Type: Swin Transformer (Shifted Window Transformer)
  • Parameters: ~87M
  • Input Resolution: 224Γ—224 pixels
  • Output: 10-class classification

Classes

The model classifies satellite images into 10 distinct land use/cover categories:

Class ID | Class Name           | Description
0        | AnnualCrop           | Agricultural areas with annual crops
1        | Forest               | Forest areas and wooded landscapes
2        | HerbaceousVegetation | Grasslands and herbaceous vegetation
3        | Highway              | Major roads and highway infrastructure
4        | Industrial           | Industrial areas and facilities
5        | Pasture              | Permanent grasslands used for grazing
6        | PermanentCrop        | Orchards, vineyards, and permanent crops
7        | Residential          | Urban residential areas
8        | River                | Rivers and water channels
9        | SeaLake              | Large water bodies (seas and lakes)
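
If the fine-tuned checkpoint was saved with its label mapping populated (this is an assumption; the hard-coded list in the Usage section covers the case where it was not), the class names can be read directly from the model config:

from transformers import AutoConfig

# Minimal sketch: read the class mapping from the checkpoint config,
# assuming id2label was populated when the model was saved.
config = AutoConfig.from_pretrained("Adilbai/EuroSAT-Swin")
print(config.id2label)  # e.g. {0: "AnnualCrop", 1: "Forest", ...}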

πŸš€ Training Details

Training Data

  • Dataset: EuroSAT-SAR (Synthetic Aperture Radar)
  • Source: Sentinel-1 satellite imagery
  • Geographic Coverage: European landscapes
  • Total Images: ~27,000 labeled images
  • Split: Train/Validation/Test

Training Configuration

Learning Rate: 5e-05
Batch Size: 32
Training Epochs: 10
Optimizer: AdamW
Weight Decay: 0.01
Warmup Steps: 500
Mixed Precision: Enabled
Hardware: CUDA-compatible GPU
Framework: PyTorch + Transformers
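
The training script is not part of this repository; the following is a minimal sketch of how the hyperparameters above map onto a Hugging Face TrainingArguments/Trainer setup (the output directory and dataset handling are placeholders, not the author's actual code):

from transformers import (AutoModelForImageClassification, Trainer,
                          TrainingArguments)

# Replace the 1000-class ImageNet head with a 10-class EuroSAT head.
model = AutoModelForImageClassification.from_pretrained(
    "microsoft/swin-base-patch4-window7-224",
    num_labels=10,
    ignore_mismatched_sizes=True,
)

# Hyperparameters as listed above; AdamW is the Trainer default optimizer.
args = TrainingArguments(
    output_dir="eurosat-swin",       # placeholder output directory
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=10,
    weight_decay=0.01,
    warmup_steps=500,
    fp16=True,                       # mixed precision
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=..., eval_dataset=...)  # preprocessed splits
# trainer.train()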

Data Preprocessing

  • Images resized to 224Γ—224 pixels
  • Normalization using ImageNet statistics
  • Standard data augmentation techniques applied
  • SAR-specific preprocessing for optimal model performance
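
The exact augmentation recipe is not published; as one illustration, the steps above could be implemented with torchvision as follows (the specific flips are an assumption, not the documented pipeline):

from torchvision import transforms

# ImageNet normalization statistics, matching the base Swin checkpoint.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# Training-time transform: resize to 224x224 plus light augmentation (assumed).
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])

# Evaluation-time transform: deterministic resize and normalization only.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])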

πŸ“ˆ Performance

Evaluation Metrics

The model achieves competitive performance on the EuroSAT-SAR test set:

  • Overall Accuracy: ~95%
  • Macro F1-Score: ~94%
  • Per-class Performance: Detailed metrics available in training logs

Computational Requirements

  • Inference Time: ~50ms per image (GPU)
  • Memory Usage: ~2GB GPU memory for inference
  • CPU Inference: Supported but slower (~200ms per image)
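
These figures depend on hardware and batch size; a rough sketch for reproducing the per-image latency measurement (the image path is a placeholder) is:

import time
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoImageProcessor.from_pretrained("Adilbai/EuroSAT-Swin")
model = AutoModelForImageClassification.from_pretrained("Adilbai/EuroSAT-Swin").to(device).eval()

# "satellite_image.jpg" is a placeholder for any input patch.
inputs = processor(images=Image.open("satellite_image.jpg").convert("RGB"),
                   return_tensors="pt").to(device)

with torch.no_grad():
    for _ in range(5):                 # warm-up iterations
        model(**inputs)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(50):
        model(**inputs)
    if device == "cuda":
        torch.cuda.synchronize()
print(f"~{(time.perf_counter() - start) / 50 * 1000:.1f} ms per image on {device}")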

πŸ’» Usage

Installation

pip install transformers torch pillow

Basic Usage

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "Adilbai/EuroSAT-Swin"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)

# Load and preprocess image
image = Image.open("satellite_image.jpg").convert("RGB")  # ensure 3-channel input
inputs = processor(images=image, return_tensors="pt")

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = predictions.argmax().item()
    confidence = predictions.max().item()

# Class names mapping
class_names = [
    "AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
    "Pasture", "PermanentCrop", "Residential", "River", "SeaLake"
]

print(f"Predicted class: {class_names[predicted_class]} (confidence: {confidence:.3f})")

Batch Processing

# Process multiple images (paths are placeholders for your own files)
image_paths = ["image_0.jpg", "image_1.jpg", "image_2.jpg"]
images = [Image.open(path).convert("RGB") for path in image_paths]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = predictions.argmax(dim=-1)
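
Continuing from the two snippets above, the batch predictions can be mapped back to labels and confidence scores:

# Map each prediction back to a class name and a confidence score
for path, class_id, probs in zip(image_paths, predicted_classes, predictions):
    class_id = class_id.item()
    print(f"{path}: {class_names[class_id]} ({probs[class_id].item():.3f})")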

⚠️ Limitations and Biases

Known Limitations

  • Geographic Bias: Trained primarily on European landscapes; may not generalize to other continents
  • Seasonal Variations: Performance may vary across different seasons
  • Resolution Dependency: Optimized for specific image resolution (224Γ—224)
  • SAR-Specific: Designed for SAR imagery; may not work well with optical satellite images

Ethical Considerations

  • Model outputs should be validated by domain experts for critical applications
  • Consider privacy implications when processing satellite imagery of populated areas
  • Ensure compliance with local regulations regarding satellite image analysis

πŸ“š Dataset Information

EuroSAT Dataset

The EuroSAT dataset is a benchmark dataset for land use and land cover classification based on Sentinel-2 satellite images. This model uses the SAR variant:

  • Coverage: 34 European countries
  • Image Source: Sentinel-1 SAR data
  • Temporal Range: 2017-2018
  • Spatial Resolution: 10m per pixel
  • Radar Band: C-band (Sentinel-1 SAR)

πŸ“„ Citation

If you use this model in your research, please cite:

@article{eurosat2019,
    title={EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification},
    author={Helber, Patrick and Bischke, Benjamin and Dengel, Andreas and Borth, Damian},
    journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
    volume={12},
    number={7},
    pages={2217--2226},
    year={2019},
    publisher={IEEE}
}

@inproceedings{swin2021,
    title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
    author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
    pages={10012--10022},
    year={2021}
}

πŸ“œ License

This model is released under the Apache 2.0 License. See the LICENSE file for more details.

🀝 Acknowledgments

  • Microsoft Research for the Swin Transformer architecture
  • EuroSAT Dataset creators for providing the benchmark dataset
  • Hugging Face for the Transformers library and model hosting platform
  • European Space Agency for Sentinel satellite data

πŸ“ž Contact

For questions or issues regarding this model, please open an issue in the model repository or contact the model author through Hugging Face.


Last updated: June 2025
