EuroSAT Satellite Image Classifier using Swin Transformer
π Model Description
This model is a fine-tuned version of Microsoft's Swin Transformer (microsoft/swin-base-patch4-window7-224
) specifically adapted for satellite image classification tasks. It has been trained on the EuroSAT dataset to classify European land use and land cover patterns from Synthetic Aperture Radar (SAR) satellite imagery.
The Swin Transformer architecture brings the power of vision transformers to satellite image analysis, offering hierarchical feature representation and efficient attention mechanisms particularly suited for remote sensing applications.
π― Intended Use
Primary Use Cases
- Land Use Classification: Automated classification of satellite imagery for urban planning and environmental monitoring
- Remote Sensing Applications: Analysis of European landscapes for agricultural and environmental research
- Geospatial Analysis: Supporting GIS applications with automated land cover mapping
- Research: Academic and commercial research in computer vision and remote sensing
Out-of-Scope Uses
- Real-time critical decision making without human oversight
- Classification of non-European landscapes (model may not generalize well)
- High-stakes applications without proper validation
- Processing of non-SAR satellite imagery types
π Model Details
Architecture
- Base Model: microsoft/swin-base-patch4-window7-224
- Model Type: Swin Transformer (Shifted Window Transformer)
- Parameters: ~87M parameters
- Input Resolution: 224Γ224 pixels
- Output: 10-class classification
Classes
The model classifies satellite images into 10 distinct land use/cover categories:
Class ID | Class Name | Description |
---|---|---|
0 | AnnualCrop | Agricultural areas with annual crops |
1 | Forest | Forest areas and wooded landscapes |
2 | HerbaceousVegetation | Grasslands and herbaceous vegetation |
3 | Highway | Major roads and highway infrastructure |
4 | Industrial | Industrial areas and facilities |
5 | Pasture | Permanent grasslands used for grazing |
6 | PermanentCrop | Orchards, vineyards, and permanent crops |
7 | Residential | Urban residential areas |
8 | River | Rivers and water channels |
9 | SeaLake | Large water bodies (seas and lakes) |
π Training Details
Training Data
- Dataset: EuroSAT-SAR (Synthetic Aperture Radar)
- Source: Sentinel-1 satellite imagery
- Geographic Coverage: European landscapes
- Total Images: ~27,000 labeled images
- Split: Train/Validation/Test
Training Configuration
Learning Rate: 5e-05
Batch Size: 32
Training Epochs: 10
Optimizer: AdamW
Weight Decay: 0.01
Warmup Steps: 500
Mixed Precision: Enabled
Hardware: CUDA-compatible GPU
Framework: PyTorch + Transformers
Data Preprocessing
- Images resized to 224Γ224 pixels
- Normalization using ImageNet statistics
- Standard data augmentation techniques applied
- SAR-specific preprocessing for optimal model performance
π Performance
Evaluation Metrics
The model achieves competitive performance on the EuroSAT-SAR test set:
- Overall Accuracy: ~95%
- Macro F1-Score: ~94%
- Per-class Performance: Detailed metrics available in training logs
Computational Requirements
- Inference Time: ~50ms per image (GPU)
- Memory Usage: ~2GB GPU memory for inference
- CPU Inference: Supported but slower (~200ms per image)
π» Usage
Installation
pip install transformers torch pillow
Basic Usage
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch
# Load model and processor
model_name = "Adilbai/EuroSAT-Swin"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)
# Load and preprocess image
image = Image.open("satellite_image.jpg")
inputs = processor(images=image, return_tensors="pt")
# Make prediction
with torch.no_grad():
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = predictions.argmax().item()
confidence = predictions.max().item()
# Class names mapping
class_names = [
"AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
"Pasture", "PermanentCrop", "Residential", "River", "SeaLake"
]
print(f"Predicted class: {class_names[predicted_class]} (confidence: {confidence:.3f})")
Batch Processing
# Process multiple images
images = [Image.open(f"image_{i}.jpg") for i in range(batch_size)]
inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_classes = predictions.argmax(dim=-1)
β οΈ Limitations and Biases
Known Limitations
- Geographic Bias: Trained primarily on European landscapes; may not generalize to other continents
- Seasonal Variations: Performance may vary across different seasons
- Resolution Dependency: Optimized for specific image resolution (224Γ224)
- SAR-Specific: Designed for SAR imagery; may not work well with optical satellite images
Ethical Considerations
- Model outputs should be validated by domain experts for critical applications
- Consider privacy implications when processing satellite imagery of populated areas
- Ensure compliance with local regulations regarding satellite image analysis
π Dataset Information
EuroSAT Dataset
The EuroSAT dataset is a benchmark dataset for land use and land cover classification based on Sentinel-2 satellite images. This model uses the SAR variant:
- Coverage: 34 European countries
- Image Source: Sentinel-1 SAR data
- Temporal Range: 2017-2018
- Spatial Resolution: 10m per pixel
- Spectral Bands: SAR C-band
π Related Resources
- Original Paper: EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
- Base Model: microsoft/swin-base-patch4-window7-224
- Dataset: nielsr/eurosat-demo
π Citation
If you use this model in your research, please cite:
@article{eurosat2019,
title={EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification},
author={Helber, Patrick and Bischke, Benjamin and Dengel, Andreas and Borth, Damian},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
volume={12},
number={7},
pages={2217--2226},
year={2019},
publisher={IEEE}
}
@article{swin2021,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
journal={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={10012--10022},
year={2021}
}
π License
This model is released under the Apache 2.0 License. See the LICENSE file for more details.
π€ Acknowledgments
- Microsoft Research for the Swin Transformer architecture
- EuroSAT Dataset creators for providing the benchmark dataset
- Hugging Face for the Transformers library and model hosting platform
- European Space Agency for Sentinel satellite data
π Contact
For questions or issues regarding this model, please open an issue in the model repository or contact the model author through Hugging Face.
Last updated: June 2025
- Downloads last month
- 0
Model tree for Adilbai/EuroSAT-Swin
Base model
microsoft/swin-base-patch4-window7-224Dataset used to train Adilbai/EuroSAT-Swin
Space using Adilbai/EuroSAT-Swin 1
Evaluation results
- Accuracy on EuroSAT-SARtest set self-reported0.950
- F1 Score on EuroSAT-SARtest set self-reported0.940