RSI-CB256-35 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class remote sensing image classification. Built using the SiglipForImageClassification architecture, it is designed to accurately categorize overhead imagery into 35 distinct land-use and land-cover categories.
Classification Report:
class                     precision    recall  f1-score   support
parking lot                  0.9978    0.9872    0.9925       467
avenue                       0.9927    1.0000    0.9963       544
highway                      0.9283    0.9865    0.9565       223
bridge                       0.9283    0.9659    0.9467       469
marina                       0.9946    1.0000    0.9973       366
crossroads                   0.9909    0.9801    0.9855       553
airport runway               0.9956    0.9926    0.9941       678
pipeline                     0.9900    1.0000    0.9950       198
town                         0.9970    1.0000    0.9985       335
airplane                     0.9915    0.9915    0.9915       351
forest                       0.9972    0.9945    0.9958      1082
mangrove                     1.0000    1.0000    1.0000      1049
artificial grassland         0.9821    0.9717    0.9769       283
river protection forest      1.0000    1.0000    1.0000       524
shrubwood                    1.0000    1.0000    1.0000      1331
sapling                      0.9955    1.0000    0.9977       879
sparse forest                1.0000    1.0000    1.0000      1110
lakeshore                    1.0000    1.0000    1.0000       438
river                        0.9680    0.9555    0.9617       539
stream                       1.0000    0.9971    0.9985       688
coastline                    0.9913    0.9978    0.9946       459
hirst                        0.9890    1.0000    0.9945       628
dam                          0.9868    0.9259    0.9554       324
sea                          0.9971    0.9864    0.9917      1028
snow mountain                1.0000    1.0000    1.0000      1153
sandbeach                    0.9944    0.9907    0.9925       536
mountain                     0.9926    0.9938    0.9932       812
desert                       0.9757    0.9927    0.9841      1092
dry farm                     1.0000    0.9992    0.9996      1309
green farmland               0.9984    0.9969    0.9977       644
bare land                    0.9870    0.9630    0.9748       864
city building                0.9785    0.9892    0.9838      1014
residents                    0.9926    0.9877    0.9901       810
container                    0.9970    0.9955    0.9962       660
storage room                 0.9985    1.0000    0.9992      1307

accuracy                                         0.9919     24747
macro avg                    0.9894    0.9897    0.9895     24747
weighted avg                 0.9920    0.9919    0.9919     24747
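The summary rows of the report follow from the per-class rows: the macro average is the unweighted mean over the 35 classes, and the weighted average weights each class by its support. The sketch below recomputes both F1 summaries in plain Python. It uses the rounded f1-score and support values copied from the report, so it can only be expected to agree to about four decimal places. The names `rows`, `f1_from`, `macro_f1`, and `weighted_f1` are illustrative and not part of the model or any library.

```python
# (f1-score, support) per class, copied from the report in class order (0..34).
# Values are already rounded to 4 decimals, so results are approximate.
rows = [
    (0.9925, 467), (0.9963, 544), (0.9565, 223), (0.9467, 469), (0.9973, 366),
    (0.9855, 553), (0.9941, 678), (0.9950, 198), (0.9985, 335), (0.9915, 351),
    (0.9958, 1082), (1.0000, 1049), (0.9769, 283), (1.0000, 524), (1.0000, 1331),
    (0.9977, 879), (1.0000, 1110), (1.0000, 438), (0.9617, 539), (0.9985, 688),
    (0.9946, 459), (0.9945, 628), (0.9554, 324), (0.9917, 1028), (1.0000, 1153),
    (0.9925, 536), (0.9932, 812), (0.9841, 1092), (0.9996, 1309), (0.9977, 644),
    (0.9748, 864), (0.9838, 1014), (0.9901, 810), (0.9962, 660), (0.9992, 1307),
]


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(n for _, n in rows)

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(f for f, _ in rows) / len(rows)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f * n for f, n in rows) / total_support

print(f"classes: {len(rows)}, images: {total_support}")
print(f"macro avg f1:    {macro_f1:.4f}")
print(f"weighted avg f1: {weighted_f1:.4f}")
# Per-class check using the highway row (precision 0.9283, recall 0.9865).
print(f"highway f1:      {f1_from(0.9283, 0.9865):.4f}")
```

The macro average sits slightly below the weighted average because the weakest classes (bridge, dam, highway, river) have relatively small support, so they pull down the unweighted mean more than the support-weighted one.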
This model supports the classification of satellite or aerial images into the following classes:
Class 0: "parking lot"
Class 1: "avenue"
Class 2: "highway"
Class 3: "bridge"
Class 4: "marina"
Class 5: "crossroads"
Class 6: "airport runway"
Class 7: "pipeline"
Class 8: "town"
Class 9: "airplane"
Class 10: "forest"
Class 11: "mangrove"
Class 12: "artificial grassland"
Class 13: "river protection forest"
Class 14: "shrubwood"
Class 15: "sapling"
Class 16: "sparse forest"
Class 17: "lakeshore"
Class 18: "river"
Class 19: "stream"
Class 20: "coastline"
Class 21: "hirst"
Class 22: "dam"
Class 23: "sea"
Class 24: "snow mountain"
Class 25: "sandbeach"
Class 26: "mountain"
Class 27: "desert"
Class 28: "dry farm"
Class 29: "green farmland"
Class 30: "bare land"
Class 31: "city building"
Class 32: "residents"
Class 33: "container"
Class 34: "storage room"
pip install -q transformers torch pillow gradio
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch
# Load model and processor
model_name = "prithivMLmods/RSI-CB256-35"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)
# ID to label mapping
id2label = {
    "0": "parking lot",
    "1": "avenue",
    "2": "highway",
    "3": "bridge",
    "4": "marina",
    "5": "crossroads",
    "6": "airport runway",
    "7": "pipeline",
    "8": "town",
    "9": "airplane",
    "10": "forest",
    "11": "mangrove",
    "12": "artificial grassland",
    "13": "river protection forest",
    "14": "shrubwood",
    "15": "sapling",
    "16": "sparse forest",
    "17": "lakeshore",
    "18": "river",
    "19": "stream",
    "20": "coastline",
    "21": "hirst",
    "22": "dam",
    "23": "sea",
    "24": "snow mountain",
    "25": "sandbeach",
    "26": "mountain",
    "27": "desert",
    "28": "dry farm",
    "29": "green farmland",
    "30": "bare land",
    "31": "city building",
    "32": "residents",
    "33": "container",
    "34": "storage room"
}
def classify_rsi_image(image):
    # Gradio passes a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    # Map every class label to its rounded probability
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction
# Gradio Interface
iface = gr.Interface(
    fn=classify_rsi_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Top-5 Predicted Categories"),
    title="RSI-CB256-35",
    description="Remote sensing image classification using SigLIP2. Upload an aerial or satellite image to classify its land-use category."
)
if __name__ == "__main__":
    iface.launch()
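`classify_rsi_image` applies a softmax to the logits and returns a probability for every one of the 35 labels. The Gradio `Label` component then shows only the five highest. When calling the function outside Gradio, the same two steps can be sketched in plain Python as below. The helper names `softmax` and `top_k` and the example logits are made up for illustration and are not outputs of the model.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(prediction, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(prediction.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for four of the 35 classes, for illustration only.
labels = ["river", "stream", "dam", "bridge"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)
prediction = {label: round(p, 3) for label, p in zip(labels, probs)}

for label, p in top_k(prediction, k=2):
    print(f"{label}: {p}")
```

With the real model, `prediction` would be the dictionary returned by `classify_rsi_image`, and `top_k(prediction)` would give the same five entries that the interface displays.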
Base model: google/siglip2-base-patch16-224