---
license: apache-2.0
datasets:
- jonathan-roberts1/RSI-CB256
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- Location
- RSI
- Remote Sensing Instruments
---

![3.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/QJkZkY5d6EwB4P5xOrgfY.png)

# **RSI-CB256-35**

> **RSI-CB256-35** is a vision-language encoder model fine-tuned from **google/siglip2-base-patch16-224** for **multi-class remote sensing image classification**. Built using the **SiglipForImageClassification** architecture, it is designed to accurately categorize overhead imagery into 35 distinct land-use and land-cover categories.

```py
Classification Report:
                         precision    recall  f1-score   support

            parking lot     0.9978    0.9872    0.9925       467
                 avenue     0.9927    1.0000    0.9963       544
                highway     0.9283    0.9865    0.9565       223
                 bridge     0.9283    0.9659    0.9467       469
                 marina     0.9946    1.0000    0.9973       366
             crossroads     0.9909    0.9801    0.9855       553
         airport runway     0.9956    0.9926    0.9941       678
               pipeline     0.9900    1.0000    0.9950       198
                   town     0.9970    1.0000    0.9985       335
               airplane     0.9915    0.9915    0.9915       351
                 forest     0.9972    0.9945    0.9958      1082
               mangrove     1.0000    1.0000    1.0000      1049
   artificial grassland     0.9821    0.9717    0.9769       283
river protection forest     1.0000    1.0000    1.0000       524
              shrubwood     1.0000    1.0000    1.0000      1331
                sapling     0.9955    1.0000    0.9977       879
          sparse forest     1.0000    1.0000    1.0000      1110
              lakeshore     1.0000    1.0000    1.0000       438
                  river     0.9680    0.9555    0.9617       539
                 stream     1.0000    0.9971    0.9985       688
              coastline     0.9913    0.9978    0.9946       459
                  hirst     0.9890    1.0000    0.9945       628
                    dam     0.9868    0.9259    0.9554       324
                    sea     0.9971    0.9864    0.9917      1028
          snow mountain     1.0000    1.0000    1.0000      1153
              sandbeach     0.9944    0.9907    0.9925       536
               mountain     0.9926    0.9938    0.9932       812
                 desert     0.9757    0.9927    0.9841      1092
               dry farm     1.0000    0.9992    0.9996      1309
         green farmland     0.9984    0.9969    0.9977       644
              bare land     0.9870    0.9630    0.9748       864
          city building     0.9785    0.9892    0.9838      1014
              residents     0.9926    0.9877    0.9901       810
              container     0.9970    0.9955    0.9962       660
           storage room     0.9985    1.0000    0.9992      1307

               accuracy                         0.9919     24747
              macro avg     0.9894    0.9897    0.9895     24747
           weighted avg     0.9920    0.9919    0.9919     24747
```

---

## **Label Space: 35 Remote Sensing Classes**

This model supports the classification of satellite or aerial images into the following classes:

```
Class 0:  "parking lot"  
Class 1:  "avenue"  
Class 2:  "highway"  
Class 3:  "bridge"  
Class 4:  "marina"  
Class 5:  "crossroads"  
Class 6:  "airport runway"  
Class 7:  "pipeline"  
Class 8:  "town"  
Class 9:  "airplane"  
Class 10: "forest"  
Class 11: "mangrove"  
Class 12: "artificial grassland"  
Class 13: "river protection forest"  
Class 14: "shrubwood"  
Class 15: "sapling"  
Class 16: "sparse forest"  
Class 17: "lakeshore"  
Class 18: "river"  
Class 19: "stream"  
Class 20: "coastline"  
Class 21: "hirst"  
Class 22: "dam"  
Class 23: "sea"  
Class 24: "snow mountain"  
Class 25: "sandbeach"  
Class 26: "mountain"  
Class 27: "desert"  
Class 28: "dry farm"  
Class 29: "green farmland"  
Class 30: "bare land"  
Class 31: "city building"  
Class 32: "residents"  
Class 33: "container"  
Class 34: "storage room"
```

---

## **Install Dependencies**

```bash
pip install -q transformers torch pillow gradio
```

---

## **Inference Code**

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/RSI-CB256-35"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# ID to label mapping
id2label = {
    "0": "parking lot",
    "1": "avenue",
    "2": "highway",
    "3": "bridge",
    "4": "marina",
    "5": "crossroads",
    "6": "airport runway",
    "7": "pipeline",
    "8": "town",
    "9": "airplane",
    "10": "forest",
    "11": "mangrove",
    "12": "artificial grassland",
    "13": "river protection forest",
    "14": "shrubwood",
    "15": "sapling",
    "16": "sparse forest",
    "17": "lakeshore",
    "18": "river",
    "19": "stream",
    "20": "coastline",
    "21": "hirst",
    "22": "dam",
    "23": "sea",
    "24": "snow mountain",
    "25": "sandbeach",
    "26": "mountain",
    "27": "desert",
    "28": "dry farm",
    "29": "green farmland",
    "30": "bare land",
    "31": "city building",
    "32": "residents",
    "33": "container",
    "34": "storage room"
}

def classify_rsi_image(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_rsi_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="Top-5 Predicted Categories"),
    title="RSI-CB256-35",
    description="Remote sensing image classification using SigLIP2. Upload an aerial or satellite image to classify its land-use category."
)

if __name__ == "__main__":
    iface.launch()
```

---

## **Intended Use**

* **Land-Use Mapping and Planning**
* **Environmental Monitoring**
* **Infrastructure Identification**
* **Remote Sensing Analytics**
* **Agricultural and Forest Area Classification**