imagenet-50-subset is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It uses the SiglipForImageClassification architecture and assigns each image to one of 50 categories drawn from the ImageNet dataset.
Classification Report:
class                     precision  recall  f1-score  support
tench                        0.9878  0.9911    0.9895      900
goldfish                     0.9945  0.9956    0.9950      900
great white shark            0.9339  0.8944    0.9137      900
tiger shark                  0.8957  0.8967    0.8962      900
hammerhead                   0.9300  0.9589    0.9442      900
electric ray                 0.8788  0.8622    0.8704      900
stingray                     0.8689  0.8911    0.8799      900
cock                         0.9000  0.9200    0.9099      900
hen                          0.9162  0.8867    0.9012      900
ostrich                      0.9945  0.9989    0.9967      900
brambling                    0.9671  0.9478    0.9574      900
goldfinch                    0.9867  0.9911    0.9889      900
house finch                  0.9629  0.9811    0.9719      900
junco                        0.9583  0.9700    0.9641      900
indigo bunting               0.9933  0.9911    0.9922      900
robin                        0.9888  0.9811    0.9849      900
bulbul                       0.9735  0.9811    0.9773      900
jay                          0.9855  0.9789    0.9822      900
magpie                       0.9776  0.9700    0.9738      900
chickadee                    0.9834  0.9844    0.9839      900
water ouzel                  0.9680  0.9744    0.9712      900
kite                         0.9512  0.9522    0.9517      900
bald eagle                   0.9843  0.9722    0.9782      900
vulture                      0.9562  0.9700    0.9630      900
great grey owl               0.9989  0.9944    0.9967      900
european fire salamander     0.9330  0.9278    0.9304      900
common newt                  0.7969  0.7933    0.7951      900
eft                          0.9162  0.8989    0.9075      900
spotted salamander           0.9249  0.9300    0.9274      900
axolotl                      0.9888  0.9767    0.9827      900
bullfrog                     0.9116  0.9167    0.9141      900
tree frog                    0.9108  0.9533    0.9316      900
tailed frog                  0.8658  0.8100    0.8370      900
loggerhead                   0.8657  0.8956    0.8804      900
leatherback turtle           0.9038  0.8667    0.8849      900
mud turtle                   0.7980  0.7111    0.7521      900
terrapin                     0.7039  0.7844    0.7420      900
box turtle                   0.8576  0.8633    0.8605      900
banded gecko                 0.9255  0.9111    0.9183      900
common iguana                0.9033  0.9133    0.9083      900
american chameleon           0.6577  0.7622    0.7061      900
whiptail                     0.8351  0.8722    0.8533      900
agama                        0.9010  0.8900    0.8955      900
frilled lizard               0.9674  0.9233    0.9449      900
alligator lizard             0.8862  0.8822    0.8842      900
gila monster                 0.9821  0.9733    0.9777      900
green lizard                 0.6574  0.5756    0.6137      900
african chameleon            0.9573  0.9711    0.9641      900
komodo dragon                0.9693  0.9811    0.9752      900
african crocodile            0.9769  0.9878    0.9823      900

accuracy                                       0.9181    45000
macro avg                    0.9186  0.9181    0.9181    45000
weighted avg                 0.9186  0.9181    0.9181    45000
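To read the f1-score column, it may help to recall that F1 is the harmonic mean of precision and recall. The sketch below recomputes it for a few rows copied from the report. The helper name `f1_score` is introduced here for illustration and is not part of the model card. Because the published precision and recall are already rounded to four decimals, the recomputed value can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1-score) copied from the classification report.
rows = {
    "goldfish": (0.9945, 0.9956, 0.9950),
    "terrapin": (0.7039, 0.7844, 0.7420),
    "american chameleon": (0.6577, 0.7622, 0.7061),
    "green lizard": (0.6574, 0.5756, 0.6137),
}

for name, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{name:20s} computed={computed:.4f} reported={reported:.4f}")
```

The weakest rows (green lizard, american chameleon, terrapin, mud turtle) are visually similar reptile classes. Since every class has the same support of 900, the macro and weighted averages coincide.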
The model classifies each image into one of the following categories:
0: tench
1: goldfish
2: great white shark
3: tiger shark
4: hammerhead
5: electric ray
6: stingray
7: cock
8: hen
9: ostrich
10: brambling
11: goldfinch
12: house finch
13: junco
14: indigo bunting
15: robin
16: bulbul
17: jay
18: magpie
19: chickadee
20: water ouzel
21: kite
22: bald eagle
23: vulture
24: great grey owl
25: european fire salamander
26: common newt
27: eft
28: spotted salamander
29: axolotl
30: bullfrog
31: tree frog
32: tailed frog
33: loggerhead
34: leatherback turtle
35: mud turtle
36: terrapin
37: box turtle
38: banded gecko
39: common iguana
40: american chameleon
41: whiptail
42: agama
43: frilled lizard
44: alligator lizard
45: gila monster
46: green lizard
47: african chameleon
48: komodo dragon
49: african crocodile
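The index list above can be turned into lookup tables in both directions. The following is a minimal sketch that uses only the first five class names as an excerpt. The variable names `class_names` and `label2id` are chosen here for illustration. The string keys follow the `id2label` convention used in this card's inference code.

```python
# First five class names from the list above, in index order (excerpt only).
class_names = ["tench", "goldfish", "great white shark", "tiger shark", "hammerhead"]

# Index -> name, with string keys as in the card's id2label mapping.
id2label = {str(index): name for index, name in enumerate(class_names)}

# Name -> index, the inverse lookup.
label2id = {name: int(index) for index, name in id2label.items()}

print(id2label["2"])
print(label2id["hammerhead"])
```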
pip install -q transformers torch pillow gradio
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch
# Load model and processor
model_name = "prithivMLmods/imagenet-50-subset" # Replace if different
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)
# Label mapping
id2label = {
    "0": "tench",
    "1": "goldfish",
    "2": "great white shark",
    "3": "tiger shark",
    "4": "hammerhead",
    "5": "electric ray",
    "6": "stingray",
    "7": "cock",
    "8": "hen",
    "9": "ostrich",
    "10": "brambling",
    "11": "goldfinch",
    "12": "house finch",
    "13": "junco",
    "14": "indigo bunting",
    "15": "robin",
    "16": "bulbul",
    "17": "jay",
    "18": "magpie",
    "19": "chickadee",
    "20": "water ouzel",
    "21": "kite",
    "22": "bald eagle",
    "23": "vulture",
    "24": "great grey owl",
    "25": "european fire salamander",
    "26": "common newt",
    "27": "eft",
    "28": "spotted salamander",
    "29": "axolotl",
    "30": "bullfrog",
    "31": "tree frog",
    "32": "tailed frog",
    "33": "loggerhead",
    "34": "leatherback turtle",
    "35": "mud turtle",
    "36": "terrapin",
    "37": "box turtle",
    "38": "banded gecko",
    "39": "common iguana",
    "40": "american chameleon",
    "41": "whiptail",
    "42": "agama",
    "43": "frilled lizard",
    "44": "alligator lizard",
    "45": "gila monster",
    "46": "green lizard",
    "47": "african chameleon",
    "48": "komodo dragon",
    "49": "african crocodile",
}

def classify_imagenet_50(image):
    """Return a {label: probability} dict for a NumPy image array."""
    # Gradio passes a NumPy array; convert it to an RGB PIL image.
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_imagenet_50,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=5, label="ImageNet-50 Classification"),
    title="imagenet-50-subset",
    description="Upload an image to classify it into one of 50 selected ImageNet categories."
)

if __name__ == "__main__":
    iface.launch()
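The post-processing in the script (softmax over the logits, then showing the five most probable classes) can be illustrated without loading the model. The sketch below is a dependency-free stand-in, not the card's own code: `softmax` and `top_k` are hypothetical helper names, and the three logits are made-up toy values standing in for the 50 values in `outputs.logits`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_k(probs, id2label, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[str(index)], round(prob, 3)) for index, prob in ranked[:k]]


# Toy three-class case with made-up logits (assumption for illustration only).
toy_labels = {"0": "tench", "1": "goldfish", "2": "great white shark"}
toy_probs = softmax([2.0, 0.5, -1.0])
print(top_k(toy_probs, toy_labels, k=2))
```

Subtracting the largest logit before exponentiating leaves the result unchanged but avoids overflow for large values.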
imagenet-50-subset can be used for:
Base model
google/siglip2-base-patch16-224