---
license: apache-2.0
datasets:
- frgfm/imagenette
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- ImageNet
- SigLIP2
- Classifier
---
![3.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/XKcsM33R3XKl5JBBfQHNM.png)
# IMAGENETTE
> IMAGENETTE is an image classifier fine-tuned from the vision-language encoder `google/siglip2-base-patch16-224`. It uses the `SiglipForImageClassification` architecture and assigns each image to one of the 10 classes of the Imagenette dataset, a subset of ImageNet.
> [!note]
> *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features* https://arxiv.org/pdf/2502.14786
> [!note]
> *ImageNet Large Scale Visual Recognition Challenge* https://arxiv.org/pdf/1409.0575
```
Classification Report:
                  precision    recall  f1-score   support

           tench     0.9885    0.9834    0.9859       963
english springer     0.9843    0.9822    0.9832       955
 cassette player     0.9544    0.9486    0.9515       993
       chain saw     0.9257    0.8998    0.9125       858
          church     0.9654    0.9798    0.9726       941
     French horn     0.9757    0.9665    0.9711       956
   garbage truck     0.8883    0.9761    0.9301       961
        gas pump     0.9366    0.9044    0.9202       931
       golf ball     0.9925    0.9716    0.9819       951
       parachute     0.9821    0.9708    0.9764       960

        accuracy                         0.9590      9469
       macro avg     0.9593    0.9583    0.9586      9469
    weighted avg     0.9597    0.9590    0.9591      9469
```
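As a sanity check on how the summary rows relate to the per-class rows, the sketch below recomputes F1, the macro averages, and the support-weighted recall (which equals overall accuracy) from the rounded figures in the report. The `report` dictionary and the helper name `f1_score` are introduced here for illustration only; they are not part of the model's evaluation code, and small differences in the last digit are expected because the inputs are already rounded to four decimals.

```python
# Per-class (precision, recall, support), copied from the report above.
report = {
    "tench": (0.9885, 0.9834, 963),
    "english springer": (0.9843, 0.9822, 955),
    "cassette player": (0.9544, 0.9486, 993),
    "chain saw": (0.9257, 0.8998, 858),
    "church": (0.9654, 0.9798, 941),
    "French horn": (0.9757, 0.9665, 956),
    "garbage truck": (0.8883, 0.9761, 961),
    "gas pump": (0.9366, 0.9044, 931),
    "golf ball": (0.9925, 0.9716, 951),
    "parachute": (0.9821, 0.9708, 960),
}


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(s for _, _, s in report.values())

# Macro average: unweighted mean over the 10 classes.
macro_precision = sum(p for p, _, _ in report.values()) / len(report)
macro_recall = sum(r for _, r, _ in report.values()) / len(report)

# Support-weighted recall is the fraction of all images classified
# correctly, i.e. the overall accuracy.
weighted_recall = sum(r * s for _, r, s in report.values()) / total_support

for name, (p, r, _) in report.items():
    print(f"{name:>16}  f1={f1_score(p, r):.4f}")
print(f"total support   = {total_support}")
print(f"macro precision = {macro_precision:.4f}")
print(f"macro recall    = {macro_recall:.4f}")
print(f"weighted recall = {weighted_recall:.4f}")
```

The per-class rows also show where the errors concentrate: `garbage truck` has the lowest precision while `chain saw` and `gas pump` have the lowest recall.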
![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/74PN9tMCvZIfg_qegVOa9.png)
---
## Label Space: 10 Classes
The model predicts one of the following image classes:
```
0: tench
1: english springer
2: cassette player
3: chain saw
4: church
5: French horn
6: garbage truck
7: gas pump
8: golf ball
9: parachute
```
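A minimal sketch of how a 10-element logit vector maps onto this label space: apply softmax, rank the classes by probability, and look up the names. It uses only the standard library so the mapping can be tried without downloading the model. The logits are made up for illustration, and `ID2LABEL`, `softmax`, and `top_k` are hypothetical helper names, not part of the model's API.

```python
import math

# Index-to-label mapping, copied from the list above.
ID2LABEL = {
    0: "tench",
    1: "english springer",
    2: "cassette player",
    3: "chain saw",
    4: "church",
    5: "French horn",
    6: "garbage truck",
    7: "gas pump",
    8: "golf ball",
    9: "parachute",
}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(ID2LABEL[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits for one image (not real model output).
example_logits = [0.1, -1.2, 0.3, 4.0, 0.0, -0.5, 2.5, 1.0, -2.0, 0.2]

for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

With a real model, the logits would come from `outputs.logits` for a single image, as in the inference code later in this card.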
---
## Install Dependencies
```bash
pip install -q transformers torch pillow gradio hf_xet
```
---
## Inference Code
```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/IMAGENETTE"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "tench",
    "1": "english springer",
    "2": "cassette player",
    "3": "chain saw",
    "4": "church",
    "5": "French horn",
    "6": "garbage truck",
    "7": "gas pump",
    "8": "golf ball",
    "9": "parachute"
}


def classify_image(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction


# Gradio Interface
iface = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=3, label="Image Classification"),
    title="IMAGENETTE - SigLIP2 Classifier",
    description="Upload an image to classify it into one of 10 categories from the Imagenette dataset."
)

if __name__ == "__main__":
    iface.launch()
```
---
## Intended Use
IMAGENETTE is designed for:
* Educational purposes and model benchmarking.
* Demonstrating the performance of SigLIP2 on a small but diverse classification task.
* Serving as a reference for fine-tuning workflows on vision-language models.