
	---
	license: apache-2.0
	datasets:
	- frgfm/imagenette
	language:
	- en
	base_model:
	- google/siglip2-base-patch16-224
	pipeline_tag: image-classification
	library_name: transformers
	tags:
	- ImageNet
	- SigLIP2
	- Classifier
	---

	![3.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/XKcsM33R3XKl5JBBfQHNM.png)

	# IMAGENETTE

	> IMAGENETTE is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It classifies each image into one of the 10 categories of the Imagenette dataset, using the SiglipForImageClassification architecture.

	> [!note]
	> SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786

	> [!note]
	> ImageNet Large Scale Visual Recognition Challenge https://arxiv.org/pdf/1409.0575

	```
	Classification Report:
	                  precision    recall  f1-score   support

	           tench     0.9885    0.9834    0.9859       963
	english springer     0.9843    0.9822    0.9832       955
	 cassette player     0.9544    0.9486    0.9515       993
	       chain saw     0.9257    0.8998    0.9125       858
	          church     0.9654    0.9798    0.9726       941
	     French horn     0.9757    0.9665    0.9711       956
	   garbage truck     0.8883    0.9761    0.9301       961
	        gas pump     0.9366    0.9044    0.9202       931
	       golf ball     0.9925    0.9716    0.9819       951
	       parachute     0.9821    0.9708    0.9764       960

	        accuracy                         0.9590      9469
	       macro avg     0.9593    0.9583    0.9586      9469
	    weighted avg     0.9597    0.9590    0.9591      9469
	```
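
The summary rows of the report follow from the per-class rows. As a minimal sketch (the names `REPORT`, `macro_average`, and `weighted_average` are illustrative and not part of the model repository), the macro average is the unweighted mean of a column, and the weighted average weights each class by its support. For single-label classification, weighted-average recall equals overall accuracy, so the two should agree up to the rounding of the published values.

```python
# Per-class (precision, recall, f1-score, support), copied from the report above.
REPORT = {
    "tench": (0.9885, 0.9834, 0.9859, 963),
    "english springer": (0.9843, 0.9822, 0.9832, 955),
    "cassette player": (0.9544, 0.9486, 0.9515, 993),
    "chain saw": (0.9257, 0.8998, 0.9125, 858),
    "church": (0.9654, 0.9798, 0.9726, 941),
    "French horn": (0.9757, 0.9665, 0.9711, 956),
    "garbage truck": (0.8883, 0.9761, 0.9301, 961),
    "gas pump": (0.9366, 0.9044, 0.9202, 931),
    "golf ball": (0.9925, 0.9716, 0.9819, 951),
    "parachute": (0.9821, 0.9708, 0.9764, 960),
}

PRECISION, RECALL, F1, SUPPORT = range(4)


def macro_average(rows, column):
    """Unweighted mean of one metric column across classes."""
    values = [row[column] for row in rows.values()]
    return sum(values) / len(values)


def weighted_average(rows, column):
    """Mean of one metric column, weighted by per-class support."""
    total = sum(row[SUPPORT] for row in rows.values())
    return sum(row[column] * row[SUPPORT] for row in rows.values()) / total


total_support = sum(row[SUPPORT] for row in REPORT.values())
macro_recall = macro_average(REPORT, RECALL)
weighted_recall = weighted_average(REPORT, RECALL)

print(f"total support:   {total_support}")
print(f"macro recall:    {macro_recall:.4f}")
print(f"weighted recall: {weighted_recall:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed averages can differ from the reported ones in the last digit.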

	![download.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/74PN9tMCvZIfg_qegVOa9.png)

	---

	## Label Space: 10 Classes

	The model predicts one of the following image classes:

	```
	0: tench
	1: english springer
	2: cassette player
	3: chain saw
	4: church
	5: French horn
	6: garbage truck
	7: gas pump
	8: golf ball
	9: parachute
	```
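
To illustrate how a class index maps back to a label, here is a small standard-library sketch. It assumes the model emits one logit per class in the index order listed above; the helper names (`softmax`, `predict_label`) and the example logits are made up for illustration and do not come from the model.

```python
import math

# Index-to-label mapping, as listed above.
ID2LABEL = {
    0: "tench",
    1: "english springer",
    2: "cassette player",
    3: "chain saw",
    4: "church",
    5: "French horn",
    6: "garbage truck",
    7: "gas pump",
    8: "golf ball",
    9: "parachute",
}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict_label(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only; index 4 has the largest value.
example_logits = [0.1, 0.2, 0.0, -1.0, 3.5, 0.3, 0.0, -0.5, 0.2, 0.1]
label, prob = predict_label(example_logits)
print(label, round(prob, 3))
```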

	---

	## Install Dependencies

	```bash
	pip install -q transformers torch pillow gradio hf_xet
	```

	---

	## Inference Code

	```python
	import gradio as gr
	from transformers import AutoImageProcessor, SiglipForImageClassification
	from PIL import Image
	import torch

	# Load model and processor
	model_name = "prithivMLmods/IMAGENETTE"
	model = SiglipForImageClassification.from_pretrained(model_name)
	processor = AutoImageProcessor.from_pretrained(model_name)

	# Label mapping
	id2label = {
	    "0": "tench",
	    "1": "english springer",
	    "2": "cassette player",
	    "3": "chain saw",
	    "4": "church",
	    "5": "French horn",
	    "6": "garbage truck",
	    "7": "gas pump",
	    "8": "golf ball",
	    "9": "parachute"
	}

	def classify_image(image):
	    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
	    image = Image.fromarray(image).convert("RGB")
	    inputs = processor(images=image, return_tensors="pt")

	    # Run inference without tracking gradients
	    with torch.no_grad():
	        outputs = model(**inputs)
	        logits = outputs.logits
	        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

	    # Map each class label to its rounded probability
	    prediction = {
	        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
	    }

	    return prediction

	# Gradio Interface
	iface = gr.Interface(
	    fn=classify_image,
	    inputs=gr.Image(type="numpy"),
	    outputs=gr.Label(num_top_classes=3, label="Image Classification"),
	    title="IMAGENETTE - SigLIP2 Classifier",
	    description="Upload an image to classify it into one of 10 categories from the Imagenette dataset."
	)

	if __name__ == "__main__":
	    iface.launch()
	```
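
`classify_image` returns a dictionary mapping every label to its probability, and the `gr.Label(num_top_classes=3)` output shows only the three highest. When the function is called outside Gradio, a similar selection can be sketched as below; `top_k` is a hypothetical helper, and the example scores are invented rather than real model output.

```python
def top_k(prediction, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(prediction.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Invented scores, shaped like the dictionary returned by classify_image.
example_prediction = {
    "tench": 0.002,
    "english springer": 0.001,
    "cassette player": 0.004,
    "chain saw": 0.010,
    "church": 0.001,
    "French horn": 0.003,
    "garbage truck": 0.912,
    "gas pump": 0.061,
    "golf ball": 0.002,
    "parachute": 0.004,
}

for name, score in top_k(example_prediction):
    print(f"{name}: {score:.3f}")
```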

	---

	## Intended Use

	IMAGENETTE is designed for:

	* Educational purposes and model benchmarking.
	* Demonstrating the performance of SigLIP2 on a small but diverse classification task.
	* Fine-tuning workflows on vision-language models.