Food-or-Not-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to distinguish between images of food and non-food objects using the SiglipForImageClassification architecture.
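The fine-tuning recipe itself is not documented here. As a rough sketch of how such a two-class head could be initialized on the base checkpoint (the label maps mirror the classes listed below; everything else about training is an assumption, not the author's procedure):

```python
# Sketch: attaching a fresh 2-class classification head to the base SigLIP2 checkpoint.
# Training hyperparameters and data are unknown; this shows head setup only.
from transformers import AutoImageProcessor, AutoModelForImageClassification

base = "google/siglip2-base-patch16-224"
id2label = {0: "food", 1: "not-food"}
label2id = {label: i for i, label in id2label.items()}

# AutoModelForImageClassification resolves the appropriate SigLIP classification
# class from the config; the classifier weights start freshly initialized and
# must be fine-tuned before use.
model = AutoModelForImageClassification.from_pretrained(
    base,
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)
processor = AutoImageProcessor.from_pretrained(base)
```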
Classification Report:

|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| food         | 0.8902    | 0.8610 | 0.8753   | 4000    |
| not-food     | 0.8654    | 0.8938 | 0.8794   | 4000    |
| accuracy     |           |        | 0.8774   | 8000    |
| macro avg    | 0.8778    | 0.8774 | 0.8773   | 8000    |
| weighted avg | 0.8778    | 0.8774 | 0.8773   | 8000    |
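The layout above matches scikit-learn's classification_report output. As a hedged sketch, a report in this format could be regenerated from held-out predictions like so (y_true and y_pred are placeholders; the actual evaluation split is not published here):

```python
# Sketch: producing a report in the above format with scikit-learn.
# y_true / y_pred are illustrative placeholders, not the real evaluation data.
from sklearn.metrics import classification_report

y_true = [0, 1, 0, 1, 1, 0]   # ground-truth ids (0 = food, 1 = not-food)
y_pred = [0, 1, 1, 1, 1, 0]   # model predictions, e.g. logits.argmax(-1) per image

print(classification_report(
    y_true,
    y_pred,
    target_names=["food", "not-food"],
    digits=4,                  # four decimal places, as in the table above
))
```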
The model classifies each image into one of the following categories:
Class 0: "food"
Class 1: "not-food"
```bash
pip install -q transformers torch pillow gradio
```
```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Food-or-Not-SigLIP2"  # Replace with your model path if different
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "food",
    "1": "not-food"
}

def classify_food(image):
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_food,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=2, label="Food Classification"),
    title="Food-or-Not-SigLIP2",
    description="Upload an image to detect if it contains food or not."
)

if __name__ == "__main__":
    iface.launch()
```
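For quick checks without the Gradio UI, the generic transformers pipeline API should also work with this checkpoint (a minimal sketch; the image path is a placeholder):

```python
# Sketch: single-image inference through the image-classification pipeline.
from transformers import pipeline

pipe = pipeline("image-classification", model="prithivMLmods/Food-or-Not-SigLIP2")
print(pipe("example.jpg"))  # placeholder path; returns [{'label': ..., 'score': ...}, ...]
```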
Food-or-Not-SigLIP2 can be used for tasks such as filtering food images out of mixed datasets, tagging food-related content in image pipelines, or pre-screening inputs for more fine-grained food-recognition models.
Base model: google/siglip2-base-patch16-224