PACS-DG-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class domain classification on the PACS domain-generalization benchmark. It is trained to distinguish four visual domains (art paintings, cartoons, photos, and sketches) using the SiglipForImageClassification architecture.
Paper: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (https://arxiv.org/pdf/2502.14786)
Classification Report:

                precision    recall  f1-score   support

  art_painting     0.8538    0.9380    0.8939      2048
       cartoon     0.9891    0.9330    0.9603      2344
         photo     0.9029    0.8635    0.8828      1670
        sketch     0.9990    1.0000    0.9995      3929

      accuracy                         0.9488      9991
     macro avg     0.9362    0.9336    0.9341      9991
  weighted avg     0.9509    0.9488    0.9491      9991
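A report in this format can be produced with scikit-learn's classification_report. The snippet below is a minimal sketch with placeholder labels; in practice y_true and y_pred would come from running the model over the PACS test split.

from sklearn.metrics import classification_report

# Placeholder labels for illustration only; real values come from
# model predictions on the held-out PACS split.
y_true = [0, 1, 2, 3]
y_pred = [0, 1, 2, 3]
label_names = ["art_painting", "cartoon", "photo", "sketch"]
print(classification_report(y_true, y_pred, target_names=label_names, digits=4))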
from datasets import load_dataset

# Load the PACS dataset
dataset = load_dataset("flwrlabs/pacs")

# Extract the unique domain values (assuming "domain" is a string field)
labels = sorted(set(example["domain"] for example in dataset["train"]))

# Create the id2label mapping
id2label = {str(i): label for i, label in enumerate(labels)}

# Print the mapping
print(id2label)
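Assuming the domain field holds string labels, this prints {'0': 'art_painting', '1': 'cartoon', '2': 'photo', '3': 'sketch'}, which is the same mapping used in the inference code below.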
The model predicts the most probable visual domain from the following:
Class 0: "art_painting"
Class 1: "cartoon"
Class 2: "photo"
Class 3: "sketch"
pip install -q transformers torch pillow gradio
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/PACS-DG-SigLIP2"  # Update to your actual model path on Hugging Face
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label map
id2label = {
    "0": "art_painting",
    "1": "cartoon",
    "2": "photo",
    "3": "sketch"
}

def classify_pacs_image(image):
    """Classify an image into one of the four PACS visual domains."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }
    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_pacs_image,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(num_top_classes=4, label="Predicted Domain Probabilities"),
    title="PACS-DG-SigLIP2",
    description="Upload an image to classify its visual domain: Art Painting, Cartoon, Photo, or Sketch."
)

if __name__ == "__main__":
    iface.launch()
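For scripted use without the Gradio UI, a minimal sketch along the same lines (the image path here is a placeholder):

from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

model_name = "prithivMLmods/PACS-DG-SigLIP2"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

id2label = {0: "art_painting", 1: "cartoon", 2: "photo", 3: "sketch"}

# "example.jpg" is a placeholder; substitute any local image file.
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(id2label[logits.argmax(dim=-1).item()])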
The PACS-DG-SigLIP2 model is designed to support tasks in domain generalization, particularly recognizing whether an image comes from the art-painting, cartoon, photo, or sketch domain.
Base model: google/siglip2-base-patch16-224
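For completeness, one plausible way to attach the 4-way classification head to this base checkpoint for fine-tuning is shown below. This is a sketch of the setup, not necessarily the authors' exact recipe, and the training loop is omitted.

from transformers import SiglipForImageClassification

id2label = {0: "art_painting", 1: "cartoon", 2: "photo", 3: "sketch"}
label2id = {v: k for k, v in id2label.items()}

# Load the SigLIP2 base encoder with a freshly initialized 4-way head;
# transformers will warn that the classifier weights are newly initialized.
model = SiglipForImageClassification.from_pretrained(
    "google/siglip2-base-patch16-224",
    num_labels=4,
    id2label=id2label,
    label2id=label2id,
)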