---
tags:
- vision
- clip
- fine-tuned
- PatchCamelyon
- medical-imaging
license: apache-2.0
library_name: transformers
model_type: clip_vision_model
datasets:
- 1aurent/PatchCamelyon
- lens-ai/adversarial_pcam
---

# ![LensAI Logo](https://static.wixstatic.com/media/a8a410_27dc826bddd34fb8a464a8434c53ab87~mv2.png/v1/fill/w_350,h_100,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/logolai.png)

# Adversarial CLIP ViT Base Patch32 Fine-Tuned on PatchCamelyon (PCAM)

## Overview

This repository contains an adversarially trained version of the [CLIP ViT Base Patch32 fine-tuned](https://huggingface.co/lens-ai/clip-vit-base-patch32_pcam_finetuned) model, trained on both the [PatchCamelyon (PCAM)](https://huggingface.co/datasets/1aurent/PatchCamelyon) dataset and the [Adversarial PCAM](https://huggingface.co/datasets/lens-ai/adversarial_pcam) dataset. The model is optimized for histopathological image classification.

## 📌 Model Highlights

- **Model Type:** CLIP Vision Transformer (ViT-B/32) with classification head
- **Task:** Binary classification of histopathological images (cancer vs. non-cancer)
- **Base Model:** `openai/clip-vit-base-patch32`
- **Training Data:** PatchCamelyon (PCAM) and Adversarial PCAM datasets
- **Input:** RGB images (224x224 pixels)
- **Output:** Binary classification (cancer/non-cancer)

## 🚀 Key Results

### ✅ Clean Evaluation Metrics

- **Clean Accuracy:** 86.72%

### ⚔️ Adversarial Robustness (Fine-tuned Model)

- **PGD Attack:**
  - Success Rate: 17.87%
  - Average L2 Distance: 12.09
- **FGSM Attack:**
  - Success Rate: 17.38%
  - Average L2 Distance: 12.10
- **DeepFool Attack:**
  - Success Rate: 35.62%
  - Average L2 Distance: 234.13

### 📊 Base Model Comparison

- **Clean Accuracy:** 86.30%
- **PGD:** 50.10% Success Rate | Avg L2 Distance: 12.08
- **FGSM:** 44.14% Success Rate | Avg L2 Distance: 12.10
- **DeepFool:** 81.64% Success Rate | Avg L2 Distance: 224.66

**Hardware:** Trained on NVIDIA A100 GPU (5 epochs)

---

## 🔧 Usage

### Installation

```bash
pip install transformers torch safetensors
```

### Inference Example

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import Dataset
from transformers import CLIPVisionConfig, CLIPVisionModel


class PCamClassifier(nn.Module):
    """CLIP ViT-B/32 vision encoder with a binary classification head."""

    def __init__(self, config_dict):
        super().__init__()
        self.config = CLIPVisionConfig(**config_dict)
        self.vision_model = CLIPVisionModel(self.config)
        self.classifier = nn.Linear(self.config.hidden_size, 2)

    def forward(self, pixel_values):
        outputs = self.vision_model(pixel_values)
        return self.classifier(outputs.pooler_output)


# Vision encoder configuration (matches openai/clip-vit-base-patch32)
config_dict = {
    "_name_or_path": "openai/clip-vit-base-patch32",
    "architectures": ["CLIPVisionModel"],
    "attention_dropout": 0.0,
    "dropout": 0.0,
    "hidden_act": "quick_gelu",
    "hidden_size": 768,
    "image_size": 224,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-05,
    "model_type": "clip_vision_model",
    "num_attention_heads": 12,
    "num_channels": 3,
    "num_hidden_layers": 12,
    "patch_size": 32,
    "projection_dim": 512,
    "torch_dtype": "float32",
}

# Initialize the model and load the fine-tuned weights
model = PCamClassifier(config_dict)
model.load_state_dict(torch.load("best_enhanced_pcam_model.pt", map_location="cpu"))
model.eval()


class PCamDataset(Dataset):
    """Wraps a PCAM split and converts images into model-ready arrays."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        example = self.dataset[idx]
        # The model expects 224x224 RGB inputs (see `image_size` in the config)
        image = example["image"].convert("RGB").resize((224, 224))
        image_array = np.array(image) / 255.0
        # HWC -> CHW, float32, scaled to [0, 1]
        image_array = image_array.transpose(2, 0, 1).astype(np.float32)
        return {"pixel_values": image_array, "labels": example["label"]}
```
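As a quick sanity check, the sketch below shows one way to run the classifier over a PCAM split using the `PCamDataset` wrapper defined above. The split name, the `label` field, and the batch size are assumptions based on the dataset card and the snippet above, not a fixed evaluation protocol; it also requires the `datasets` library (`pip install datasets`).

```python
from datasets import load_dataset
from torch.utils.data import DataLoader

# Load the PCAM test split (split name assumed; adjust to match the dataset card)
pcam_test = load_dataset("1aurent/PatchCamelyon", split="test")
loader = DataLoader(PCamDataset(pcam_test), batch_size=32)

correct, total = 0, 0
with torch.no_grad():
    for batch in loader:
        logits = model(batch["pixel_values"])  # (batch, 2) class logits
        preds = logits.argmax(dim=-1)
        labels = batch["labels"].long()        # cast in case labels are booleans
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"Accuracy: {correct / total:.4f}")
```

Note that this preprocessing only rescales pixels to [0, 1]; to reproduce the reported numbers exactly, match the preprocessing used during fine-tuning.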
---

## 📊 Future Work

We plan to release:
- Enhanced robustness metrics
- Expanded adversarial attack evaluations

## 📜 License

Released under the Apache-2.0 License.

## 📬 Contact

For inquiries, please reach out to **Venkata Tej** at [LensAI](https://www.lensai.tech).