Spaces: ttoosi/GenerativeInferenceDemo


ttoosi committed on Mar 27

Commit 7449d44 (verified) · 1 Parent(s): f7dbbd9

Upload 11 files

direct initial upload

Files changed (12)

.gitattributes +2 -0
Dockerfile +31 -0
LICENSE +21 -0
README.md +77 -8
app.py +112 -4
huggingface-metadata.json +11 -0
inference.py +251 -0
requirements.txt +9 -0
stimuli/Kanizsa_square.jpg +0 -0
stimuli/NeonColorSaeedi.jpg +3 -0
stimuli/face_vase.png +0 -0
stimuli/figure_ground.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+stimuli/figure_ground.png filter=lfs diff=lfs merge=lfs -text
+stimuli/NeonColorSaeedi.jpg filter=lfs diff=lfs merge=lfs -text

Dockerfile ADDED

	@@ -0,0 +1,31 @@

+FROM python:3.9-slim
+WORKDIR /code
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    git \
+    libgl1 \
+    libglib2.0-0 \
+    && rm -rf /var/lib/apt/lists/*
+# Copy only requirements to leverage Docker caching
+COPY ./requirements.txt /code/requirements.txt
+# Install Python dependencies
+RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+# Copy all code and data
+COPY . /code/
+# Create necessary directories
+RUN mkdir -p /code/models
+RUN mkdir -p /code/stimuli
+# Make sure stimuli and models are writable
+RUN chmod -R 777 /code/models
+RUN chmod -R 777 /code/stimuli
+# Set up the command to run the app
+CMD ["python", "app.py"]

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2023 GenerativeInferenceDemo
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -1,14 +1,83 @@
 ---
-title: GenerativeInferenceDemo
-emoji: 🚀
-colorFrom: yellow
-colorTo: yellow
+title: Generative Inference Demo
+emoji: 🧠
+colorFrom: indigo
+colorTo: purple
 sdk: gradio
-sdk_version: 5.23.1
+sdk_version: 3.50.2
 app_file: app.py
 pinned: false
-license: apache-2.0
-short_description: Generative Inference enables ai to see illusions out-of-box
+license: mit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Generative Inference Demo
+This Gradio demo showcases how neural networks perceive visual illusions through generative inference. The demo uses both standard and robust ResNet50 models to reveal emergent perception of contours, figure-ground separation, and other visual phenomena.
+## Models
+- **Robust ResNet50**: A model trained with adversarial examples (ε=3.0), exhibiting more human-like visual perception
+- **Standard ResNet50**: A model trained without adversarial examples (ε=0.0)
+## Features
+- Upload your own images or use example illusions
+- Choose between robust and standard models
+- Adjust perturbation size (epsilon) and iteration count
+- Visualize how perception emerges over time
+- Includes classic illusions:
+  - Kanizsa shapes
+  - Face-Vase illusions
+  - Figure-Ground segmentation
+  - Neon color spreading
+## Usage
+1. Select an example image or upload your own
+2. Choose the model type (robust or standard)
+3. Adjust epsilon and iteration parameters
+4. Click "Run Inference" to see how the model perceives the image
+## About
+This demo is based on research showing how adversarially robust models develop more human-like visual representations. The generative inference process reveals these perceptual biases by optimizing the input to maximize the model's confidence.
+## Installation
+To run this demo locally:
+```bash
+# Clone the repository
+git clone [repo-url]
+cd GenerativeInferenceDemo
+# Install dependencies
+pip install -r requirements.txt
+# Run the app
+python app.py
+```
+The web app will be available at http://localhost:7860 (or another port if 7860 is busy).
+## About the Models
+- **Robust ResNet50**: A model trained with adversarial examples, making it more robust to small perturbations. These models often exhibit more human-like visual perception.
+- **Standard ResNet50**: A standard ImageNet-trained ResNet50 model.
+## How It Works
+1. The algorithm starts with an input image
+2. It iteratively updates the image to increase the model's confidence in its predictions
+3. These updates are constrained to a small neighborhood (controlled by epsilon) around the original image
+4. The resulting changes reveal how the network "sees" the image
+## Citation
+If you use this work in your research, please cite the original paper:
+[Citation information will be added here]
+## License
+This project is licensed under the MIT License - see the LICENSE file for details.
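
Note on the README's "How It Works" list: the four steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from this commit: a toy logistic unit (`confidence`, with made-up weights `w` and bias `b`) stands in for ResNet50, and every name here is hypothetical. The update rule mirrors the description: take a norm-scaled gradient step, then clamp the result to within `eps` of the original image and to the valid pixel range.

```python
import math

def confidence(x, w, b):
    """Toy stand-in for a network's confidence: a single logistic unit."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def confidence_grad(x, w, b):
    """Gradient of the toy confidence with respect to the input pixels."""
    c = confidence(x, w, b)
    return [c * (1.0 - c) * wi for wi in w]

def project(x, orig, eps):
    """Keep each pixel within eps of the original and inside [0, 1]."""
    out = []
    for xi, oi in zip(x, orig):
        diff = max(-eps, min(eps, xi - oi))
        out.append(max(0.0, min(1.0, oi + diff)))
    return out

def generative_inference(orig, w, b, eps=0.1, step_size=0.02, n_itr=50):
    """Iteratively nudge the input to raise confidence, staying near orig."""
    x = list(orig)
    for _ in range(n_itr):
        grad = confidence_grad(x, w, b)
        norm = math.sqrt(sum(g * g for g in grad)) + 1e-10
        # Step along the gradient, scaled to a fixed length of step_size
        x = [xi + step_size * g / norm for xi, g in zip(x, grad)]
        x = project(x, orig, eps)
    return x

orig = [0.2, 0.5, 0.9, 0.4]   # hypothetical 4-pixel "image"
w = [1.0, -2.0, 0.5, 3.0]     # hypothetical weights
b = -1.0
result = generative_inference(orig, w, b)
```

The difference `result[i] - orig[i]` is what the README calls the revealed change: it shows which pixels the toy model "wants" brighter or darker within the allowed neighborhood.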

app.py CHANGED

@@ -1,7 +1,115 @@
 import gradio as gr
-def greet(name):
-    return "Hello " + name + "!!"
-demo = gr.Interface(fn=greet, inputs="text", outputs="text")
-demo.launch()
+import torch
+import numpy as np
+from PIL import Image
+import os
+import argparse
+from inference import GenerativeInferenceModel, get_inference_configs
+# Parse command line arguments
+parser = argparse.ArgumentParser(description='Run Generative Inference Demo')
+parser.add_argument('--port', type=int, default=7860, help='Port to run the server on')
+args = parser.parse_args()
+# Create model directories if they don't exist
+os.makedirs("models", exist_ok=True)
+os.makedirs("stimuli", exist_ok=True)
+# Initialize model
+model = GenerativeInferenceModel()
+def run_inference(image, model_type, illusion_type, eps_value, num_iterations):
+    # Convert eps to float
+    eps = float(eps_value)
+    # Load inference configuration
+    config = get_inference_configs(eps=eps, n_itr=int(num_iterations))
+    # Run generative inference
+    output_images, all_steps = model.inference(image, model_type, config)
+    # Convert each stored step (a CHW float tensor in [0, 1]) to a PIL image
+    frames = []
+    for step_image in all_steps:
+        step_array = step_image.permute(1, 2, 0).cpu().numpy()
+        frames.append(Image.fromarray((step_array * 255).astype(np.uint8)))
+    # gr.Image cannot render a raw tensor, so return the last frame, which is the final image
+    final_image = frames[-1]
+    return final_image, gr.Gallery.update(value=frames)
+# Define the interface
+with gr.Blocks(title="Generative Inference Demo") as demo:
+    gr.Markdown("# Generative Inference Demo")
+    gr.Markdown("This demo showcases how neural networks can perceive visual illusions through generative inference.")
+    with gr.Row():
+        with gr.Column(scale=1):
+            # Inputs
+            image_input = gr.Image(label="Upload Image or Select an Illusion", type="pil")
+            with gr.Row():
+                model_choice = gr.Dropdown(
+                    choices=["robust_resnet50", "standard_resnet50"],
+                    value="robust_resnet50",
+                    label="Model"
+                )
+                illusion_type = gr.Dropdown(
+                    choices=["Kanizsa", "Face-Vase", "Neon-Color", "Figure-Ground"],
+                    value="Kanizsa",
+                    label="Illusion Type"
+                )
+            with gr.Row():
+                eps_slider = gr.Slider(minimum=0.01, maximum=3.0, value=0.5, step=0.01, label="Epsilon (Perturbation Size)")
+                iterations_slider = gr.Slider(minimum=10, maximum=200, value=50, step=10, label="Number of Iterations")
+            run_button = gr.Button("Run Inference")
+        with gr.Column(scale=2):
+            # Outputs
+            output_image = gr.Image(label="Final Inferred Image")
+            output_frames = gr.Gallery(label="Inference Steps", columns=4, rows=2)
+    # Set up example images
+    examples = [
+        [os.path.join("stimuli", "Kanizsa_square.jpg"), "robust_resnet50", "Kanizsa", 0.5, 50],
+        [os.path.join("stimuli", "face_vase.png"), "robust_resnet50", "Face-Vase", 0.5, 50],
+        [os.path.join("stimuli", "figure_ground.png"), "robust_resnet50", "Figure-Ground", 0.7, 100],
+        [os.path.join("stimuli", "NeonColorSaeedi.jpg"), "robust_resnet50", "Neon-Color", 0.3, 80]
+    ]
+    gr.Examples(examples=examples, inputs=[image_input, model_choice, illusion_type, eps_slider, iterations_slider])
+    # Set up event handler
+    run_button.click(
+        fn=run_inference,
+        inputs=[image_input, model_choice, illusion_type, eps_slider, iterations_slider],
+        outputs=[output_image, output_frames]
+    )
+    # Include a description of the technique
+    gr.Markdown("""
+    ## About Generative Inference
+    Generative inference is a technique that reveals how neural networks perceive visual stimuli by optimizing the input
+    to increase the network's confidence in its predictions. This process can reveal emergent perception of contours,
+    figure-ground separation, and other visual phenomena similar to human perception.
+    This demo allows you to:
+    1. Upload your own images or select from example illusions
+    2. Choose between robust or standard models
+    3. Adjust parameters like perturbation size (epsilon) and number of iterations
+    4. Visualize how the perception emerges over time
+    """)
+# Launch the demo with specific settings
+if __name__ == "__main__":
+    print(f"Starting server on port {args.port}")
+    demo.launch(
+        server_name="0.0.0.0",  # Listen on all interfaces
+        server_port=args.port,  # Use the port from command line arguments
+        share=False,
+        debug=True
+    )
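
Note on the `--port` handling in app.py: the parser runs at import time with `parse_args()`, which exits on any argument it does not recognize. If the module is ever imported by a runner that carries its own command-line flags, a more tolerant variant may be useful. The sketch below is a suggestion rather than part of this commit; `parse_port` is a hypothetical helper name, and it relies only on the standard `argparse.ArgumentParser.parse_known_args` method.

```python
import argparse

def parse_port(argv=None):
    """Read --port from argv (or sys.argv when argv is None), ignoring unknown flags."""
    parser = argparse.ArgumentParser(description="Run Generative Inference Demo")
    parser.add_argument("--port", type=int, default=7860, help="Port to run the server on")
    # parse_known_args returns (namespace, leftover_args) instead of exiting on extras
    args, _unknown = parser.parse_known_args(argv)
    return args.port
```

Calling `parse_port()` inside the `if __name__ == "__main__":` block would also keep argument parsing out of the import path.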

huggingface-metadata.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "title": "Generative Inference Demo",
+  "emoji": "🧠",
+  "colorFrom": "indigo",
+  "colorTo": "purple",
+  "sdk": "gradio",
+  "sdk_version": "3.50.2",
+  "app_file": "app.py",
+  "pinned": false,
+  "license": "mit"
+}

inference.py ADDED

	@@ -0,0 +1,251 @@

+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import torchvision.models as models
+import torchvision.transforms as transforms
+from torchvision.models.resnet import ResNet50_Weights
+from PIL import Image
+import numpy as np
+import os
+import requests
+import time
+from pathlib import Path
+# Check CUDA availability
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+print(f"Using device: {device}")
+# Constants
+MODEL_URLS = {
+    'robust_resnet50': 'https://huggingface.co/madrylab/robust-imagenet-models/resolve/main/resnet50_l2_eps_3.0.pt',
+    'standard_resnet50': 'https://huggingface.co/madrylab/robust-imagenet-models/resolve/main/resnet50_l2_eps_0.0.pt'
+}
+IMAGENET_MEAN = [0.485, 0.456, 0.406]
+IMAGENET_STD = [0.229, 0.224, 0.225]
+# Default transform
+transform = transforms.Compose([
+    transforms.Resize(224),
+    transforms.CenterCrop(224),
+    transforms.ToTensor(),
+])
+normalize_transform = transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD)
+# Get ImageNet labels
+def get_imagenet_labels():
+    url = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
+    response = requests.get(url)
+    if response.status_code == 200:
+        return response.json()
+    else:
+        raise RuntimeError("Failed to fetch ImageNet labels")
+# Download model if needed
+def download_model(model_type):
+    if model_type not in MODEL_URLS or MODEL_URLS[model_type] is None:
+        return None  # Use PyTorch's pretrained model
+    model_path = Path(f"models/{model_type}.pt")
+    if not model_path.exists():
+        print(f"Downloading {model_type} model...")
+        url = MODEL_URLS[model_type]
+        response = requests.get(url, stream=True)
+        if response.status_code == 200:
+            with open(model_path, 'wb') as f:
+                for chunk in response.iter_content(chunk_size=8192):
+                    f.write(chunk)
+            print(f"Model downloaded and saved to {model_path}")
+        else:
+            raise RuntimeError(f"Failed to download model: {response.status_code}")
+    return model_path
+class NormalizeByChannelMeanStd(nn.Module):
+    def __init__(self, mean, std):
+        super(NormalizeByChannelMeanStd, self).__init__()
+        if not isinstance(mean, torch.Tensor):
+            mean = torch.tensor(mean)
+        if not isinstance(std, torch.Tensor):
+            std = torch.tensor(std)
+        self.register_buffer("mean", mean)
+        self.register_buffer("std", std)
+    def forward(self, tensor):
+        return self.normalize_fn(tensor, self.mean, self.std)
+    def normalize_fn(self, tensor, mean, std):
+        """Differentiable version of torchvision.functional.normalize"""
+        # here we assume the color channel is at dim=1
+        mean = mean[None, :, None, None]
+        std = std[None, :, None, None]
+        return tensor.sub(mean).div(std)
+class InferStep:
+    def __init__(self, orig_image, eps, step_size):
+        self.orig_image = orig_image
+        self.eps = eps
+        self.step_size = step_size
+    def project(self, x):
+        diff = x - self.orig_image
+        diff = torch.clamp(diff, -self.eps, self.eps)
+        return torch.clamp(self.orig_image + diff, 0, 1)
+    def step(self, x, grad):
+        l = len(x.shape) - 1
+        grad_norm = torch.norm(grad.view(grad.shape[0], -1), dim=1).view(-1, *([1]*l))
+        scaled_grad = grad / (grad_norm + 1e-10)
+        return scaled_grad * self.step_size
+def get_inference_configs(eps=0.5, n_itr=50):
+    """Generate inference configuration with customizable parameters."""
+    config = {
+        'loss_infer': 'IncreaseConfidence',  # How to guide the optimization
+        'loss_function': 'CE',  # Loss function: Cross Entropy
+        'n_itr': n_itr,  # Number of iterations
+        'eps': eps,  # Maximum perturbation size
+        'step_size': 0.02,  # Step size for each iteration
+        'diffusion_noise_ratio': 0.0,  # No diffusion noise
+        'initial_inference_noise_ratio': 0.0,  # No initial noise
+        'top_layer': 'all',  # Use all layers of the model
+        'inference_normalization': 'on',  # Apply normalization during inference
+        'recognition_normalization': 'on',  # Apply normalization during recognition
+        'iterations_to_show': [1, 5, 10, 20, 30, 40, 50, n_itr]  # Specific iterations to visualize
+    }
+    return config
+class GenerativeInferenceModel:
+    def __init__(self):
+        self.models = {}
+        self.normalizer = NormalizeByChannelMeanStd(IMAGENET_MEAN, IMAGENET_STD).to(device)
+        self.labels = get_imagenet_labels()
+    def load_model(self, model_type):
+        if model_type in self.models:
+            return self.models[model_type]
+        model_path = download_model(model_type)
+        # Create standard ResNet50 model
+        model = models.resnet50()
+        # Load the model checkpoint
+        if model_path:
+            print(f"Loading {model_type} model from {model_path}...")
+            checkpoint = torch.load(model_path, map_location=device)
+            # Handle different checkpoint formats
+            if 'model' in checkpoint:
+                # Format from madrylab robust models
+                state_dict = checkpoint['model']
+            elif 'state_dict' in checkpoint:
+                state_dict = checkpoint['state_dict']
+            else:
+                # Direct state dict
+                state_dict = checkpoint
+            # Handle prefix in state dict keys
+            new_state_dict = {}
+            for key, value in state_dict.items():
+                if key.startswith('module.'):
+                    new_key = key[7:]  # Remove 'module.' prefix
+                else:
+                    new_key = key
+                new_state_dict[new_key] = value
+            model.load_state_dict(new_state_dict)
+        else:
+            # Fallback to PyTorch's pretrained model
+            model = models.resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
+        model = model.to(device)
+        model.eval()  # Set to evaluation mode
+        # Store the model for future use
+        self.models[model_type] = model
+        return model
+    def inference(self, image, model_type, config):
+        # Load model if not already loaded
+        model = self.load_model(model_type)
+        # Check if image is a file path
+        if isinstance(image, str):
+            if os.path.exists(image):
+                image = Image.open(image).convert('RGB')
+            else:
+                raise ValueError(f"Image path does not exist: {image}")
+        # Prepare image tensor
+        image_tensor = transform(image).unsqueeze(0).to(device)
+        image_tensor.requires_grad = True
+        # Normalize the image for model input
+        normalized_tensor = normalize_transform(image_tensor)
+        # Get original predictions
+        with torch.no_grad():
+            output_original = model(normalized_tensor)
+            probs_orig = F.softmax(output_original, dim=1)
+            conf_orig, classes_orig = torch.max(probs_orig, 1)
+            # Get least confident classes
+            _, least_confident_classes = torch.topk(probs_orig, k=100, largest=False)
+        # Initialize inference step
+        infer_step = InferStep(image_tensor, config['eps'], config['step_size'])
+        # Storage for inference steps
+        x = image_tensor.clone()
+        all_steps = [image_tensor[0].detach().cpu()]
+        # Main inference loop
+        for i in range(config['n_itr']):
+            # Make x a fresh leaf tensor so that backward() populates x.grad
+            x = x.detach().requires_grad_(True)
+            # Normalize input for the model
+            normalized_x = normalize_transform(x)
+            # Forward pass
+            output = model(normalized_x)
+            # Calculate loss to maximize confidence for least confident classes
+            target_classes = least_confident_classes[0, :10]  # 10 least confident classes for the single image in the batch
+            loss = 0
+            for idx in target_classes:
+                target = torch.tensor([idx.item()], device=device)
+                loss = loss - F.cross_entropy(output, target)  # Negative because we want to maximize confidence
+            # Backward pass
+            loss.backward()
+            # Update image
+            with torch.no_grad():
+                step = infer_step.step(x, x.grad)
+                x = x + step
+                x = infer_step.project(x)
+            # Store step if in iterations_to_show
+            if i+1 in config['iterations_to_show'] or i+1 == config['n_itr']:
+                all_steps.append(x[0].detach().cpu())
+        # Return final image and all stored steps
+        return x[0].detach().cpu(), all_steps
+# Utility function to show inference steps
+def show_inference_steps(steps, figsize=(15, 10)):
+    import matplotlib.pyplot as plt
+    n_steps = len(steps)
+    fig, axes = plt.subplots(1, n_steps, figsize=figsize)
+    for i, step_img in enumerate(steps):
+        img = step_img.permute(1, 2, 0).numpy()
+        axes[i].imshow(img)
+        axes[i].set_title(f"Step {i}")
+        axes[i].axis('off')
+    plt.tight_layout()
+    return fig
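
Note on which frames `inference()` keeps: the loop stores a step when its 1-based index is in `iterations_to_show` or equals `n_itr`, and the unmodified input is always stored first. The helper below is a hypothetical stand-alone mirror of that condition (it is not in the commit) that makes it easy to see how many gallery frames a given iteration count produces.

```python
def stored_iterations(n_itr, iterations_to_show=None):
    """Return the 1-based iteration numbers that the inference loop would store."""
    if iterations_to_show is None:
        # Same default list that get_inference_configs builds
        iterations_to_show = [1, 5, 10, 20, 30, 40, 50, n_itr]
    stored = []
    for i in range(n_itr):
        # Same test as in the main inference loop
        if i + 1 in iterations_to_show or i + 1 == n_itr:
            stored.append(i + 1)
    return stored

short_run = stored_iterations(20)
long_run = stored_iterations(100)
```

With the default list, a run never stores more than eight steps plus the input image, regardless of how high the iteration slider is set.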

requirements.txt ADDED

	@@ -0,0 +1,9 @@

+torch
+torchvision
+numpy
+pillow
+gradio
+matplotlib
+requests
+tqdm
+huggingface_hub
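
Note on the unpinned `gradio` entry: the README front matter pins `sdk_version: 3.50.2`, and app.py calls `gr.Gallery.update`, which is a Gradio 3.x API. A local or Docker install from this file may pull a newer major version. The following check is a minimal sketch, assuming one wants an early warning before launching; `installed_major` and `check_gradio` are hypothetical helper names, and only the standard-library `importlib.metadata` module is used.

```python
from importlib import metadata

def installed_major(package):
    """Return the installed major version of `package`, or None if it is not installed."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])

def check_gradio(expected_major=3):
    """Describe whether the installed gradio matches the major version app.py targets."""
    major = installed_major("gradio")
    if major is None:
        return "gradio is not installed"
    if major != expected_major:
        return f"gradio {major}.x found, but app.py was written against {expected_major}.x"
    return "ok"
```

Pinning `gradio==3.50.2` in requirements.txt would make the check unnecessary, at the cost of having to bump the pin by hand.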

stimuli/Kanizsa_square.jpg ADDED

stimuli/NeonColorSaeedi.jpg ADDED

Git LFS Details

SHA256: ed0d51b349cfadd27fe9e683217e0bd83f45d23f86919022b56ba7c48463080d
Pointer size: 131 Bytes
Size of remote file: 839 kB

stimuli/face_vase.png ADDED

stimuli/figure_ground.png ADDED

Git LFS Details

SHA256: b366b9e23a3527f1587ed08df3abe3b74dc3410d8ff5a02daa652d364b8f2238
Pointer size: 131 Bytes
Size of remote file: 297 kB