## Model Description
- Developed by: Alex Reid
- Model type: Diffusion-based text-to-image generative model
- License: Apache 2.0
- Model Description: A foundational diffusion model trained entirely on tile-able (seamless) surface print patterns. It is based on the architecture of stable-diffusion-2-base.
## Overview
A major weak point of state-of-the-art image generation models continues to be seamless (repeating/tile-able) images, particularly where the image must appear completely flat and avoid depth, such as on product surfaces, in textile printing, and in wallpaper. To overcome this, Pattern Diffusion was trained from scratch on approximately 6.8 million tile-able patterns.
Because every training image is a repeating pattern, a UNet diffusion model of this kind requires significantly less data and compute to train than a full-scale model such as SDXL or FLUX. Pattern Diffusion was trained in under 1,000 GPU-hours on 8×A100 GPUs at a batch size of 2048, for a total of 65,000 steps. Training was done in four stages of increasing resolution (256×256, 512×512, 768×768, 1024×1024), with each stage running until its FID and CLIP scores stopped improving.
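For illustration only, the staged schedule amounts to a simple loop; `train_one_epoch` and `evaluate_fid` below are hypothetical placeholders rather than the actual training code:

```python
# Illustrative sketch of the staged-resolution schedule described above.
# train_one_epoch and evaluate_fid are hypothetical placeholders, not real APIs.
def train_progressive(model, dataset):
    for resolution in (256, 512, 768, 1024):
        best_fid = float("inf")
        while True:
            train_one_epoch(model, dataset, resolution=resolution)
            fid = evaluate_fid(model, resolution=resolution)  # lower is better
            if fid >= best_fid:
                break  # scores stopped improving; advance to the next stage
            best_fid = fid
```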
Also provided below is an example inference implementation that produces optimal results for tile-able image generation by combining noise rolling with circular padding on Conv2d layers.
## Commercial Use
Pattern Diffusion is released under the Apache 2.0 license and is available for both research and commercial use with no attribution required.
## Strong Areas
- The model is excellent at generating floral and abstract patterns
- Strong understanding of foreground and background colors in the prompt
- Prompts can mix very random, unrelated concepts and often still produce beautiful results
- Fast inference speeds and low VRAM requirements
## Limitations
- Cannot generate coherent text
- Struggles with anatomically correct living creatures due to the limited size of the dataset, often producing incorrect numbers of limbs or mirrored bodies
- Works for simple geometric patterns (such as checkerboards) but frequently produces inconsistent geometry
## Example Usage
Below is an example script that produces the best-scoring (CLIP and FID) results while leaving no visible seams in generated images. Most public techniques for making seamless images with diffusion models involve setting all Conv2d layers to use circular padding. In testing, however, this significantly harms FID and CLIP scores, both on Pattern Diffusion and on other models such as Stable Diffusion 1.5 and SDXL. The damage can be avoided by enabling circular padding only in the late steps of the diffusion process, after the majority of the features have already been denoised. When doing this, noise rolling must be applied from the start of inference to ensure any prominent features are made seamless across the image border. With both noise rolling and late-stage circular Conv2d padding, there is no measurable decrease in FID or CLIP scores relative to the unmodified inference setup.
```python
import torch
from torch import Tensor
import torch.nn as nn
from torch.nn import Conv2d
from torch.nn import functional as F
from torch.nn.modules.utils import _pair
from typing import Optional

import diffusers
from diffusers import StableDiffusionPipeline, DDPMScheduler
from PIL import Image


# Replacement for Conv2d._conv_forward that applies circular padding on both
# axes, so features wrap around the image border instead of stopping at it
def asymmetricConv2DConvForward_circular(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
    self.paddingX = (
        self._reversed_padding_repeated_twice[0],
        self._reversed_padding_repeated_twice[1],
        0,
        0,
    )
    self.paddingY = (
        0,
        0,
        self._reversed_padding_repeated_twice[2],
        self._reversed_padding_repeated_twice[3],
    )
    working = F.pad(input, self.paddingX, mode="circular")
    working = F.pad(working, self.paddingY, mode="circular")
    return F.conv2d(working, weight, bias, self.stride, _pair(0), self.dilation, self.groups)


# Sets the padding mode to circular on every Conv2d in the model
def make_seamless(model):
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            if isinstance(module, diffusers.models.lora.LoRACompatibleConv) and module.lora_layer is None:
                module.lora_layer = lambda *x: 0
            module._conv_forward = asymmetricConv2DConvForward_circular.__get__(module, Conv2d)


# Sets the padding mode back to the default on every Conv2d in the model
def disable_seamless(model):
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            if isinstance(module, diffusers.models.lora.LoRACompatibleConv) and module.lora_layer is None:
                module.lora_layer = lambda *x: 0
            module._conv_forward = nn.Conv2d._conv_forward.__get__(module, Conv2d)


# Runs at the end of every inference step
def diffusion_callback(pipe, step_index, timestep, callback_kwargs):
    # Switch the UNet and VAE to circular Conv2d padding for the last 20% of steps
    if step_index == int(pipe.num_timesteps * 0.8):
        make_seamless(pipe.unet)
        make_seamless(pipe.vae)
    # Noise rolling: for the first 80% of steps, shift the latents so features
    # wrap around the edges and are denoised consistently across the border
    if step_index < int(pipe.num_timesteps * 0.8):
        callback_kwargs["latents"] = torch.roll(callback_kwargs["latents"], shifts=(64, 64), dims=(2, 3))
    return callback_kwargs


pipe = StableDiffusionPipeline.from_pretrained(
    "Arrexel/pattern-diffusion",
    torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

# Make sure circular padding is disabled before starting inference, as it should
# only be enabled in the last 20% of steps. This is not necessary if you are only
# generating a single image (it is disabled by default when the pipe loads).
disable_seamless(pipe.unet)
disable_seamless(pipe.vae)

output = pipe(
    num_inference_steps=50,
    prompt="Vibrant watercolor floral pattern with pink, purple, and blue flowers against a white background.",
    width=1024,
    height=1024,
    callback_on_step_end=diffusion_callback
).images[0]
output.save("example.png")
```