metadata
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
tags:
  - stable-diffusion
  - stable-diffusion-diffusers
  - diffusers
  - controlnet
  - jax-diffusers-event
  - image-to-image
inference: true
datasets:
  - mfidabel/sam-coyo-2k
  - mfidabel/sam-coyo-2.5k
  - mfidabel/sam-coyo-3k
language:
  - en
library_name: diffusers

ControlNet - mfidabel/controlnet-segment-anything

These are ControlNet weights trained on runwayml/stable-diffusion-v1-5 with a new type of conditioning: segmentation maps. Some example images follow.

prompt: contemporary living room of a house

negative prompt: low quality

[image: images_0]

prompt: new york buildings, Vincent Van Gogh starry night

negative prompt: low quality, monochrome

[image: images_1]

prompt: contemporary living room, high quality, 4k, realistic

negative prompt: low quality, monochrome, low res

[image: images_2]

Model Details

  • Model type: Diffusion-based text-to-image generation model with ControlNet conditioning

  • Language(s): English

  • License: The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.

  • Model Description: This model generates images from a text prompt, using a segmentation map as a template for the generated image.
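
The description above can be sketched as a short inference helper using the diffusers library. This is a minimal sketch, not the author's documented usage: it assumes the repository hosts weights that `ControlNetModel.from_pretrained` can load directly (if only Flax weights are available, passing `from_flax=True` is the usual route). The `generate` helper, its defaults, and the step count are illustrative choices rather than settings taken from this card.

```python
CONTROLNET_ID = "mfidabel/controlnet-segment-anything"
BASE_MODEL_ID = "runwayml/stable-diffusion-v1-5"


def generate(prompt, seg_map, negative_prompt="low quality", steps=20, seed=0):
    """Generate one image from a text prompt and a segmentation map (PIL image).

    Hypothetical helper: the defaults are illustrative, not from the model card.
    """
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Assumption: the repository's weights load without extra flags.
    controlnet = ControlNetModel.from_pretrained(CONTROLNET_ID, torch_dtype=dtype)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        BASE_MODEL_ID, controlnet=controlnet, torch_dtype=dtype
    ).to(device)

    # A seeded generator makes repeated runs reproducible.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(
        prompt,
        image=seg_map.convert("RGB").resize((512, 512)),  # training resolution
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        generator=generator,
    )
    return result.images[0]
```

With a segmentation map loaded through Pillow, for example `Image.open("seg_map.png")` (a hypothetical file name), calling `generate("contemporary living room of a house", seg_map)` would return a single PIL image.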

Limitations and Bias

  • The model can't render text
  • Landscapes with fewer segments tend to render better
  • Some segmentation maps tend to render in monochrome (adding "monochrome" to the negative_prompt works around this)
  • Some generated images can be oversaturated
  • Shorter prompts usually work better, as long as they make sense with the input segmentation map
  • The model is biased toward producing paintings rather than realistic images, because the training dataset contains many paintings
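
The negative-prompt workaround for monochrome output can be illustrated with a small helper. The function name and flags are hypothetical; the terms themselves are the ones used in the example prompts on this card.

```python
def build_negative_prompt(avoid_monochrome=True, avoid_low_res=False):
    """Compose a negative prompt from the terms used in the card's examples."""
    terms = ["low quality"]
    if avoid_monochrome:
        terms.append("monochrome")  # counters the monochrome tendency
    if avoid_low_res:
        terms.append("low res")
    return ", ".join(terms)


print(build_negative_prompt())                    # low quality, monochrome
print(build_negative_prompt(avoid_low_res=True))  # low quality, monochrome, low res
```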

Training

Training Data

This model was trained on a segmented dataset based on the COYO-700M dataset. The Stable Diffusion v1.5 checkpoint was used as the base model for the ControlNet.

You can obtain the segmentation map of any image through this Colab notebook: Open in Colab
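
For readers who prefer a local script, the following is a rough sketch of how a segmentation map might be produced with the `segment_anything` package. The colour scheme used for the training data is not stated here, so the random per-segment palette is an assumption, as are the helper names and the `checkpoint_path` argument; the linked Colab remains the authoritative procedure.

```python
import numpy as np


def masks_to_segmentation_map(masks, seed=0):
    """Paint each boolean mask (H x W) with a random RGB colour.

    Larger masks are painted first so smaller segments stay visible on top.
    The random palette is an assumption, not the card's documented scheme.
    """
    if not masks:
        raise ValueError("need at least one mask")
    rng = np.random.default_rng(seed)
    height, width = masks[0].shape
    seg_map = np.zeros((height, width, 3), dtype=np.uint8)
    for mask in sorted(masks, key=lambda m: int(m.sum()), reverse=True):
        seg_map[mask] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return seg_map


def segment_image(image_rgb, checkpoint_path):
    """Run SAM's automatic mask generator on an H x W x 3 uint8 RGB array."""
    # Imported lazily: needs the `segment_anything` package and a SAM checkpoint.
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint=checkpoint_path)
    records = SamAutomaticMaskGenerator(sam).generate(image_rgb)
    masks = [record["segmentation"] for record in records]
    return masks_to_segmentation_map(masks)
```

The colouring step is kept separate from the SAM call so that it can be checked on small synthetic masks without downloading a checkpoint.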

The model was trained as follows:

In that particular order.

Training Details

  • Hardware: Google Cloud TPUv4-8 VM

  • Optimizer: AdamW

  • Train Batch Size: 2 x 4 = 8

  • Learning rate: 0.00001 (constant)

  • Gradient Accumulation Steps: 1

  • Resolution: 512
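
The hyperparameters above can be collected into a small configuration object. This is a sketch under stated assumptions: reading "2 x 4" as per-device batch size times device count is an interpretation, and using optax for AdamW is a guess at the tooling rather than something this card confirms. Any optimizer setting not listed above is left at the optax default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    per_device_batch_size: int = 2        # assumed meaning of the "2" in "2 x 4"
    num_devices: int = 4                  # assumed device count on the TPUv4-8 VM
    learning_rate: float = 1e-5           # constant, no schedule
    gradient_accumulation_steps: int = 1
    resolution: int = 512

    @property
    def global_batch_size(self):
        """Samples consumed per optimizer update."""
        return (
            self.per_device_batch_size
            * self.num_devices
            * self.gradient_accumulation_steps
        )


def make_optimizer(config):
    """Build AdamW with a constant learning rate (optax is an assumption)."""
    import optax  # imported lazily so the config is usable without optax

    return optax.adamw(learning_rate=config.learning_rate)


config = TrainConfig()
print(config.global_batch_size)  # 8
```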

Environmental Impact

Based on the Machine Learning Emissions Calculator with the following characteristics:

  • Hardware Type: TPUv3 Chip (TPUv4 wasn't available yet at the time of calculating)
  • Training Hours: 8 hours
  • Cloud Provider: Google Cloud Platform
  • Compute Region: us-central1
  • Carbon Emitted (Power consumption x Time x Carbon Produced Based on the Local Power Grid): 283 W x 8 h = 2.26 kWh; 2.26 kWh x 0.57 kg eq. CO2/kWh = 1.29 kg eq. CO2
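
The arithmetic behind the estimate can be reproduced in a few lines. The function name is illustrative; the inputs are the figures listed above.

```python
def carbon_emitted_kg(power_watts, hours, kg_co2_per_kwh):
    """Return (energy in kWh, emissions in kg eq. CO2)."""
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh, energy_kwh * kg_co2_per_kwh


energy_kwh, co2_kg = carbon_emitted_kg(283, 8, 0.57)
print(f"{energy_kwh:.2f} kWh, {co2_kg:.2f} kg eq. CO2")  # 2.26 kWh, 1.29 kg eq. CO2
```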