patrickvonplaten

williamberman committed on Mar 3, 2023

Commit 8e850e5 • 1 Parent(s): a77d9c8

update README (#1)

- update README (1c52cfa04a0b5e35c158d6577b1e9ccdf61e0299)

Co-authored-by: Will Berman <[email protected]>

Files changed (24)

README.md +18 -326
controlnet_utils.py +0 -40
images/bag_scribble_out.png +0 -0
images/bird.png +0 -3
images/bird_canny.png +0 -0
images/bird_canny_out.png +0 -0
images/chef_pose_out.png +0 -0
images/house.png +0 -0
images/house_seg.png +0 -0
images/house_seg_out.png +0 -0
images/man.png +0 -0
images/man_hed.png +0 -0
images/man_hed_out.png +0 -0
images/openpose.png +0 -0
images/pose.png +0 -0
images/room.png +0 -0
images/room_mlsd.png +0 -0
images/room_mlsd_out.png +0 -0
images/stormtrooper.png +0 -0
images/stormtrooper_depth.png +0 -0
images/stormtrooper_depth_out.png +0 -0
images/toy.png +0 -0
images/toy_normal.png +0 -0
images/toy_normal_out.png +0 -0

README.md CHANGED

@@ -18,301 +18,21 @@ Controlnet's auxiliary models are trained with stable diffusion 1.5. Experimenta
 The auxiliary conditioning is passed directly to the diffusers pipeline. If you want to process an image to create the auxiliary conditioning, external dependencies are required.
 Some of the additional conditionings can be extracted from images via additional models. We extracted these
-additional models from the original controlnet repo into a separate package that can be found on [github](https://github.com/patrickvonplaten/human_pose.git).
-## Canny edge detection
-Install opencv
-```sh
-$ pip install opencv-contrib-python
-```
-```python
-import cv2
-from PIL import Image
-from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
-import torch
-import numpy as np
-image = Image.open('images/bird.png')
-image = np.array(image)
-low_threshold = 100
-high_threshold = 200
-image = cv2.Canny(image, low_threshold, high_threshold)
-image = image[:, :, None]
-image = np.concatenate([image, image, image], axis=2)
-image = Image.fromarray(image)
-controlnet = ControlNetModel.from_pretrained(
-    "fusing/stable-diffusion-v1-5-controlnet-canny",
-)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
-)
-pipe.to('cuda')
-image = pipe("bird", image).images[0]
-image.save('images/bird_canny_out.png')
-```
-![bird](./images/bird.png)
-![bird_canny](./images/bird_canny.png)
-![bird_canny_out](./images/bird_canny_out.png)
-## M-LSD Straight line detection
-Install the additional controlnet models package.
-```sh
-$ pip install git+https://github.com/patrickvonplaten/human_pose.git
-```
-```py
-from PIL import Image
-from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
-import torch
-from human_pose import MLSDdetector
-mlsd = MLSDdetector.from_pretrained('lllyasviel/ControlNet')
-image = Image.open('images/room.png')
-image = mlsd(image)
-controlnet = ControlNetModel.from_pretrained(
-    "fusing/stable-diffusion-v1-5-controlnet-mlsd",
-)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
-)
-pipe.to('cuda')
-image = pipe("room", image).images[0]
-image.save('images/room_mlsd_out.png')
-```
-![room](./images/room.png)
-![room_mlsd](./images/room_mlsd.png)
-![room_mlsd_out](./images/room_mlsd_out.png)
-## Pose estimation
-Install the additional controlnet models package.
-```sh
-$ pip install git+https://github.com/patrickvonplaten/human_pose.git
-```
-```py
-from PIL import Image
-from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
-import torch
-from human_pose import OpenposeDetector
-openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
-image = Image.open('images/pose.png')
-image = openpose(image)
-controlnet = ControlNetModel.from_pretrained(
-    "fusing/stable-diffusion-v1-5-controlnet-openpose",
-)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
-)
-pipe.to('cuda')
-image = pipe("chef in the kitchen", image).images[0]
-image.save('images/chef_pose_out.png')
-```
-![pose](./images/pose.png)
-![openpose](./images/openpose.png)
-![chef_pose_out](./images/chef_pose_out.png)
-## Semantic Segmentation
-Semantic segmentation relies on transformers. Transformers is a
-dependency of diffusers for running controlnet, so you should
-have it installed already.
-```py
-from transformers import AutoImageProcessor, UperNetForSemanticSegmentation
-from PIL import Image
-import numpy as np
-from controlnet_utils import ade_palette
-import torch
-from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
-image_processor = AutoImageProcessor.from_pretrained("openmmlab/upernet-convnext-small")
-image_segmentor = UperNetForSemanticSegmentation.from_pretrained("openmmlab/upernet-convnext-small")
-image = Image.open("./images/house.png").convert('RGB')
-pixel_values = image_processor(image, return_tensors="pt").pixel_values
-with torch.no_grad():
-  outputs = image_segmentor(pixel_values)
-seg = image_processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
-color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8) # height, width, 3
-palette = np.array(ade_palette())
-for label, color in enumerate(palette):
-    color_seg[seg == label, :] = color
-color_seg = color_seg.astype(np.uint8)
-image = Image.fromarray(color_seg)
-controlnet = ControlNetModel.from_pretrained(
-    "fusing/stable-diffusion-v1-5-controlnet-seg",
-)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
-)
-pipe.to('cuda')
-image = pipe("house", image).images[0]
-image.save('./images/house_seg_out.png')
-```
-![house](images/house.png)
-![house_seg](images/house_seg.png)
-![house_seg_out](images/house_seg_out.png)
-## Depth control
-Depth control relies on transformers. Transformers is a dependency of diffusers for running controlnet, so
-you should have it installed already.
-```py
-from transformers import pipeline
-from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
-from PIL import Image
-import numpy as np
-depth_estimator = pipeline('depth-estimation')
-image = Image.open('./images/stormtrooper.png')
-image = depth_estimator(image)['depth']
-image = np.array(image)
-image = image[:, :, None]
-image = np.concatenate([image, image, image], axis=2)
-image = Image.fromarray(image)
-controlnet = ControlNetModel.from_pretrained(
-    "fusing/stable-diffusion-v1-5-controlnet-depth",
-)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
-)
-pipe.to('cuda')
-image = pipe("Stormtrooper's lecture", image).images[0]
-image.save('./images/stormtrooper_depth_out.png')
-```
-![stormtrooper](./images/stormtrooper.png)
-![stormtrooler_depth](./images/stormtrooper_depth.png)
-![stormtrooler_depth_out](./images/stormtrooper_depth_out.png)
-## Normal map
-```py
-from PIL import Image
-from transformers import pipeline
-import numpy as np
-import cv2
-from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
-image = Image.open("images/toy.png").convert("RGB")
-depth_estimator = pipeline("depth-estimation", model ="Intel/dpt-hybrid-midas" )
-image = depth_estimator(image)['predicted_depth'][0]
-image = image.numpy()
-image_depth = image.copy()
-image_depth -= np.min(image_depth)
-image_depth /= np.max(image_depth)
-bg_threhold = 0.4
-x = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=3)
-x[image_depth < bg_threhold] = 0
-y = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=3)
-y[image_depth < bg_threhold] = 0
-z = np.ones_like(x) * np.pi * 2.0
-image = np.stack([x, y, z], axis=2)
-image /= np.sum(image ** 2.0, axis=2, keepdims=True) ** 0.5
-image = (image * 127.5 + 127.5).clip(0, 255).astype(np.uint8)
-image = Image.fromarray(image)
-controlnet = ControlNetModel.from_pretrained(
-    "fusing/stable-diffusion-v1-5-controlnet-normal",
-)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
-)
-pipe.to('cuda')
-image = pipe("cute toy", image).images[0]
-image.save('images/toy_normal_out.png')
-```
-![toy](./images/toy.png)
-![toy_normal](./images/toy_normal.png)
-![toy_normal_out](./images/toy_normal_out.png)
 ## Scribble
 Install the additional controlnet models package.
 ```sh
-$ pip install git+https://github.com/patrickvonplaten/human_pose.git
 ```
 ```py
 from PIL import Image
-from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
 import torch
-from human_pose import HEDdetector
 hed = HEDdetector.from_pretrained('lllyasviel/ControlNet')
@@ -321,15 +41,23 @@ image = Image.open('images/bag.png')
 image = hed(image, scribble=True)
 controlnet = ControlNetModel.from_pretrained(
-    "fusing/stable-diffusion-v1-5-controlnet-scribble",
 )
 pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
 )
-pipe.to('cuda')
-image = pipe("bag", image).images[0]
 image.save('images/bag_scribble_out.png')
 ```
@@ -340,42 +68,6 @@ image.save('images/bag_scribble_out.png')
 ![bag_scribble_out](./images/bag_scribble_out.png)
-## HED Boundary
-Install the additional controlnet models package.
-```sh
-$ pip install git+https://github.com/patrickvonplaten/human_pose.git
-```
-```py
-from PIL import Image
-from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
-import torch
-from human_pose import HEDdetector
-hed = HEDdetector.from_pretrained('lllyasviel/ControlNet')
-image = Image.open('images/man.png')
-image = hed(image)
-controlnet = ControlNetModel.from_pretrained(
-    "fusing/stable-diffusion-v1-5-controlnet-hed",
-)
-pipe = StableDiffusionControlNetPipeline.from_pretrained(
-    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None
-)
-pipe.to('cuda')
-image = pipe("oil painting of handsome old man, masterpiece", image).images[0]
-image.save('images/man_hed_out.png')
-```
-![man](./images/man.png)
-![man_hed](./images/man_hed.png)
-![man_hed_out](./images/man_hed_out.png)

 The auxiliary conditioning is passed directly to the diffusers pipeline. If you want to process an image to create the auxiliary conditioning, external dependencies are required.
 Some of the additional conditionings can be extracted from images via additional models. We extracted these
+additional models from the original controlnet repo into a separate package that can be found on [github](https://github.com/patrickvonplaten/controlnet_aux.git).
 ## Scribble
 Install the additional controlnet models package.
 ```sh
+$ pip install git+https://github.com/patrickvonplaten/controlnet_aux.git
 ```
 ```py
 from PIL import Image
+from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
 import torch
+from controlnet_aux import HEDdetector
 hed = HEDdetector.from_pretrained('lllyasviel/ControlNet')
 image = hed(image, scribble=True)
 controlnet = ControlNetModel.from_pretrained(
+    "fusing/stable-diffusion-v1-5-controlnet-scribble", torch_dtype=torch.float16
 )
 pipe = StableDiffusionControlNetPipeline.from_pretrained(
+    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
 )
+pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
+# Remove if you do not have xformers installed
+# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers
+# for installation instructions
+pipe.enable_xformers_memory_efficient_attention()
+pipe.enable_model_cpu_offload()
+image = pipe("bag", image, num_inference_steps=20).images[0]
 image.save('images/bag_scribble_out.png')
 ```
 ![bag_scribble_out](./images/bag_scribble_out.png)
+### Training
+The scribble model was trained on 500k scribble-image, caption pairs. The scribble images were generated with HED boundary detection and a set of data augmentations — thresholds, masking, morphological transformations, and non-maximum suppression. The model was trained for 150 GPU-hours with Nvidia A100 80G using the canny model as a base model.
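Reviewer note on the added Training paragraph: the augmentations it names (thresholds, masking, morphological transformations) can be illustrated on a toy edge map. The following is a minimal sketch in plain NumPy, not the training pipeline used for this model. The function names, the 0.3 to 0.7 threshold range, and the mask size are all hypothetical, and non-maximum suppression is left out.

```python
import numpy as np

def binarize(edge_map, threshold):
    """Turn a soft edge map in [0, 1] into a 0/255 uint8 scribble."""
    return np.where(edge_map > threshold, 255, 0).astype(np.uint8)

def dilate3x3(binary):
    """Morphological dilation with a 3x3 square kernel, in pure NumPy."""
    padded = np.pad(binary, 1, mode="constant", constant_values=0)
    h, w = binary.shape
    out = np.zeros_like(binary)
    # Take the maximum over the nine shifted views of the padded image.
    for dy in range(3):
        for dx in range(3):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def mask_rectangle(binary, top, left, height, width):
    """Blank out a rectangle, simulating a partially drawn scribble."""
    out = binary.copy()
    out[top:top + height, left:left + width] = 0
    return out

def augment(edge_map, rng):
    """Apply a random threshold, a dilation, and a random rectangular mask."""
    threshold = rng.uniform(0.3, 0.7)  # hypothetical range, not from the commit
    scribble = dilate3x3(binarize(edge_map, threshold))
    h, w = scribble.shape
    top = int(rng.integers(0, h // 2))
    left = int(rng.integers(0, w // 2))
    return mask_rectangle(scribble, top, left, h // 4, w // 4)

rng = np.random.default_rng(0)
edge_map = np.zeros((8, 8), dtype=np.float32)
edge_map[4, :] = 0.9  # one strong horizontal edge, standing in for HED output
scribble = augment(edge_map, rng)
```

In a real setup the edge map would come from `HEDdetector` rather than a hand-built array, and the dilation would more likely use an image library than nested loops.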

controlnet_utils.py DELETED

@@ -1,40 +0,0 @@
-def ade_palette():
-    """ADE20K palette that maps each class to RGB values."""
-    return [[120, 120, 120], [180, 120, 120], [6, 230, 230], [80, 50, 50],
-            [4, 200, 3], [120, 120, 80], [140, 140, 140], [204, 5, 255],
-            [230, 230, 230], [4, 250, 7], [224, 5, 255], [235, 255, 7],
-            [150, 5, 61], [120, 120, 70], [8, 255, 51], [255, 6, 82],
-            [143, 255, 140], [204, 255, 4], [255, 51, 7], [204, 70, 3],
-            [0, 102, 200], [61, 230, 250], [255, 6, 51], [11, 102, 255],
-            [255, 7, 71], [255, 9, 224], [9, 7, 230], [220, 220, 220],
-            [255, 9, 92], [112, 9, 255], [8, 255, 214], [7, 255, 224],
-            [255, 184, 6], [10, 255, 71], [255, 41, 10], [7, 255, 255],
-            [224, 255, 8], [102, 8, 255], [255, 61, 6], [255, 194, 7],
-            [255, 122, 8], [0, 255, 20], [255, 8, 41], [255, 5, 153],
-            [6, 51, 255], [235, 12, 255], [160, 150, 20], [0, 163, 255],
-            [140, 140, 140], [250, 10, 15], [20, 255, 0], [31, 255, 0],
-            [255, 31, 0], [255, 224, 0], [153, 255, 0], [0, 0, 255],
-            [255, 71, 0], [0, 235, 255], [0, 173, 255], [31, 0, 255],
-            [11, 200, 200], [255, 82, 0], [0, 255, 245], [0, 61, 255],
-            [0, 255, 112], [0, 255, 133], [255, 0, 0], [255, 163, 0],
-            [255, 102, 0], [194, 255, 0], [0, 143, 255], [51, 255, 0],
-            [0, 82, 255], [0, 255, 41], [0, 255, 173], [10, 0, 255],
-            [173, 255, 0], [0, 255, 153], [255, 92, 0], [255, 0, 255],
-            [255, 0, 245], [255, 0, 102], [255, 173, 0], [255, 0, 20],
-            [255, 184, 184], [0, 31, 255], [0, 255, 61], [0, 71, 255],
-            [255, 0, 204], [0, 255, 194], [0, 255, 82], [0, 10, 255],
-            [0, 112, 255], [51, 0, 255], [0, 194, 255], [0, 122, 255],
-            [0, 255, 163], [255, 153, 0], [0, 255, 10], [255, 112, 0],
-            [143, 255, 0], [82, 0, 255], [163, 255, 0], [255, 235, 0],
-            [8, 184, 170], [133, 0, 255], [0, 255, 92], [184, 0, 255],
-            [255, 0, 31], [0, 184, 255], [0, 214, 255], [255, 0, 112],
-            [92, 255, 0], [0, 224, 255], [112, 224, 255], [70, 184, 160],
-            [163, 0, 255], [153, 0, 255], [71, 255, 0], [255, 0, 163],
-            [255, 204, 0], [255, 0, 143], [0, 255, 235], [133, 255, 0],
-            [255, 0, 235], [245, 0, 255], [255, 0, 122], [255, 245, 0],
-            [10, 190, 212], [214, 255, 0], [0, 204, 255], [20, 0, 255],
-            [255, 255, 0], [0, 153, 255], [0, 41, 255], [0, 255, 204],
-            [41, 0, 255], [41, 255, 0], [173, 0, 255], [0, 245, 255],
-            [71, 0, 255], [122, 0, 255], [0, 255, 184], [0, 92, 255],
-            [184, 255, 0], [0, 133, 255], [255, 214, 0], [25, 194, 194],
-            [102, 255, 0], [92, 0, 255]]
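Reviewer note on the deleted helper: the removed README used `ade_palette()` to color a segmentation map with a per-label loop. The sketch below shows that loop next to an equivalent NumPy integer-indexing lookup. It uses only the first three palette entries and a made-up 2x2 label map, and the function names `colorize_loop` and `colorize_indexed` are hypothetical.

```python
import numpy as np

# First three entries of the deleted ADE20K palette; the label map is invented.
palette = np.array([[120, 120, 120], [180, 120, 120], [6, 230, 230]], dtype=np.uint8)
seg = np.array([[0, 1], [2, 0]])

def colorize_loop(seg, palette):
    """Per-label loop, as in the removed README example."""
    color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8)
    for label, color in enumerate(palette):
        color_seg[seg == label, :] = color
    return color_seg

def colorize_indexed(seg, palette):
    """Same mapping via integer-array indexing: one palette row per pixel."""
    return palette[seg]

color_seg = colorize_indexed(seg, palette)
```

Both functions agree whenever every label in `seg` is a valid palette index. With out-of-range labels the loop leaves those pixels black, while the indexed version raises an `IndexError`.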

images/bag_scribble_out.png CHANGED

images/bird.png DELETED

Git LFS Details

SHA256: cad49fc7d3071b2bcd078bc8dde365f8fa62eaa6d43705fd50c212794a3aac35
Pointer size: 132 Bytes
Size of remote file: 1.07 MB

images/bird_canny.png DELETED

Binary file (29.1 kB)

images/bird_canny_out.png DELETED

Binary file (845 kB)

images/chef_pose_out.png DELETED

Binary file (570 kB)

images/house.png DELETED

Binary file (391 kB)

images/house_seg.png DELETED

Binary file (3.68 kB)

images/house_seg_out.png DELETED

Binary file (472 kB)

images/man.png DELETED

Binary file (773 kB)

images/man_hed.png DELETED

Binary file (118 kB)

images/man_hed_out.png DELETED

Binary file (737 kB)

images/openpose.png DELETED

Binary file (6.55 kB)

images/pose.png DELETED

Binary file (592 kB)

images/room.png DELETED

Binary file (637 kB)

images/room_mlsd.png DELETED

Binary file (9.06 kB)

images/room_mlsd_out.png DELETED

Binary file (575 kB)

images/stormtrooper.png DELETED

Binary file (218 kB)

images/stormtrooper_depth.png DELETED

Binary file (54.1 kB)

images/stormtrooper_depth_out.png DELETED

Binary file (343 kB)

images/toy.png DELETED

Binary file (312 kB)

images/toy_normal.png DELETED

Binary file (90.1 kB)

images/toy_normal_out.png DELETED

Binary file (231 kB)