Image variations using Stable Diffusion#

Models such as Stable Diffusion can take images, vary them in a latent space and return a new image that appears to be a variation of the original. This can be useful for producing multiple similar example images and for studying whether algorithms, e.g. for segmentation, are capable of processing such image variations.

The example shown here is adapted from this source

import requests
import torch
import PIL
from io import BytesIO
from skimage.io import imread
import numpy as np
import stackview
import matplotlib.pyplot as plt
from diffusers import StableDiffusionImg2ImgPipeline

First, we load a pipeline and move it to the GPU.

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", 
            torch_dtype=torch.float16)
pipe = pipe.to("cuda")

Here we load our image as a NumPy array and convert it to a Pillow image, which is the required input type. Since blobs.tif is a single-channel image, we stack it three times to obtain an RGB image.

image_np = imread("data/blobs.tif")
image_rgb_np = np.stack([image_np, image_np, image_np], axis=-1)
init_image = PIL.Image.fromarray(image_rgb_np)
init_image = init_image.resize((512, 512))
init_image
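For later analysis, e.g. with segmentation algorithms, the generated Pillow images need to be converted back to NumPy arrays. The round-trip works in both directions; here is a minimal sketch that is independent of the pipeline above and uses a random test image in place of blobs.tif:

```python
import numpy as np
import PIL.Image

# grayscale test image standing in for blobs.tif
gray = (np.random.rand(64, 48) * 255).astype(np.uint8)

# stack the single channel three times to obtain an RGB image
rgb = np.stack([gray, gray, gray], axis=-1)
pil_image = PIL.Image.fromarray(rgb)

# resize to the 512x512 input size expected by Stable Diffusion v1.5
resized = pil_image.resize((512, 512))

# ... and back to a NumPy array, e.g. for segmentation experiments
back = np.array(resized)
print(back.shape)  # (512, 512, 3)
```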

We can now vary this image using a prompt.

image = pipe(
              prompt="brighter blobs", 
              image=init_image, 
              strength=0.5, 
              guidance_scale=7.5, 
            ).images[0]
image

The strength parameter allows us to tune how similar the new image is to the original: with strength 0 the input is returned (almost) unchanged, with strength 1 an almost entirely new image is generated.

strengths = [0, 0.5, 0.75, 1]

fig, axs = plt.subplots(1, 5, figsize=(15, 15))
axs[0].imshow(image_rgb_np)
axs[0].set_title("original")

for i, strength in enumerate(strengths):
    image = pipe(
              prompt="brighter blobs", 
              image=init_image, 
              strength=strength, 
              guidance_scale=7.5, 
            ).images[0]
    
    np_image = np.array(image)
    axs[i+1].imshow(np_image)
    axs[i+1].set_title(f"strength={strength}")
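If the variations are meant as test data for segmentation algorithms, it can be handy to write them to disk. A sketch using Pillow only, with a placeholder image instead of a real pipeline output; the filename pattern is made up for illustration:

```python
import numpy as np
import PIL.Image
import tempfile
from pathlib import Path

# stand-in for a generated image; in the notebook this would be pipe(...).images[0]
image = PIL.Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))

output_dir = Path(tempfile.mkdtemp())
for strength in [0, 0.5, 0.75, 1]:
    # hypothetical naming scheme: one file per strength value
    filename = output_dir / f"blobs_variation_strength_{strength}.png"
    image.save(filename)

print(len(list(output_dir.glob("*.png"))))  # 4
```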

Obviously, the model has not been trained (only) on bio-medical imaging data. The guidance_scale parameter controls how closely the generated image follows the prompt; we vary it next while keeping the strength fixed.

scales = [0, 7.5, 15, 30]

fig, axs = plt.subplots(1, 5, figsize=(15, 15))
axs[0].imshow(image_rgb_np)
axs[0].set_title("original")

for i, scale in enumerate(scales):
    image = pipe(
              prompt="brighter blobs", 
              image=init_image, 
              strength=0.75, 
              guidance_scale=scale, 
            ).images[0]
    
    np_image = np.array(image)
    axs[i+1].imshow(np_image)
    axs[i+1].set_title(f"guidance_scale={scale}")

With careful parameter tuning and prompting, one can also achieve science-art.

image = pipe(
              prompt="cats instead of bright blobs", 
              image=init_image, 
              strength=0.5, 
              guidance_scale=7.5, 
            ).images[0]

fig, axs = plt.subplots(1, 2, figsize=(15, 15))
axs[0].imshow(image_rgb_np)
axs[1].imshow(np.array(image))

Exercise#

Vary the blobs image so that its edges become smoother than in the original.
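A hint for checking the result quantitatively, sketched in plain NumPy and not part of the original notebook: edge sharpness can be estimated from the largest jump between neighboring pixels, which shrinks when edges get softer. The disk and the cross-shaped averaging filter below are stand-ins for the real blobs image and the model output:

```python
import numpy as np

def edge_sharpness(image):
    """Maximum jump between horizontally neighboring pixels."""
    return np.abs(np.diff(image.astype(float), axis=1)).max()

# synthetic hard-edged disk standing in for the blobs image
yy, xx = np.mgrid[:64, :64]
sharp = ((xx - 32) ** 2 + (yy - 32) ** 2 < 200).astype(float) * 255

# softened version: a few passes of a simple cross-shaped averaging filter
soft = sharp.copy()
for _ in range(3):
    soft = (soft
            + np.roll(soft, 1, axis=0) + np.roll(soft, -1, axis=0)
            + np.roll(soft, 1, axis=1) + np.roll(soft, -1, axis=1)) / 5

print(edge_sharpness(sharp) > edge_sharpness(soft))  # True
```

The same measure can be applied to `np.array(image)` before and after running the pipeline with a smoothing-oriented prompt.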