Vision models for image interpretation and code generation

Some models accept images as input and can interpret them. This can help guide the large language model when it decides how to process an image.

import stackview
from skimage.io import imread
from bia_bob import bob
bob.initialize(model="claude-3-opus-20240229", vision_model="claude-3-opus-20240229")
#bob.initialize(model="gpt-4o-2024-05-13", vision_model="gpt-4o-2024-05-13")
#bob.initialize(model="gemini-1.5-pro-latest", vision_model="gemini-1.5-pro-latest")
This notebook may contain text, code and images generated by artificial intelligence. Used model: claude-3-opus-20240229, vision model: claude-3-opus-20240229, endpoint: None, bia-bob version: 0.20.0. Read more about code generation using bia-bob.

First, we load an example image.

image = imread("hela-cells-8bit.tif")

stackview.insight(image)
shape: (512, 672, 3)
dtype: uint8
size: 1008.0 kB
min: 0
max: 255
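
The reported size follows directly from the shape and dtype: 512 × 672 × 3 values at one byte each. A minimal sketch that reproduces the number, assuming a zero-filled placeholder array (the name `placeholder` is hypothetical) instead of the actual file:

```python
import numpy as np

# Hypothetical stand-in with the same shape and dtype as the loaded image
placeholder = np.zeros((512, 672, 3), dtype=np.uint8)

# uint8 uses one byte per value; 1 kB is taken as 1024 bytes here
size_kb = placeholder.nbytes / 1024
print(placeholder.shape, placeholder.dtype, f"{size_kb} kB")
# prints: (512, 672, 3) uint8 1008.0 kB
```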

We can use the vision capabilities by passing the image variable after the %%bob magic:

%%bob image
what's in the blue channel of this microscopy image? Answer in one short sentence.

The blue channel shows cell nuclei in this microscopy image.
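
An answer from a vision model is worth a quick sanity check against the pixel data. The sketch below summarizes each channel of an RGB array with plain numpy. The helper `channel_summary` and the synthetic `demo` array are hypothetical stand-ins, not part of bia-bob or stackview.

```python
import numpy as np

def channel_summary(rgb):
    """Return min, max and mean intensity for each channel of an RGB image."""
    names = ("red", "green", "blue")
    return {
        name: {
            "min": int(rgb[..., i].min()),
            "max": int(rgb[..., i].max()),
            "mean": float(rgb[..., i].mean()),
        }
        for i, name in enumerate(names)
    }

# Synthetic stand-in: one bright square in the blue channel only
demo = np.zeros((64, 64, 3), dtype=np.uint8)
demo[20:30, 20:30, 2] = 200

summary = channel_summary(demo)
for name, stats in summary.items():
    print(name, stats)
```

With the real data, calling `channel_summary(image)` would show which channels carry signal at all. It cannot tell what that signal depicts, so it complements the model's answer rather than replacing it.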

%%bob 
Please segment the nuclei and use stackview.animate_curtain 
to show the resulting label image on top of the original image.
from skimage.filters import threshold_otsu
from skimage.morphology import remove_small_objects
from skimage.measure import label

# Extract the blue channel
blue = image[:,:,2]

# Apply a threshold to create a binary mask
thresh = threshold_otsu(blue)
mask = blue > thresh

# Remove small objects
mask = remove_small_objects(mask, min_size=50)

# Label the connected components
labels = label(mask)

# Display the label image on top of the original image
stackview.animate_curtain(image, labels)
C:\Users\haase\miniconda3\envs\genai2\Lib\site-packages\stackview\_animate.py:61: UserWarning: The image is quite large (> 10 MByte) and might not be properly shown in the notebook when rendered over the internet. Consider subsampling or cropping the image for visualization purposes.
  warnings.warn("The image is quite large (> 10 MByte) and might not be properly shown in the notebook when rendered over the internet. Consider subsampling or cropping the image for visualization purposes.")
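
The warning suggests subsampling or cropping before visualization. A minimal sketch of subsampling by array slicing, assuming the image is laid out as (height, width, channel) and the label image as (height, width). The helper `subsample` and the `demo_*` arrays are hypothetical.

```python
import numpy as np

def subsample(image, labels, step=2):
    """Keep every `step`-th pixel along the two spatial axes."""
    # Slicing picks existing pixels, so label values are never blended
    return image[::step, ::step], labels[::step, ::step]

# Synthetic stand-ins with the same shape as the example data
demo_image = np.zeros((512, 672, 3), dtype=np.uint8)
demo_labels = np.zeros((512, 672), dtype=np.uint32)
demo_labels[100:140, 200:260] = 1

small_image, small_labels = subsample(demo_image, demo_labels, step=2)
print(small_image.shape, small_labels.shape)
# prints: (256, 336, 3) (256, 336)
```

The reduced arrays could then be passed to `stackview.animate_curtain(small_image, small_labels)`. Note that objects smaller than `step` pixels may disappear from the subsampled label image.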