Bounding box segmentation

Some vision language models have limited capabilities for bounding-box segmentation of objects. The goal of this task is to draw the smallest rectangle that still encloses a given object in an image.
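To make the notion of a smallest enclosing rectangle concrete, here is a minimal sketch that computes it from a binary mask with NumPy. The helper `bounding_box_from_mask` and the toy mask are assumptions made for illustration; they are not part of `stackview` or `image_utilities`. The sketch also assumes that `x` and `y` denote the upper-left corner of the box in pixel coordinates.

```python
import numpy as np

def bounding_box_from_mask(mask):
    """Return the tightest axis-aligned box around all non-zero pixels,
    or None if the mask is empty. (Hypothetical helper for illustration.)"""
    rows = np.any(mask, axis=1)  # True for every row that contains the object
    cols = np.any(mask, axis=0)  # True for every column that contains the object
    if not rows.any():
        return None
    row_indices = np.where(rows)[0]
    col_indices = np.where(cols)[0]
    y_min, y_max = row_indices[0], row_indices[-1]
    x_min, x_max = col_indices[0], col_indices[-1]
    return {
        "x": int(x_min),
        "y": int(y_min),
        "width": int(x_max - x_min + 1),
        "height": int(y_max - y_min + 1),
    }

# Toy mask: an object covering rows 2..4 and columns 3..8
mask = np.zeros((8, 10), dtype=bool)
mask[2:5, 3:9] = True

box = bounding_box_from_mask(mask)
print(box)
```

Any rectangle smaller than this one would cut off part of the object, which is what makes the box "minimum sized".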

import anthropic
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream, extract_json, prompt_kisski
import base64
import json
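# Load the example image
cat_image = imread("data/real_cat.png")

# Ask the vision language model for a single bounding box in a fixed JSON format
reply = prompt_kisski("""
Give me a JSON object of a bounding box around the cat in this image.
The format should be exactly like this: {"x":int,"y":int,"width":int,"height":int}
""", cat_image)
print(reply)

# Extract the JSON part of the reply and parse it into a dictionary
bb = json.loads(extract_json(reply))
bb

# Draw the bounding box on top of the image
stackview.add_bounding_boxes(cat_image, [bb])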
```json
{
  "x": 6,
  "y": 10,
  "width": 427,
  "height": 408
}
```
shape: (512, 512, 3)
dtype: uint8
size: 768.0 kB
min: 0
max: 255
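A model reply is not guaranteed to stay inside the image, so it can be worth checking the box against the image shape before using it. The following is a minimal sketch, assuming that `x` and `y` denote the upper-left corner in pixel coordinates; `clip_bounding_box` is a hypothetical helper and not part of `stackview` or `image_utilities`.

```python
def clip_bounding_box(bb, image_shape):
    """Clip a box given as x, y, width, height to the image area.
    Returns None if the box lies entirely outside the image.
    (Hypothetical helper for illustration.)"""
    image_height, image_width = image_shape[:2]

    # Convert to corner coordinates and clamp them to the image borders
    x0 = max(0, min(bb["x"], image_width))
    y0 = max(0, min(bb["y"], image_height))
    x1 = max(0, min(bb["x"] + bb["width"], image_width))
    y1 = max(0, min(bb["y"] + bb["height"], image_height))

    if x1 <= x0 or y1 <= y0:
        return None
    return {"x": x0, "y": y0, "width": x1 - x0, "height": y1 - y0}

# The box from the reply above fits into the 512 x 512 image and stays unchanged
inside = clip_bounding_box({"x": 6, "y": 10, "width": 427, "height": 408}, (512, 512, 3))

# A made-up box that sticks out at the top and on the right gets cropped
overhang = clip_bounding_box({"x": 400, "y": -20, "width": 200, "height": 100}, (512, 512, 3))

print(inside)
print(overhang)
```

Returning `None` for boxes outside the image makes it easy to filter out unusable replies before drawing them.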

Bounding box segmentation using vision language models is an active research field. To see how well it works, we can inspect a couple of images.

visualizations = []

for filename in ["data/pinata.jpg", "data/real_cat.png", "data/sheeps.jpg", "data/guinea_pig.jpg"]:
    image = imread(filename)
    
    reply = prompt_kisski("""Give me a json object of a list of bounding boxes around each animal in this 512x512 pixel large image. 
The format should be like this:

```json
[
    {
        "x":int,
        "y":int, 
        "width":int, 
        "height":int,
        "description":str,
        "font_size":25
    }
]
```
""", image)

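    # Parse the list of bounding boxes from the reply
    bb = json.loads(extract_json(reply))

    # Draw all bounding boxes on the image and collect the result
    vis = stackview.add_bounding_boxes(image, bb)
    visualizations.append(vis)

# Show the annotated images one after another
stackview.animate(visualizations, frame_delay_ms=2000)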
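Visual inspection can be complemented with a number if a manually annotated reference box is available. A common choice is the intersection over union (IoU) of the predicted and the reference box. The sketch below is an illustration under assumptions: `intersection_over_union` is a hypothetical helper, the example boxes are made up, and `x` and `y` are taken to be the upper-left corner.

```python
def intersection_over_union(a, b):
    """IoU of two boxes given as dicts with x, y, width and height.
    (Hypothetical helper for illustration.)"""
    # Width and height of the overlapping region, zero if the boxes do not overlap
    overlap_width = max(0, min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"]))
    overlap_height = max(0, min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"]))
    intersection = overlap_width * overlap_height

    union = a["width"] * a["height"] + b["width"] * b["height"] - intersection
    return intersection / union if union > 0 else 0.0

# Made-up example boxes, not taken from a model reply
predicted = {"x": 0, "y": 0, "width": 10, "height": 10}
reference = {"x": 5, "y": 5, "width": 10, "height": 10}

iou = intersection_over_union(predicted, reference)
print(f"IoU: {iou:.3f}")
```

An IoU of 1 means that both boxes are identical, and 0 means that they do not overlap at all.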

Exercise

Load blobs.tif and ask Claude to draw bounding boxes around the white blobs.