Claude VLM for bounding-box segmentation#

In this notebook we will use the vision language model claude to determine bounding-boxes around objects.

import anthropic
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream, extract_json
from prompt_utilities import prompt_anthropic
import base64
import json
import os
import pandas as pd
from skimage.io import imsave

Bounding box segmentation#

Models such as claude have some capabilities and can detect objects and tell us about their positions and size.

import stackview
from skimage import data
import numpy as np

# Load the human mitosis dataset
image = data.human_mitosis()[:100, :100]

stackview.insight(image)
shape(100, 100)
dtypeuint8
size9.8 kB
min7
max88
reply = prompt_anthropic("""
Give me a json object of bounding boxes around ALL bright blobs in this image. Assume the image width and height are 1. 
The format should be like this: 

```json
[
    {'x':float,'y':float, 'width': float, 'height': float},
    {'x':float,'y':float, 'width': float, 'height': float},
    ...
]
```

If you think you can't do this accuratly, please try anyway.
""", image, model="claude-opus-4-1-20250805")
print(reply)
bb = json.loads(extract_json(reply))
bb

new_image = stackview.add_bounding_boxes(image, bb)
Looking at this image, I can identify approximately 13-14 bright blobs/circles. I'll provide bounding boxes for each visible blob, with coordinates normalized to a 1x1 image space where (0,0) is the top-left corner.

```json
[
    {"x": 0.05, "y": 0.02, "width": 0.08, "height": 0.08},
    {"x": 0.20, "y": 0.05, "width": 0.08, "height": 0.08},
    {"x": 0.38, "y": 0.03, "width": 0.08, "height": 0.08},
    {"x": 0.08, "y": 0.18, "width": 0.08, "height": 0.08},
    {"x": 0.28, "y": 0.20, "width": 0.10, "height": 0.10},
    {"x": 0.48, "y": 0.22, "width": 0.08, "height": 0.08},
    {"x": 0.15, "y": 0.38, "width": 0.09, "height": 0.09},
    {"x": 0.35, "y": 0.40, "width": 0.08, "height": 0.08},
    {"x": 0.55, "y": 0.42, "width": 0.08, "height": 0.08},
    {"x": 0.25, "y": 0.58, "width": 0.10, "height": 0.10},
    {"x": 0.45, "y": 0.60, "width": 0.08, "height": 0.08},
    {"x": 0.12, "y": 0.75, "width": 0.08, "height": 0.08},
    {"x": 0.35, "y": 0.78, "width": 0.09, "height": 0.09},
    {"x": 0.50, "y": 0.85, "width": 0.08, "height": 0.08}
]
```

Note: These are approximate bounding boxes based on visual estimation of the blob positions and sizes. The actual pixel-perfect coordinates might vary slightly.
new_image
shape(100, 100, 3)
dtypeuint8
size29.3 kB
min0
max255