{ "cells": [ { "cell_type": "markdown", "id": "d26ca72c-d147-495c-bc01-1e598b6bb729", "metadata": {}, "source": [ "# Claude VLM for bounding-box segmentation\n", "\n", "In this notebook we use the vision-language model [Claude](https://claude.ai) to determine bounding boxes around objects in an image." ] }, { "cell_type": "code", "execution_count": 1, "id": "af96a154-c13c-4368-824c-44f0ba76d04d", "metadata": { "tags": [] }, "outputs": [], "source": [ "import json\n", "import stackview\n", "from image_utilities import extract_json\n", "from prompt_utilities import prompt_anthropic\n" ] }, { "cell_type": "markdown", "id": "b8f09605-b666-4a93-9c20-e30e70d0f254", "metadata": {}, "source": [ "## Bounding box segmentation\n", "Vision-language models such as Claude can detect objects and report their approximate positions and sizes." ] }, { "cell_type": "code", "execution_count": 2, "id": "fd9b681a-ed05-40ca-b5ba-2360920dbd1f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
shape(100, 100)
dtypeuint8
size9.8 kB
min7
max88
\n", "\n", "
" ], "text/plain": [ "StackViewNDArray([[ 8, 8, 8, ..., 10, 9, 9],\n", " [ 8, 8, 7, ..., 10, 11, 10],\n", " [ 9, 8, 8, ..., 9, 10, 9],\n", " ...,\n", " [ 9, 8, 9, ..., 9, 9, 8],\n", " [ 9, 8, 8, ..., 9, 9, 9],\n", " [ 8, 8, 9, ..., 10, 9, 9]], dtype=uint8)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import stackview\n", "from skimage import data\n", "import numpy as np\n", "\n", "# Load the human mitosis dataset\n", "image = data.human_mitosis()[:100, :100]\n", "\n", "stackview.insight(image)" ] }, { "cell_type": "code", "execution_count": null, "id": "8fae03e3-b744-44ca-8a65-5f988478f0cb", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 3, "id": "3a7e780e-0276-4922-a9f5-24f1bb0bccdc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I'll analyze this image and provide bounding boxes for the bright blobs. The image shows several white/bright circular blobs against a black background.\n", "\n", "Here's my attempt to identify the bounding boxes around all bright blobs in the image:\n", "\n", "```json\n", "[\n", " {\"x\": 0.191, \"y\": 0.111, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.313, \"y\": 0.161, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.387, \"y\": 0.284, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.254, \"y\": 0.377, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.508, \"y\": 0.198, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.640, \"y\": 0.173, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.735, \"y\": 0.272, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.191, \"y\": 0.481, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.367, \"y\": 0.494, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.528, \"y\": 0.377, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.640, \"y\": 0.432, \"width\": 0.068, \"height\": 0.068},\n", " {\"x\": 0.132, \"y\": 0.272, \"width\": 
0.068, \"height\": 0.068}\n", "]\n", "```\n", "\n", "Note that this is an approximation based on visual inspection of the image. The actual position and size of the blobs might vary slightly from these measurements, and I've made the assumption that the blobs are relatively uniform in size.\n" ] } ], "source": [ "reply = prompt_anthropic(\"\"\"\n", "Give me a json object of bounding boxes around ALL bright blobs in this image. Assume the image width and height are 1.\n", "The format should be like this:\n", "\n", "```json\n", "[\n", " {'x':float,'y':float, 'width': float, 'height': float},\n", " {'x':float,'y':float, 'width': float, 'height': float},\n", " ...\n", "]\n", "```\n", "\n", "If you think you can't do this accurately, please try anyway.\n", "\"\"\", image)\n", "print(reply)\n", "\n", "# Parse the JSON list of bounding boxes from the reply\n", "bb = json.loads(extract_json(reply))\n", "\n", "# Draw the bounding boxes on top of the image\n", "new_image = stackview.add_bounding_boxes(image, bb)" ] }, { "cell_type": "code", "execution_count": 4, "id": "3af6c407-6f40-4411-8aae-119b658eed25", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
shape(100, 100, 3)
dtypeuint8
size29.3 kB
min0
max255
\n", "\n", "
" ], "text/plain": [ "StackViewNDArray([[[ 3, 3, 3],\n", " [ 3, 3, 3],\n", " [ 3, 3, 3],\n", " ...,\n", " [ 9, 9, 9],\n", " [ 6, 6, 6],\n", " [ 6, 6, 6]],\n", "\n", " [[ 3, 3, 3],\n", " [ 3, 3, 3],\n", " [ 0, 0, 0],\n", " ...,\n", " [ 9, 9, 9],\n", " [12, 12, 12],\n", " [ 9, 9, 9]],\n", "\n", " [[ 6, 6, 6],\n", " [ 3, 3, 3],\n", " [ 3, 3, 3],\n", " ...,\n", " [ 6, 6, 6],\n", " [ 9, 9, 9],\n", " [ 6, 6, 6]],\n", "\n", " ...,\n", "\n", " [[ 6, 6, 6],\n", " [ 3, 3, 3],\n", " [ 6, 6, 6],\n", " ...,\n", " [ 6, 6, 6],\n", " [ 6, 6, 6],\n", " [ 3, 3, 3]],\n", "\n", " [[ 6, 6, 6],\n", " [ 3, 3, 3],\n", " [ 3, 3, 3],\n", " ...,\n", " [ 6, 6, 6],\n", " [ 6, 6, 6],\n", " [ 6, 6, 6]],\n", "\n", " [[ 3, 3, 3],\n", " [ 3, 3, 3],\n", " [ 6, 6, 6],\n", " ...,\n", " [ 9, 9, 9],\n", " [ 6, 6, 6],\n", " [ 6, 6, 6]]], dtype=uint8)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_image" ] }, { "cell_type": "code", "execution_count": null, "id": "f665c32f-cabd-4da7-87a4-d287967108ab", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }