{ "cells": [ { "cell_type": "markdown", "id": "d26ca72c-d147-495c-bc01-1e598b6bb729", "metadata": {}, "source": [ "# Qwen 2.5 VL for bounding-box segmentation\n", "\n", "In this notebook we test whether the [Qwen 2.5 Vision Language Model](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) can draw bounding boxes around objects in an image." ] }, { "cell_type": "code", "execution_count": 1, "id": "af96a154-c13c-4368-824c-44f0ba76d04d", "metadata": { "tags": [] }, "outputs": [], "source": [ "import openai\n", "from skimage.io import imread\n", "import stackview\n", "from image_utilities import extract_json\n", "from prompt_utilities import prompt_openai\n", "\n", "import json\n", "import os\n", "import pandas as pd\n", "from skimage.io import imsave\n" ] }, { "cell_type": "markdown", "id": "b8f09605-b666-4a93-9c20-e30e70d0f254", "metadata": {}, "source": [ "## Bounding box segmentation\n", "We first load an example dataset: a 100 x 100 pixel crop of the `human_mitosis` image from scikit-image." ] }, { "cell_type": "code", "execution_count": 2, "id": "fd9b681a-ed05-40ca-b5ba-2360920dbd1f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "<table>\n", "<tr><td>shape</td><td>(100, 100)</td></tr>\n", "<tr><td>dtype</td><td>uint8</td></tr>\n", "<tr><td>size</td><td>9.8 kB</td></tr>\n", "<tr><td>min</td><td>7</td></tr>\n", "<tr><td>max</td><td>88</td></tr>\n", "</table>"
 ], "text/plain": [ "[[ 8 8 8 ... 10 9 9]\n", " [ 8 8 7 ... 10 11 10]\n", " [ 9 8 8 ... 9 10 9]\n", " ...\n", " [ 9 8 9 ... 9 9 8]\n", " [ 9 8 8 ... 9 9 9]\n", " [ 8 8 9 ... 10 9 9]]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import stackview\n", "from skimage import data\n", "import numpy as np\n", "\n", "# Load the human mitosis example image and crop it to the top-left 100 x 100 pixels\n", "image = data.human_mitosis()[:100, :100]\n", "\n", "stackview.insight(image)" ] }, { "cell_type": "code", "execution_count": 3, "id": "5a5daec0-ac40-4056-825a-f27d9baf3690", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Based on the provided image and the specified format, I'll attempt to estimate the bounding boxes around all the bright blobs. Please note that the positions and dimensions are approximations based on visual inspection:\n", "\n", "```json\n", "[\n", " {\"x\":0.15,\"y\":0.75, \"width\": 0.1, \"height\": 0.1},\n", " {\"x\":0.35,\"y\":0.65, \"width\": 0.15, \"height\": 0.15},\n", " {\"x\":0.55,\"y\":0.75, \"width\": 0.15, \"height\": 0.1},\n", " {\"x\":0.75,\"y\":0.65, \"width\": 0.15, \"height\": 0.15},\n", " {\"x\":0.85,\"y\":0.85, \"width\": 0.1, \"height\": 0.1},\n", " {\"x\":0.25,\"y\":0.45, \"width\": 0.1, \"height\": 0.1},\n", " {\"x\":0.45,\"y\":0.4, \"width\": 0.1, \"height\": 0.1},\n", " {\"x\":0.65,\"y\":0.4, \"width\": 0.1, \"height\": 0.1},\n", " {\"x\":0.8, \"y\":0.35, \"width\": 0.1, \"height\": 0.1},\n", " {\"x\":0.3, \"y\":0.25, \"width\": 0.1, \"height\": 0.1},\n", " {\"x\":0.45,\"y\":0.15, \"width\": 0.15, \"height\": 0.15},\n", " {\"x\":0.65,\"y\":0.2, \"width\": 0.1, \"height\": 0.1},\n", " {\"x\":0.85,\"y\":0.15, \"width\": 0.1, \"height\": 0.1}\n", "]\n", "```\n", "\n", "This is an approximation, and the actual values may vary depending on the precise dimensions and positions of the blobs in the image.\n" ] } ], "source": [ "model = \"qwen2.5-vl-72b-instruct\"\n", "\n", "reply = prompt_openai(\"\"\"\n", "Give me a json object of bounding boxes around ALL bright blobs in this image. Assume the image width and height are 1. \n", "The bottom left is position (0,0), top left is (0,1), top right is (1,1) and bottom right is (1,0).\n", "The format should be like this: \n", "\n", "```json\n", "[\n", " {\"x\":float,\"y\":float, \"width\": float, \"height\": float},\n", " {\"x\":float,\"y\":float, \"width\": float, \"height\": float},\n", " ...\n", "]\n", "```\n", "\n", "If you think you can't do this accurately, please try anyway.\n", "\"\"\", image, model=model, base_url=\"https://chat-ai.academiccloud.de/v1\", api_key=os.environ.get('KISSKI_API_KEY'))\n", "print(reply)" ] }, { "cell_type": "code", "execution_count": 4, "id": "1f268ccd-a0b5-49ad-8bee-03d7fa8b2d60", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'x': 0.15, 'y': 0.75, 'width': 0.1, 'height': 0.1},\n", " {'x': 0.35, 'y': 0.65, 'width': 0.15, 'height': 0.15},\n", " {'x': 0.55, 'y': 0.75, 'width': 0.15, 'height': 0.1},\n", " {'x': 0.75, 'y': 0.65, 'width': 0.15, 'height': 0.15},\n", " {'x': 0.85, 'y': 0.85, 'width': 0.1, 'height': 0.1},\n", " {'x': 0.25, 'y': 0.45, 'width': 0.1, 'height': 0.1},\n", " {'x': 0.45, 'y': 0.4, 'width': 0.1, 'height': 0.1},\n", " {'x': 0.65, 'y': 0.4, 'width': 0.1, 'height': 0.1},\n", " {'x': 0.8, 'y': 0.35, 'width': 0.1, 'height': 0.1},\n", " {'x': 0.3, 'y': 0.25, 'width': 0.1, 'height': 0.1},\n", " {'x': 0.45, 'y': 0.15, 'width': 0.15, 'height': 0.15},\n", " {'x': 0.65, 'y': 0.2, 'width': 0.1, 'height': 0.1},\n", " {'x': 0.85, 'y': 0.15, 'width': 0.1, 'height': 0.1}]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bb = json.loads(extract_json(reply))\n", "\n", "bb" ] }, { "cell_type": "markdown", "id": "5fe287fe-d583-418f-a9df-ed2d80a7eef2", "metadata": {}, "source": [ "The boxes returned by the model do not appear to follow the coordinate system described in the prompt, so we convert them before drawing."
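, "\n", "\n", "Written as equations, the correction in the next cell maps a box position $(x_m, y_m)$ returned by the model to the position $(x, y)$ that is later passed to `stackview.add_bounding_boxes`. This is read off the code in that cell; it is not a documented coordinate convention of the model:\n", "\n", "$$\n", "x = y_m, \\qquad y = 1 - x_m\n", "$$\n", "\n", "For example, the first box in the reply, $(x_m, y_m) = (0.15, 0.75)$, becomes $(x, y) = (0.75, 1 - 0.15) = (0.75, 0.85)$, which matches the first entry of the corrected list. Width and height are left as returned by the model, and the model's original $x_m$ is kept in the extra key `t`."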
] }, { "cell_type": "code", "execution_count": 5, "id": "ed2a3624-5cce-486a-94b5-32591d630366", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'x': 0.75, 'y': 0.85, 'width': 0.1, 'height': 0.1, 't': 0.15},\n", " {'x': 0.65, 'y': 0.65, 'width': 0.15, 'height': 0.15, 't': 0.35},\n", " {'x': 0.75,\n", " 'y': 0.44999999999999996,\n", " 'width': 0.15,\n", " 'height': 0.1,\n", " 't': 0.55},\n", " {'x': 0.65, 'y': 0.25, 'width': 0.15, 'height': 0.15, 't': 0.75},\n", " {'x': 0.85, 'y': 0.15000000000000002, 'width': 0.1, 'height': 0.1, 't': 0.85},\n", " {'x': 0.45, 'y': 0.75, 'width': 0.1, 'height': 0.1, 't': 0.25},\n", " {'x': 0.4, 'y': 0.55, 'width': 0.1, 'height': 0.1, 't': 0.45},\n", " {'x': 0.4, 'y': 0.35, 'width': 0.1, 'height': 0.1, 't': 0.65},\n", " {'x': 0.35, 'y': 0.19999999999999996, 'width': 0.1, 'height': 0.1, 't': 0.8},\n", " {'x': 0.25, 'y': 0.7, 'width': 0.1, 'height': 0.1, 't': 0.3},\n", " {'x': 0.15, 'y': 0.55, 'width': 0.15, 'height': 0.15, 't': 0.45},\n", " {'x': 0.2, 'y': 0.35, 'width': 0.1, 'height': 0.1, 't': 0.65},\n", " {'x': 0.15, 'y': 0.15000000000000002, 'width': 0.1, 'height': 0.1, 't': 0.85}]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "for b in bb:\n", " b['t'] = b['x']\n", " b['x'] = b['y']\n", " b['y'] = 1 - b['t']\n", "bb" ] }, { "cell_type": "code", "execution_count": 6, "id": "ba7cf79c-c493-4a7b-b8f8-93fe7c7ffc27", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
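\n", "<table>\n", "<tr><td>shape</td><td>(100, 100, 3)</td></tr>\n", "<tr><td>dtype</td><td>uint8</td></tr>\n", "<tr><td>size</td><td>29.3 kB</td></tr>\n", "<tr><td>min</td><td>0</td></tr>\n", "<tr><td>max</td><td>255</td></tr>\n", "</table>\n", "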
" ], "text/plain": [ "[[[ 3 3 3]\n", " [ 3 3 3]\n", " [ 3 3 3]\n", " ...\n", " [ 9 9 9]\n", " [ 6 6 6]\n", " [ 6 6 6]]\n", "\n", " [[ 3 3 3]\n", " [ 3 3 3]\n", " [ 0 0 0]\n", " ...\n", " [ 9 9 9]\n", " [12 12 12]\n", " [ 9 9 9]]\n", "\n", " [[ 6 6 6]\n", " [ 3 3 3]\n", " [ 3 3 3]\n", " ...\n", " [ 6 6 6]\n", " [ 9 9 9]\n", " [ 6 6 6]]\n", "\n", " ...\n", "\n", " [[ 6 6 6]\n", " [ 3 3 3]\n", " [ 6 6 6]\n", " ...\n", " [ 6 6 6]\n", " [ 6 6 6]\n", " [ 3 3 3]]\n", "\n", " [[ 6 6 6]\n", " [ 3 3 3]\n", " [ 3 3 3]\n", " ...\n", " [ 6 6 6]\n", " [ 6 6 6]\n", " [ 6 6 6]]\n", "\n", " [[ 3 3 3]\n", " [ 3 3 3]\n", " [ 6 6 6]\n", " ...\n", " [ 9 9 9]\n", " [ 6 6 6]\n", " [ 6 6 6]]]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_image = stackview.add_bounding_boxes(image, bb)\n", "\n", "new_image" ] }, { "cell_type": "code", "execution_count": null, "id": "69158df7-1a80-4b08-8128-42c13af12610", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }