{ "cells": [ { "cell_type": "markdown", "id": "5f8f6908-a675-4786-9c5f-85be9a15ba6f", "metadata": {}, "source": [ "# Vision models with structured output\n", "In this notebook we will ask a vision model to produce structured output, so that we can categorize the image or use the extracted information to write Python code for analysing the image." ] }, { "cell_type": "code", "execution_count": 1, "id": "da807ce6-a52d-4336-b692-6c31a7358328", "metadata": { "tags": [] }, "outputs": [], "source": [ "import openai\n", "from skimage.io import imread\n", "import stackview\n", "from image_utilities import numpy_to_bytestream\n", "import base64\n", "from stackview._image_widget import _img_to_rgb\n", "from IPython.display import Markdown" ] }, { "cell_type": "code", "execution_count": 2, "id": "1f8c6c03-5bcc-4e18-b9eb-f16510a29f7e", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
<table>\n", "<tr><td>shape</td><td>(512, 672, 3)</td></tr>\n", "<tr><td>dtype</td><td>uint8</td></tr>\n", "<tr><td>size</td><td>1008.0 kB</td></tr>\n", "<tr><td>min</td><td>0</td></tr>\n", "<tr><td>max</td><td>255</td></tr>\n", "</table>
" ], "text/plain": [ "StackViewNDArray([[[ 3, 6, 1],\n", " [ 3, 7, 0],\n", " [ 3, 6, 1],\n", " ...,\n", " [11, 8, 2],\n", " [11, 7, 2],\n", " [11, 11, 2]],\n", "\n", " [[ 3, 6, 1],\n", " [ 3, 8, 1],\n", " [ 3, 7, 1],\n", " ...,\n", " [11, 10, 2],\n", " [10, 10, 2],\n", " [11, 11, 2]],\n", "\n", " [[ 4, 6, 1],\n", " [ 3, 6, 1],\n", " [ 4, 6, 1],\n", " ...,\n", " [10, 10, 2],\n", " [11, 10, 2],\n", " [11, 10, 2]],\n", "\n", " ...,\n", "\n", " [[15, 14, 8],\n", " [14, 14, 8],\n", " [15, 14, 7],\n", " ...,\n", " [10, 11, 5],\n", " [10, 12, 4],\n", " [11, 14, 5]],\n", "\n", " [[14, 16, 7],\n", " [16, 15, 7],\n", " [15, 16, 8],\n", " ...,\n", " [10, 11, 4],\n", " [11, 13, 4],\n", " [11, 16, 5]],\n", "\n", " [[15, 18, 7],\n", " [14, 17, 8],\n", " [14, 17, 8],\n", " ...,\n", " [ 9, 12, 5],\n", " [10, 13, 5],\n", " [11, 15, 5]]], dtype=uint8)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hela = imread(\"data/hela-cells-8bit.tif\")\n", "stackview.insight(hela)" ] }, { "cell_type": "code", "execution_count": 3, "id": "9fae4d0a-0251-45f1-aa5c-2b79fa12976d", "metadata": { "tags": [] }, "outputs": [], "source": [ "def prompt_chatGPT(prompt:str, image, model=\"gpt-4o\"):\n", " \"\"\"A prompt helper function that sends a message to openAI\n", " and returns only the text response.\n", " \"\"\"\n", " rgb_image = _img_to_rgb(image)\n", " byte_stream = numpy_to_bytestream(rgb_image)\n", " base64_image = base64.b64encode(byte_stream).decode('utf-8')\n", "\n", " message = [{\"role\": \"user\", \"content\": [\n", " {\"type\": \"text\", \"text\": prompt},\n", " {\n", " \"type\": \"image_url\",\n", " \"image_url\": {\n", " \"url\": f\"data:image/jpeg;base64,{base64_image}\"\n", " }\n", " }]}]\n", " \n", " # setup connection to the LLM\n", " client = openai.OpenAI()\n", " \n", " # submit prompt\n", " response = client.chat.completions.create(\n", " model=model,\n", " messages=message\n", " )\n", " \n", " # extract answer\n", " return 
response.choices[0].message.content" ] }, { "cell_type": "code", "execution_count": 4, "id": "c3f5a0a6-cce1-47a1-9f16-a640a3112220", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/markdown": [ "* Red channel: Other sub-cellular structures\n", "* Green channel: Cytoskeleton\n", "* Blue channel: Nuclei" ], "text/plain": [ "<IPython.core.display.Markdown object>" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = prompt_chatGPT(\"\"\"You are a highly experienced biologist with advanced microscopy skills.\n", "\n", "# Task\n", "Name the content of this image. Answer for each channel independently.\n", "\n", "# Options\n", "The following structures could be in the image:\n", "* Nuclei\n", "* Membranes\n", "* Cytoplasm\n", "* Cytoskeleton\n", "* Extra-cellular structure\n", "* Other sub-cellular structures\n", "\n", "# Output format\n", "* Red channel: \n", "* Green channel: \n", "* Blue channel: \n", "\n", "Keep your answer as short as possible.\n", "Only respond with the structures for the three channels in the format shown above.\n", "\"\"\", hela)\n", "\n", "Markdown(result)" ] }, { "cell_type": "markdown", "id": "fb608b96-ff97-4d3a-a476-dc5855c808a1", "metadata": {}, "source": [ "## Exercise\n", "Ask the vision model to generate Python code for segmenting the image. Try using llava for the same task." ] }, { "cell_type": "code", "execution_count": null, "id": "3a21b846-15ee-40aa-8842-ecced3896e08", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 5 }