{
"cells": [
{
"cell_type": "markdown",
"id": "5f8f6908-a675-4786-9c5f-85be9a15ba6f",
"metadata": {},
"source": [
"# Vision models with structured output\n",
"In this notebook we will ask a vision model to produce structured output, so that we can put the image in categories or use the information to write Python code for analysing the image."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da807ce6-a52d-4336-b692-6c31a7358328",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import openai\n",
"from skimage.io import imread\n",
"import stackview\n",
"from image_utilities import numpy_to_bytestream\n",
"import base64\n",
"from stackview._image_widget import _img_to_rgb\n",
"from IPython.display import Markdown"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1f8c6c03-5bcc-4e18-b9eb-f16510a29f7e",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
" \n",
" | \n",
"\n",
"\n",
"\n",
"shape | (512, 672, 3) | \n",
"dtype | uint8 | \n",
"size | 1008.0 kB | \n",
"min | 0 | max | 255 | \n",
" \n",
" \n",
" | \n",
"
\n",
"
"
],
"text/plain": [
"StackViewNDArray([[[ 3, 6, 1],\n",
" [ 3, 7, 0],\n",
" [ 3, 6, 1],\n",
" ...,\n",
" [11, 8, 2],\n",
" [11, 7, 2],\n",
" [11, 11, 2]],\n",
"\n",
" [[ 3, 6, 1],\n",
" [ 3, 8, 1],\n",
" [ 3, 7, 1],\n",
" ...,\n",
" [11, 10, 2],\n",
" [10, 10, 2],\n",
" [11, 11, 2]],\n",
"\n",
" [[ 4, 6, 1],\n",
" [ 3, 6, 1],\n",
" [ 4, 6, 1],\n",
" ...,\n",
" [10, 10, 2],\n",
" [11, 10, 2],\n",
" [11, 10, 2]],\n",
"\n",
" ...,\n",
"\n",
" [[15, 14, 8],\n",
" [14, 14, 8],\n",
" [15, 14, 7],\n",
" ...,\n",
" [10, 11, 5],\n",
" [10, 12, 4],\n",
" [11, 14, 5]],\n",
"\n",
" [[14, 16, 7],\n",
" [16, 15, 7],\n",
" [15, 16, 8],\n",
" ...,\n",
" [10, 11, 4],\n",
" [11, 13, 4],\n",
" [11, 16, 5]],\n",
"\n",
" [[15, 18, 7],\n",
" [14, 17, 8],\n",
" [14, 17, 8],\n",
" ...,\n",
" [ 9, 12, 5],\n",
" [10, 13, 5],\n",
" [11, 15, 5]]], dtype=uint8)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hela = imread(\"data/hela-cells-8bit.tif\")\n",
"stackview.insight(hela)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9fae4d0a-0251-45f1-aa5c-2b79fa12976d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def prompt_chatGPT(prompt:str, image, model=\"gpt-4o\"):\n",
" \"\"\"A prompt helper function that sends a message to openAI\n",
" and returns only the text response.\n",
" \"\"\"\n",
" rgb_image = _img_to_rgb(image)\n",
" byte_stream = numpy_to_bytestream(rgb_image)\n",
" base64_image = base64.b64encode(byte_stream).decode('utf-8')\n",
"\n",
" message = [{\"role\": \"user\", \"content\": [\n",
" {\"type\": \"text\", \"text\": prompt},\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": f\"data:image/jpeg;base64,{base64_image}\"\n",
" }\n",
" }]}]\n",
" \n",
" # setup connection to the LLM\n",
" client = openai.OpenAI()\n",
" \n",
" # submit prompt\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=message\n",
" )\n",
" \n",
" # extract answer\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c3f5a0a6-cce1-47a1-9f16-a640a3112220",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/markdown": [
"* Red channel: Other sub-cellular structures\n",
"* Green channel: Cytoskeleton\n",
"* Blue channel: Nuclei"
],
"text/plain": [
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = prompt_chatGPT(\"\"\"You are a highly experienced biologist with advanced microscopy skills.\n",
"\n",
"# Task\n",
"Name the content of this image. Answer for each channel independently. \n",
"\n",
"# Options\n",
"The following structures could be in the image:\n",
"* Nulcei\n",
"* Membranes\n",
"* Cytoplasm\n",
"* Cytoskeleton\n",
"* Extra-cellular structure\n",
"* Other sub-cellular structures\n",
"\n",
"# Output format\n",
"* Red channel: \n",
"* Green channel: \n",
"* Blue channel: \n",
"\n",
"Keep your answer as short as possible. \n",
"Only respond with the structres for the three channels in the format shown above.\n",
"\"\"\", hela)\n",
"\n",
"Markdown(result)"
]
},
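{
"cell_type": "markdown",
"id": "7c2a1b3e-4d5f-4a6b-8c9d-0e1f2a3b4c5d",
"metadata": {},
"source": [
"Because we requested a fixed output format, we can parse the answer into a Python dictionary and use it programmatically, for example to put the image into categories. The following is a minimal sketch that assumes the model followed the requested `* <channel> channel: <structure>` bullet format; real responses may deviate, so production code should guard against that."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d3b2c4f-5e6a-4b7c-9d0e-1f2a3b4c5d6e",
"metadata": {},
"outputs": [],
"source": [
"# Parse the bullet list into a {channel: structure} dictionary.\n",
"# This assumes the model kept the requested format; real responses may deviate.\n",
"channels = {}\n",
"for line in result.splitlines():\n",
"    line = line.strip()\n",
"    if line.startswith(\"*\") and \":\" in line:\n",
"        key, value = line.lstrip(\"* \").split(\":\", 1)\n",
"        channels[key.replace(\"channel\", \"\").strip()] = value.strip()\n",
"channels"
]
},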
{
"cell_type": "markdown",
"id": "fb608b96-ff97-4d3a-a476-dc5855c808a1",
"metadata": {},
"source": [
"## Exercise\n",
"Ask the vision model to generate Python code for segmenting the image. Try using llava for the same task."
]
},
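{
"cell_type": "markdown",
"id": "9e4c3d5a-6f7b-4c8d-a0e1-2f3a4b5c6d7e",
"metadata": {},
"source": [
"The next cell is one possible starting point, not a reference solution; it reuses `prompt_chatGPT` from above. For llava, you could point the `openai` client at a local OpenAI-compatible server (for example, ollama exposes one); the exact `base_url` depends on your local setup and is only an assumption here."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af5d4e6b-7a8c-4d9e-b1f2-3a4b5c6d7e8f",
"metadata": {},
"outputs": [],
"source": [
"# A sketch, not a reference solution: ask for segmentation code and render it.\n",
"segmentation_prompt = \"\"\"Write Python code that segments the nuclei\n",
"(blue channel) of this image using scikit-image.\n",
"Respond with a single Python code block and nothing else.\"\"\"\n",
"\n",
"code_suggestion = prompt_chatGPT(segmentation_prompt, hela)\n",
"\n",
"# For llava, adapt prompt_chatGPT to use a local OpenAI-compatible server,\n",
"# e.g. openai.OpenAI(base_url=\"http://localhost:11434/v1\", api_key=\"none\")\n",
"# if you run ollama; the URL is an assumption about your local setup.\n",
"Markdown(code_suggestion)"
]
},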
{
"cell_type": "code",
"execution_count": null,
"id": "3a21b846-15ee-40aa-8842-ecced3896e08",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}