Vision models with structured output#
In this notebook we will ask a vision model to produce structured output, so that we can assign the image to categories or use the returned information to write Python code for analysing the image.
import openai
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream
import base64
from stackview._image_widget import _img_to_rgb
from IPython.display import Markdown
hela = imread("data/hela-cells-8bit.tif")
stackview.insight(hela)
def prompt_chatGPT(prompt:str, image, model="gpt-4o"):
    """A prompt helper function that sends a text message and an image
    to OpenAI and returns only the text response.
    """
    # convert the image to RGB and encode it as base64,
    # so that it can be embedded in the message as a data URL
    rgb_image = _img_to_rgb(image)
    byte_stream = numpy_to_bytestream(rgb_image)
    base64_image = base64.b64encode(byte_stream).decode('utf-8')

    # build a single user message containing the text prompt and the image
    message = [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}"
            }
        }]}]

    # setup connection to the LLM
    client = openai.OpenAI()

    # submit prompt
    response = client.chat.completions.create(
        model=model,
        messages=message
    )

    # extract answer
    return response.choices[0].message.content
result = prompt_chatGPT("""You are a highly experienced biologist with advanced microscopy skills.
# Task
Name the content of this image. Answer for each channel independently.
# Options
The following structures could be in the image:
* Nuclei
* Membranes
* Cytoplasm
* Cytoskeleton
* Extra-cellular structure
* Other sub-cellular structures
# Output format
* Red channel: <structure>
* Green channel: <structure>
* Blue channel: <structure>
Keep your answer as short as possible.
Only respond with the structures for the three channels in the format shown above.
""", hela)
Markdown(result)
Red channel: Other sub-cellular structures
Green channel: Cytoskeleton
Blue channel: Nuclei
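Because the answer follows the requested format, we can turn it into a Python dictionary and use it programmatically. The following is a minimal sketch that assumes the response contains one "<channel>: <structure>" pair per line, as requested in the prompt above.

# Parse the structured answer into a dictionary, assuming one
# "<channel> channel: <structure>" pair per line (possibly prefixed with "*").
channel_structures = {}
for line in result.splitlines():
    line = line.strip().lstrip("*").strip()
    if ":" in line:
        channel, structure = line.split(":", 1)
        channel_structures[channel.strip()] = structure.strip()
channel_structures

Such a dictionary could then be used to select channel-specific processing steps automatically.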
Exercise#
Ask the vision model to generate Python code for segmenting the image. Try using llava for the same task.
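A possible starting point is to reuse the prompt_chatGPT helper defined above; the prompt wording below is only an illustration, not a prescribed solution. To try llava instead, the openai.OpenAI client can be pointed at a local, OpenAI-compatible server (e.g. via its base_url parameter) and the model name changed accordingly.

# Ask the vision model for segmentation code; the prompt wording is
# illustrative and can be adapted freely.
code_suggestion = prompt_chatGPT("""
Write Python code that segments the nuclei in the blue channel of this image,
for example using scikit-image with Otsu thresholding and connected-component labelling.
Only return runnable Python code, without explanations.
""", hela)

Markdown(code_suggestion)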