GPT-4 omni VLM#

In this notebook we will use the vision-language model GPT-4 omni (gpt-4o) to inspect an image.

import openai
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream
import base64
from stackview._image_widget import _img_to_rgb

Example image#

First we load a medical tomography image. The file contains a 3D MRI stack; we select a single 2D slice.

mri = imread("data/Haase_MRT_tfl3d1.tif")[100]
stackview.insight(mri)
shape: (256, 256)
dtype: uint8
size: 64.0 kB
min: 0
max: 255

We will now send the image to ChatGPT and ask what it shows. For this, we define a helper function that wraps the image encoding and the API call.

def prompt_chatGPT(prompt: str, image, model="gpt-4o"):
    """A prompt helper function that sends a text prompt together with an image
    to OpenAI and returns only the text response.
    """
    # convert the image to RGB and encode it as a base64 string
    rgb_image = _img_to_rgb(image)
    byte_stream = numpy_to_bytestream(rgb_image)
    base64_image = base64.b64encode(byte_stream).decode('utf-8')

    # combine the text prompt and the image in a single user message
    message = [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url",
         "image_url": {
             "url": f"data:image/jpeg;base64,{base64_image}"
         }}
    ]}]

    # setup connection to the LLM
    client = openai.OpenAI()

    # submit prompt
    response = client.chat.completions.create(
        model=model,
        messages=message
    )

    # extract answer
    return response.choices[0].message.content

prompt_chatGPT("what's in this image?", mri, model="gpt-4o")
'This image shows an MRI scan of a human head, focusing on a sagittal view of the brain. You can see the side profile of the brain, including the cerebrum, cerebellum, and brainstem, along with parts of the nasal and oral cavities.'
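
As an aside, the numpy_to_bytestream helper used above comes from the accompanying image_utilities module and is not defined in this notebook. A minimal sketch of what such a helper may look like, assuming Pillow is available and JPEG encoding is used (to match the data URL mime type above):

from io import BytesIO

import numpy as np
from PIL import Image

def numpy_to_bytestream(image: np.ndarray) -> bytes:
    """Hypothetical re-implementation: encode an RGB numpy array as JPEG bytes."""
    pil_image = Image.fromarray(image.astype(np.uint8))
    buffer = BytesIO()
    # JPEG to match the "data:image/jpeg" mime type used above
    pil_image.save(buffer, format="JPEG")
    return buffer.getvalue()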

Exercise#

Use a vision-language model to determine the content of an image, e.g. membrane2d.tif. Ask the model to differentiate these cases:

  • An image with bright blob-like structures

  • An image with membrane-like structures such as lines or meshes

Make sure the model responds with the case only and no detailed explanation.
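
A possible starting point, reusing the prompt_chatGPT helper defined above (a sketch, not a reference solution; it assumes the image is available at data/membrane2d.tif):

membrane = imread("data/membrane2d.tif")

prompt = """Which of the following cases describes this image best?
Answer with the case only, without any explanation:
- An image with bright blob-like structures
- An image with membrane-like structures such as lines or meshes"""

prompt_chatGPT(prompt, membrane)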