GPT-4 Omni VLM#
In this notebook we will use the vision-language model GPT-4 Omni (gpt-4o) to inspect an image.
import openai
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream, extract_json
import base64
from stackview._image_widget import _img_to_rgb
import json
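This notebook accesses OpenAI's API, which assumes that an API key is available, by default via the OPENAI_API_KEY environment variable. A minimal check could look like this:
import os

# the OpenAI client used below reads the key from this environment variable by default
if "OPENAI_API_KEY" not in os.environ:
    print("Set the OPENAI_API_KEY environment variable before running the following cells.")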
Example image#
First we load a single slice of a magnetic resonance tomography (MRT) dataset of a human head.
mri = imread("data/Haase_MRT_tfl3d1.tif")[100]
stackview.insight(mri)
We will now send the image to ChatGPT and ask it questions about its content.
def prompt_chatGPT(prompt:str, image, model="gpt-4o"):
    """A prompt helper function that sends a text prompt together with an image
    to OpenAI and returns only the text response.
    """
    # convert the image to RGB and encode it as base64,
    # so it can be embedded in the request as a data URL
    rgb_image = _img_to_rgb(image)
    byte_stream = numpy_to_bytestream(rgb_image)
    base64_image = base64.b64encode(byte_stream).decode('utf-8')

    # compose a user message containing both the text prompt and the image
    message = [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}"
            }
        }]}]

    # setup connection to the LLM
    client = openai.OpenAI()

    # submit prompt
    response = client.chat.completions.create(
        model=model,
        messages=message
    )

    # extract answer
    return response.choices[0].message.content
prompt_chatGPT("what's in this image?", mri, model="gpt-4o")
'This image shows a sagittal MRI scan of a human head. You can see the brain, spinal cord, and other anatomical features such as the nasal passages and the structure of the skull.'
Vision-language models can also localize objects. We now ask for a bounding box around the cat in a photograph and draw it onto the image.
cat_image = imread("data/real_cat.png")

reply = prompt_chatGPT("""
Give me a json object of a bounding box around the cat in this image.
The format should be like this: {'x':int,'y':int,'width':int,'height':int}
""", cat_image)
print(reply)

bb = json.loads(extract_json(reply))
stackview.add_bounding_boxes(cat_image, [bb])
```json
{
"x": 150,
"y": 50,
"width": 140,
"height": 200
}
```
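The extract_json helper imported from image_utilities cuts the JSON out of the model's markdown-formatted reply before json.loads parses it. Its actual implementation is not shown here; a minimal sketch of such a helper, purely for illustration, could look like this:
def extract_json_sketch(text):
    # illustrative stand-in for image_utilities.extract_json; the real helper may differ
    # return the substring between the first '{' and the last '}' of the reply
    start = text.find("{")
    end = text.rfind("}")
    return text[start:end + 1]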
Exercise#
Use a vision-language model to determine the content of an image, e.g. membrane2d.tif. Ask the model to differentiate these cases:
An image with bright blob-like structures
An image with membrane-like structures such as lines or meshes
Make sure the model responds with the case only and not with a detailed explanation.
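A possible starting point, assuming the image is available as data/membrane2d.tif (the path is an assumption, adapt it to your data folder):
membrane_image = imread("data/membrane2d.tif")  # path is an assumption

prompt_chatGPT("""
Which of these cases describes the image best?
A) An image with bright blob-like structures
B) An image with membrane-like structures such as lines or meshes
Answer with the case only and no detailed explanation.
""", membrane_image)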