LLaVA#

In this notebook we will use LLaVA, a vision-language model, to inspect a natural image.

import openai
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream
import base64
from stackview._image_widget import _img_to_rgb

Example images#

First, we load a natural image.

The LLaVA model can describe images served through the Ollama API.

def prompt_ollama(prompt:str, image, model="llava"):
    """A prompt helper function that sends a message to ollama
    and returns only the text response.
    """
    rgb_image = _img_to_rgb(image)
    byte_stream = numpy_to_bytestream(rgb_image)
    base64_image = base64.b64encode(byte_stream).decode('utf-8')

    # OpenAI-compatible vision format: the image is passed as a
    # base64-encoded data URI inside the message content
    message = [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': prompt},
            {'type': 'image_url',
             'image_url': {'url': f"data:image/png;base64,{base64_image}"}}
        ]
    }]

    # setup connection to the locally running ollama server
    # (an api_key is required by the client but ignored by ollama)
    client = openai.OpenAI(
        base_url="http://localhost:11434/v1",
        api_key="ollama"
    )
    
    # submit prompt
    response = client.chat.completions.create(
        model=model,
        messages=message
    )
    
    # extract answer
    return response.choices[0].message.content
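
For reference, the conversion from a NumPy array to base64-encoded PNG bytes (handled above by the local helpers `_img_to_rgb` and `numpy_to_bytestream`) can be sketched with Pillow alone. `rgb_to_png_bytes` is a hypothetical stand-in for illustration, not the actual helper:

```python
import base64
import io

import numpy as np
from PIL import Image

def rgb_to_png_bytes(rgb_image: np.ndarray) -> bytes:
    """Encode an RGB uint8 array as PNG bytes (hypothetical stand-in
    for the numpy_to_bytestream helper used above)."""
    buffer = io.BytesIO()
    Image.fromarray(rgb_image).save(buffer, format="PNG")
    return buffer.getvalue()

# a tiny dummy RGB image
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
png_bytes = rgb_to_png_bytes(rgb)
base64_image = base64.b64encode(png_bytes).decode("utf-8")
```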
image = imread("data/real_cat.png")
stackview.insight(image)
shape: (512, 512, 3)
dtype: uint8
size: 768.0 kB
min: 0
max: 255
prompt_ollama("what's in this image?", image, model="llava")
' The image is not provided. Please provide an image, and I can give you information about it. '

Exercise#

Load the MRI dataset and ask LLaVA about the image.