Local VLMs using Ollama#
In this notebook we will use vision language models provided via Ollama to inspect a natural image. Before running this notebook locally, consider downloading the model using this terminal command:
ollama pull gemma3:4b
import openai
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream, prompt_ollama
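The helper prompt_ollama comes from image_utilities. A minimal sketch of how such a helper could talk to a local Ollama server through its OpenAI-compatible endpoint might look like the following; the function body, the endpoint URL, and the use of Pillow for PNG encoding are assumptions for illustration, not the actual implementation.

import base64
import io
import openai
from PIL import Image

def prompt_ollama_sketch(prompt, image, model="gemma3:4b"):
    # Sketch only, not the implementation from image_utilities:
    # encode the numpy image as a base64 PNG data URL
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="PNG")
    image_b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")

    # Ollama exposes an OpenAI-compatible API on localhost:11434 by default
    client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + image_b64}},
            ],
        }],
    )
    return response.choices[0].message.content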
Example images#
First, we load a natural image.
The gemma3 model is capable of describing images via the Ollama API.
image = imread("data/real_cat.png")
stackview.insight(image)
prompt_ollama("what's in this image?", image)
"Here’s what’s in the image:\n\n* **A cat:** A black and white cat is sitting and looking at the microscope. \n* **A microscope:** A white laboratory microscope is positioned next to the cat.\n* **A red cushion:** There's a red cushion visible in the background. \n\nIt's a cute and curious picture!"
Exercise#
Load the MRI dataset and ask the model about the image. For example, ask what kind of image it is, which imaging modality was used, or whether it shows a male or a female. A possible starting point is sketched below.
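The file name data/mri.tif in this sketch is a placeholder; adjust it to the actual MRI dataset in your data folder.

mri_image = imread("data/mri.tif")  # placeholder file name; point this at the MRI dataset
stackview.insight(mri_image)

prompt_ollama("What kind of image is this and which imaging modality was used?", mri_image)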