Local VLMs using Ollama#
In this notebook we will use vision language models provided via Ollama to inspect a natural image. Before running this notebook locally, consider downloading the model using this terminal command:
ollama pull gemma3:4b
import openai
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream, prompt_ollama
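The helper prompt_ollama comes from image_utilities. A minimal sketch of how such a helper could talk to a local Ollama server through its OpenAI-compatible endpoint might look like the following; the function body, the endpoint URL, and the use of Pillow for PNG encoding are assumptions for illustration, not the actual implementation.

import base64
import io
import openai
from PIL import Image

def prompt_ollama_sketch(prompt, image, model="gemma3:4b"):
    # Sketch only, not the implementation from image_utilities:
    # encode the numpy image as a base64 PNG data URL
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="PNG")
    image_b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")

    # Ollama exposes an OpenAI-compatible API on localhost:11434 by default
    client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + image_b64}},
            ],
        }],
    )
    return response.choices[0].message.content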
Example images#
First, we load a natural image.
The gemma3 model is capable of describing images via the Ollama API.
image = imread("data/real_cat.png")
stackview.insight(image)
prompt_ollama("what's in this image?", image)
"Here’s what’s in the image:\n\n* **A cat:** A black and white cat is sitting and looking at the microscope. \n* **A microscope:** A white laboratory microscope is positioned next to the cat.\n* **A red cushion:** There's a red cushion visible in the background. \n\nIt's a cute and curious picture!"
Exercise#
Load the MRI dataset and ask the model about the image. For example, ask what kind of image it is, which imaging modality was used, or whether it shows a male or a female. A possible starting point is sketched below.
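The file name data/mri.tif in this sketch is a placeholder; adjust it to the actual MRI dataset in your data folder.

mri_image = imread("data/mri.tif")  # placeholder file name; point this at the MRI dataset
stackview.insight(mri_image)

prompt_ollama("What kind of image is this and which imaging modality was used?", mri_image)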