{
"cells": [
{
"cell_type": "markdown",
"id": "d26ca72c-d147-495c-bc01-1e598b6bb729",
"metadata": {},
"source": [
"# LLaVA\n",
"\n",
"In this notebook we will use [LLaVA](https://llava-vl.github.io/), a vision language model, to inspect a natural image."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "af96a154-c13c-4368-824c-44f0ba76d04d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import openai\n",
"from skimage.io import imread\n",
"import stackview\n",
"from image_utilities import numpy_to_bytestream\n",
"import base64\n",
"from stackview._image_widget import _img_to_rgb"
]
},
{
"cell_type": "markdown",
"id": "6642d25b-efbe-4d34-b60b-5099cd8b0055",
"metadata": {},
"source": [
"## Example images\n",
"First, we load a natural image."
]
},
{
"cell_type": "markdown",
"id": "596dbff0-5713-4cc0-ba1c-54568daf67b5",
"metadata": {
"tags": []
},
"source": [
"The LLaVA model can describe images via the [ollama](https://ollama.com/) API."
]
},
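{
"cell_type": "markdown",
"id": "1f2e3d4c-5b6a-4978-8c9d-0e1f2a3b4c5d",
"metadata": {},
"source": [
"Before prompting, it is worth checking that an ollama server is actually running. The following sketch queries ollama's `/api/tags` endpoint (assuming the default port 11434) and lists the locally available models; it returns `None` if no server is reachable."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a3b4c5d-6e7f-4081-92a3-b4c5d6e7f809",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import urllib.request\n",
"\n",
"def list_ollama_models(base_url=\"http://localhost:11434\", timeout=2):\n",
"    \"\"\"Return the names of locally available ollama models,\n",
"    or None if no server answers at base_url.\n",
"    \"\"\"\n",
"    try:\n",
"        with urllib.request.urlopen(base_url + \"/api/tags\", timeout=timeout) as response:\n",
"            return [m[\"name\"] for m in json.load(response)[\"models\"]]\n",
"    except OSError:\n",
"        return None\n",
"\n",
"list_ollama_models()"
]
},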
{
"cell_type": "code",
"execution_count": 2,
"id": "9497fcb1-47b3-40c2-a54d-a865f8fcc0ca",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def prompt_ollama(prompt: str, image, model=\"llava\"):\n",
"    \"\"\"A prompt helper function that sends a text prompt and an image\n",
"    to ollama and returns only the text response.\n",
"    \"\"\"\n",
"    rgb_image = _img_to_rgb(image)\n",
"    # assuming numpy_to_bytestream yields PNG-encoded bytes\n",
"    byte_stream = numpy_to_bytestream(rgb_image)\n",
"    base64_image = base64.b64encode(byte_stream).decode('utf-8')\n",
"\n",
"    # the OpenAI-compatible endpoint expects images as data-URL\n",
"    # content parts, not ollama's native 'images' field\n",
"    message = [{\n",
"        'role': 'user',\n",
"        'content': [\n",
"            {'type': 'text', 'text': prompt},\n",
"            {'type': 'image_url',\n",
"             'image_url': {'url': f'data:image/png;base64,{base64_image}'}}\n",
"        ]\n",
"    }]\n",
"\n",
"    # set up the connection to the locally running ollama server;\n",
"    # ollama does not check the API key, but it must not be empty\n",
"    client = openai.OpenAI(\n",
"        base_url=\"http://localhost:11434/v1\",\n",
"        api_key=\"ollama\"\n",
"    )\n",
"\n",
"    # submit the prompt\n",
"    response = client.chat.completions.create(\n",
"        model=model,\n",
"        messages=message\n",
"    )\n",
"\n",
"    # extract the answer\n",
"    return response.choices[0].message.content"
]
},
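{
"cell_type": "markdown",
"id": "3c4d5e6f-7a8b-4192-a3b4-c5d6e7f8091a",
"metadata": {},
"source": [
"The helper above relies on the local `numpy_to_bytestream` function, whose implementation is not shown here. As a minimal sketch of what such a conversion can look like (using Pillow as an assumed stand-in, not the actual implementation), an RGB numpy array can be PNG-encoded and then base64-encoded like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d5e6f70-8b9c-4a1b-82c3-d4e5f6a7b8c9",
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"import io\n",
"import numpy as np\n",
"from PIL import Image\n",
"\n",
"def numpy_to_png_bytes(rgb_image):\n",
"    \"\"\"Hypothetical stand-in for numpy_to_bytestream:\n",
"    encode an RGB numpy array as PNG bytes.\n",
"    \"\"\"\n",
"    buffer = io.BytesIO()\n",
"    Image.fromarray(rgb_image).save(buffer, format=\"PNG\")\n",
"    return buffer.getvalue()\n",
"\n",
"tiny = np.zeros((2, 2, 3), dtype=np.uint8)\n",
"base64_image = base64.b64encode(numpy_to_png_bytes(tiny)).decode('utf-8')\n",
"# every base64-encoded PNG starts with 'iVBORw0KGgo'\n",
"base64_image[:12]"
]
},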
{
"cell_type": "code",
"execution_count": 3,
"id": "498ccb1b-e43c-4063-8962-3941141dda58",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<table><tr><td>shape</td><td>(512, 512, 3)</td></tr><tr><td>dtype</td><td>uint8</td></tr><tr><td>size</td><td>768.0 kB</td></tr><tr><td>min</td><td>0</td></tr><tr><td>max</td><td>255</td></tr></table>"
],
"text/plain": [
"StackViewNDArray([[[176, 178, 179],\n",
" [175, 178, 178],\n",
" [177, 177, 180],\n",
" ...,\n",
" [182, 186, 188],\n",
" [185, 188, 191],\n",
" [191, 194, 197]],\n",
"\n",
" [[178, 180, 181],\n",
" [178, 179, 181],\n",
" [178, 180, 181],\n",
" ...,\n",
" [185, 189, 192],\n",
" [187, 191, 192],\n",
" [191, 195, 198]],\n",
"\n",
" [[181, 183, 185],\n",
" [180, 182, 183],\n",
" [180, 181, 183],\n",
" ...,\n",
" [190, 193, 196],\n",
" [189, 193, 196],\n",
" [192, 195, 198]],\n",
"\n",
" ...,\n",
"\n",
" [[125, 91, 66],\n",
" [124, 90, 65],\n",
" [123, 89, 65],\n",
" ...,\n",
" [137, 92, 64],\n",
" [136, 91, 62],\n",
" [135, 89, 61]],\n",
"\n",
" [[122, 88, 64],\n",
" [121, 87, 63],\n",
" [121, 87, 63],\n",
" ...,\n",
" [142, 96, 68],\n",
" [142, 96, 68],\n",
" [139, 94, 65]],\n",
"\n",
" [[120, 86, 62],\n",
" [120, 86, 60],\n",
" [119, 85, 61],\n",
" ...,\n",
" [144, 99, 70],\n",
" [144, 99, 70],\n",
" [142, 97, 68]]], dtype=uint8)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"image = imread(\"data/real_cat.png\")\n",
"stackview.insight(image)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4616b40f-5d33-4ee4-b7b1-4d8bf6dba143",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"' The image is not provided. Please provide an image, and I can give you information about it. '"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt_ollama(\"what's in this image?\", image, model=\"llava\")"
]
},
{
"cell_type": "markdown",
"id": "ad3f9ef8-fb59-4a1a-8fd8-31e42803d1ab",
"metadata": {
"tags": []
},
"source": [
"## Exercise\n",
"Load the MRI dataset and ask LLaVA about the image."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4843733b-aa67-4b27-8d97-8682d0364140",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}