{ "cells": [ { "cell_type": "markdown", "id": "d26ca72c-d147-495c-bc01-1e598b6bb729", "metadata": {}, "source": [ "# LLaVA\n", "\n", "In this notebook we will use [LLaVA](https://llava-vl.github.io/), a vision language model, to inspect a natural image." ] }, { "cell_type": "code", "execution_count": 1, "id": "af96a154-c13c-4368-824c-44f0ba76d04d", "metadata": { "tags": [] }, "outputs": [], "source": [ "import openai\n", "from skimage.io import imread\n", "import stackview\n", "from image_utilities import numpy_to_bytestream\n", "import base64\n", "from stackview._image_widget import _img_to_rgb" ] }, { "cell_type": "markdown", "id": "6642d25b-efbe-4d34-b60b-5099cd8b0055", "metadata": {}, "source": [ "## Example images\n", "We first define a helper function for prompting the model and then load a natural image." ] }, { "cell_type": "markdown", "id": "596dbff0-5713-4cc0-ba1c-54568daf67b5", "metadata": { "tags": [] }, "source": [ "The LLaVA model can describe images. We access it through [ollama](https://ollama.com/), using its OpenAI-compatible API." ] }, { "cell_type": "code", "execution_count": 2, "id": "9497fcb1-47b3-40c2-a54d-a865f8fcc0ca", "metadata": { "tags": [] }, "outputs": [], "source": [ "def prompt_ollama(prompt:str, image, model=\"llava\"):\n", "    \"\"\"A prompt helper function that sends a message to ollama\n", "    and returns only the text response.\n", "    \"\"\"\n", "    # convert the image to RGB and encode it as base64\n", "    rgb_image = _img_to_rgb(image)\n", "    byte_stream = numpy_to_bytestream(rgb_image)\n", "    base64_image = base64.b64encode(byte_stream).decode('utf-8')\n", "\n", "    # the OpenAI-compatible API expects the image as an image_url entry\n", "    # (assumption: numpy_to_bytestream returns PNG-encoded bytes)\n", "    message = [{\n", "        'role': 'user',\n", "        'content': [\n", "            {'type': 'text', 'text': prompt},\n", "            {'type': 'image_url',\n", "             'image_url': {'url': 'data:image/png;base64,' + base64_image}}\n", "        ]\n", "    }]\n", "\n", "    # setup connection to the LLM\n", "    client = openai.OpenAI(\n", "        base_url = \"http://localhost:11434/v1\",\n", "        api_key = \"ollama\"  # required by the client, ignored by ollama\n", "    )\n", "\n", "    # submit prompt\n", "    response = client.chat.completions.create(\n", "        model=model,\n", "        messages=message\n", "    )\n", "\n", "    # extract answer\n", "    return response.choices[0].message.content" ] }, { "cell_type": "code", "execution_count": 3, "id": 
"498ccb1b-e43c-4063-8962-3941141dda58", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<table>\n", "<tr><td>shape</td><td>(512, 512, 3)</td></tr>\n", "<tr><td>dtype</td><td>uint8</td></tr>\n", "<tr><td>size</td><td>768.0 kB</td></tr>\n", "<tr><td>min</td><td>0</td></tr>\n", "<tr><td>max</td><td>255</td></tr>\n", "</table>" ], "text/plain": [ "StackViewNDArray([[[176, 178, 179],\n", " [175, 178, 178],\n", " [177, 177, 180],\n", " ...,\n", " [182, 186, 188],\n", " [185, 188, 191],\n", " [191, 194, 197]],\n", "\n", " [[178, 180, 181],\n", " [178, 179, 181],\n", " [178, 180, 181],\n", " ...,\n", " [185, 189, 192],\n", " [187, 191, 192],\n", " [191, 195, 198]],\n", "\n", " [[181, 183, 185],\n", " [180, 182, 183],\n", " [180, 181, 183],\n", " ...,\n", " [190, 193, 196],\n", " [189, 193, 196],\n", " [192, 195, 198]],\n", "\n", " ...,\n", "\n", " [[125, 91, 66],\n", " [124, 90, 65],\n", " [123, 89, 65],\n", " ...,\n", " [137, 92, 64],\n", " [136, 91, 62],\n", " [135, 89, 61]],\n", "\n", " [[122, 88, 64],\n", " [121, 87, 63],\n", " [121, 87, 63],\n", " ...,\n", " [142, 96, 68],\n", " [142, 96, 68],\n", " [139, 94, 65]],\n", "\n", " [[120, 86, 62],\n", " [120, 86, 60],\n", " [119, 85, 61],\n", " ...,\n", " [144, 99, 70],\n", " [144, 99, 70],\n", " [142, 97, 68]]], dtype=uint8)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image = imread(\"data/real_cat.png\")\n", "stackview.insight(image)" ] }, { "cell_type": "code", "execution_count": 4, "id": "4616b40f-5d33-4ee4-b7b1-4d8bf6dba143", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "' The image is not provided. Please provide an image, and I can give you information about it. '" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prompt_ollama(\"what's in this image?\", image, model=\"llava\")" ] }, { "cell_type": "markdown", "id": "ad3f9ef8-fb59-4a1a-8fd8-31e42803d1ab", "metadata": { "tags": [] }, "source": [ "## Exercise\n", "Load the MRI dataset and ask LLaVA about the image." 
] }, { "cell_type": "code", "execution_count": null, "id": "4843733b-aa67-4b27-8d97-8682d0364140", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 5 }