{ "cells": [ { "cell_type": "markdown", "id": "d26ca72c-d147-495c-bc01-1e598b6bb729", "metadata": {}, "source": [ "# Qwen2-VL\n", "\n", "In this notebook we will use the vision language model [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) to inspect an image.\n", "\n", "We will use the [ScaDS.AI LLM infrastructure](https://llm.scads.ai/) at the [Center for Information Services and High Performance Computing (ZIH) of TU Dresden](https://tu-dresden.de/zih). To use it, you must be connected via the [TU Dresden VPN](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn)." ] }, { "cell_type": "code", "execution_count": 1, "id": "af96a154-c13c-4368-824c-44f0ba76d04d", "metadata": { "tags": [] }, "outputs": [], "source": [ "import openai\n", "from skimage.io import imread\n", "import stackview\n", "from image_utilities import numpy_to_bytestream\n", "import base64\n", "from stackview._image_widget import _img_to_rgb\n", "import os\n", "from IPython.display import display, Markdown" ] }, { "cell_type": "markdown", "id": "6642d25b-efbe-4d34-b60b-5099cd8b0055", "metadata": {}, "source": [ "## Example image\n", "First we load a microscopy image. With such an example we can get a first impression of whether the model was trained on scientific microscopy data." ] }, { "cell_type": "code", "execution_count": 2, "id": "498ccb1b-e43c-4063-8962-3941141dda58", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "<table>\n", "<tr><td>shape</td><td>(512, 672, 3)</td></tr>\n", "<tr><td>dtype</td><td>uint8</td></tr>\n", "<tr><td>size</td><td>1008.0 kB</td></tr>\n", "<tr><td>min</td><td>0</td></tr>\n", "<tr><td>max</td><td>255</td></tr>\n", "</table>" ], "text/plain": [ "StackViewNDArray([[[ 3, 6, 1],\n", " [ 3, 7, 0],\n", " [ 3, 6, 1],\n", " ...,\n", " [11, 8, 2],\n", " [11, 7, 2],\n", " [11, 11, 2]],\n", "\n", " [[ 3, 6, 1],\n", " [ 3, 8, 1],\n", " [ 3, 7, 1],\n", " ...,\n", " [11, 10, 2],\n", " [10, 10, 2],\n", " [11, 11, 2]],\n", "\n", " [[ 4, 6, 1],\n", " [ 3, 6, 1],\n", " [ 4, 6, 1],\n", " ...,\n", " [10, 10, 2],\n", " [11, 10, 2],\n", " [11, 10, 2]],\n", "\n", " ...,\n", "\n", " [[15, 14, 8],\n", " [14, 14, 8],\n", " [15, 14, 7],\n", " ...,\n", " [10, 11, 5],\n", " [10, 12, 4],\n", " [11, 14, 5]],\n", "\n", " [[14, 16, 7],\n", " [16, 15, 7],\n", " [15, 16, 8],\n", " ...,\n", " [10, 11, 4],\n", " [11, 13, 4],\n", " [11, 16, 5]],\n", "\n", " [[15, 18, 7],\n", " [14, 17, 8],\n", " [14, 17, 8],\n", " ...,\n", " [ 9, 12, 5],\n", " [10, 13, 5],\n", " [11, 15, 5]]], dtype=uint8)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hela_cells = imread(\"data/hela-cells-8bit.tif\")\n", "stackview.insight(hela_cells)" ] }, { "cell_type": "markdown", "id": "920cc018-fdd1-49c7-bda3-fdd269c0030b", "metadata": {}, "source": [ "We will now send the image to the LLM server and ask it some questions." 
] }, { "cell_type": "code", "execution_count": 3, "id": "18422ecf-34f7-4ec3-a6b9-aa65098abab4", "metadata": { "tags": [] }, "outputs": [], "source": [ "def prompt_qwen(prompt:str, image, model=\"Qwen/Qwen2-VL-7B-Instruct\"):\n", " \"\"\"A prompt helper function that sends a message to the server\n", " and returns only the text response.\n", " \"\"\"\n", " rgb_image = _img_to_rgb(image)\n", " byte_stream = numpy_to_bytestream(rgb_image)\n", " base64_image = base64.b64encode(byte_stream).decode('utf-8')\n", "\n", " message = [{\"role\": \"user\", \"content\": [\n", " {\"type\": \"text\", \"text\": prompt},\n", " {\n", " \"type\": \"image_url\",\n", " \"image_url\": {\n", " \"url\": f\"data:image/png;base64,{base64_image}\"\n", " }\n", " }]}]\n", " \n", " # setup connection to the LLM\n", " client = openai.OpenAI(base_url=\"https://llm.scads.ai/v1\",\n", " api_key=os.environ.get('SCADSAI_API_KEY'))\n", " \n", " # submit prompt\n", " response = client.chat.completions.create(\n", " model=model,\n", " messages=message\n", " )\n", " \n", " # extract answer\n", " return response.choices[0].message.content" ] }, { "cell_type": "code", "execution_count": 4, "id": "c3e7b402-b6d9-4768-be7c-29cecf7ec6c3", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/markdown": [ "The image appears to be a microscopic view of biological cells, possibly analyzed using fluorescence microscopy techniques. The cells are stained to highlight different structures:\n", "\n", "1. **Blue Areas**: These represent the nuclei of the cells, typically stained with a dye like DAPI or Hoechst that binds to DNA.\n", "2. **Green Structures**: These are likely the plasma membranes of the cells, as visualized by a fluorescence stain like FM1-43, which binds to lipid-bilayer membranes给人。\n", "3. 
**Red Dots**: These are clusters or puncta that might represent specific signaling molecules or proteins, possibly related to cytokines or receptors participating in cell signaling pathways" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "res = prompt_qwen(\"what's in this image?\", hela_cells)\n", "\n", "display(Markdown(res))" ] }, { "cell_type": "markdown", "id": "90928c47-0ef1-44df-a2a0-636145942940", "metadata": {}, "source": [ "## Exercise\n", "Ask the model to specifically describe what is in a selected colour channel. Repeat this exercise with a natural image such as \"real_cat.png\"." ] }, { "cell_type": "code", "execution_count": null, "id": "3a5850f3-f85b-4ccb-80cb-43d7a71d1bf5", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }