{
"cells": [
{
"cell_type": "markdown",
"id": "d26ca72c-d147-495c-bc01-1e598b6bb729",
"metadata": {},
"source": [
"# Qwen2-VL\n",
"\n",
"In this notebook we will use the vision language model [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) to inspect an image.\n",
"\n",
"We will use [ScaDS.AI LLM infrastructure](https://llm.scads.ai/) infrastructure at the [Center for Information Services and High Performance Computing (ZIH) of TU Dresden](https://tu-dresden.de/zih). To use it, you must be connected via [TU Dresden VPN](https://tu-dresden.de/zih/dienste/service-katalog/arbeitsumgebung/zugang_datennetz/vpn)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "af96a154-c13c-4368-824c-44f0ba76d04d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import openai\n",
"from skimage.io import imread\n",
"import stackview\n",
"from image_utilities import numpy_to_bytestream\n",
"import base64\n",
"from stackview._image_widget import _img_to_rgb\n",
"import os\n",
"from IPython.display import display, Markdown"
]
},
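{
"cell_type": "markdown",
"id": "f3a9c2e1-5b7d-4c8a-9e2f-1a2b3c4d5e6f",
"metadata": {},
"source": [
"In case the local `image_utilities` module is not at hand, the next cell is a minimal sketch of what `numpy_to_bytestream` could look like, assuming the helper simply PNG-encodes a numpy array using Pillow. The helper shipped with the course material may differ in detail."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7d1e4f2-8c3a-4d5e-9f6a-2b3c4d5e6f70",
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"from PIL import Image\n",
"\n",
"def numpy_to_bytestream(data):\n",
"    \"\"\"Encode a numpy RGB array as PNG and return the raw bytes (sketch).\"\"\"\n",
"    with io.BytesIO() as buffer:\n",
"        Image.fromarray(data).save(buffer, format=\"PNG\")\n",
"        return buffer.getvalue()"
]
},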
{
"cell_type": "markdown",
"id": "6642d25b-efbe-4d34-b60b-5099cd8b0055",
"metadata": {},
"source": [
"## Example image\n",
"First we load a microscopy image. With such an example we can test if the model was trained on scientific microscopy data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "498ccb1b-e43c-4063-8962-3941141dda58",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"\n",
" \n",
" | \n",
"\n",
"\n",
"\n",
"shape | (512, 672, 3) | \n",
"dtype | uint8 | \n",
"size | 1008.0 kB | \n",
"min | 0 | max | 255 | \n",
" \n",
" \n",
" | \n",
"
\n",
"
"
],
"text/plain": [
"StackViewNDArray([[[ 3, 6, 1],\n",
" [ 3, 7, 0],\n",
" [ 3, 6, 1],\n",
" ...,\n",
" [11, 8, 2],\n",
" [11, 7, 2],\n",
" [11, 11, 2]],\n",
"\n",
" [[ 3, 6, 1],\n",
" [ 3, 8, 1],\n",
" [ 3, 7, 1],\n",
" ...,\n",
" [11, 10, 2],\n",
" [10, 10, 2],\n",
" [11, 11, 2]],\n",
"\n",
" [[ 4, 6, 1],\n",
" [ 3, 6, 1],\n",
" [ 4, 6, 1],\n",
" ...,\n",
" [10, 10, 2],\n",
" [11, 10, 2],\n",
" [11, 10, 2]],\n",
"\n",
" ...,\n",
"\n",
" [[15, 14, 8],\n",
" [14, 14, 8],\n",
" [15, 14, 7],\n",
" ...,\n",
" [10, 11, 5],\n",
" [10, 12, 4],\n",
" [11, 14, 5]],\n",
"\n",
" [[14, 16, 7],\n",
" [16, 15, 7],\n",
" [15, 16, 8],\n",
" ...,\n",
" [10, 11, 4],\n",
" [11, 13, 4],\n",
" [11, 16, 5]],\n",
"\n",
" [[15, 18, 7],\n",
" [14, 17, 8],\n",
" [14, 17, 8],\n",
" ...,\n",
" [ 9, 12, 5],\n",
" [10, 13, 5],\n",
" [11, 15, 5]]], dtype=uint8)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hela_cells = imread(\"data/hela-cells-8bit.tif\")\n",
"stackview.insight(hela_cells)"
]
},
{
"cell_type": "markdown",
"id": "920cc018-fdd1-49c7-bda3-fdd269c0030b",
"metadata": {},
"source": [
"We will now send the image to the LLM server and ask it the some questions."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "18422ecf-34f7-4ec3-a6b9-aa65098abab4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def prompt_qwen(prompt:str, image, model=\"Qwen/Qwen2-VL-7B-Instruct\"):\n",
" \"\"\"A prompt helper function that sends a message to the server\n",
" and returns only the text response.\n",
" \"\"\"\n",
" rgb_image = _img_to_rgb(image)\n",
" byte_stream = numpy_to_bytestream(rgb_image)\n",
" base64_image = base64.b64encode(byte_stream).decode('utf-8')\n",
"\n",
" message = [{\"role\": \"user\", \"content\": [\n",
" {\"type\": \"text\", \"text\": prompt},\n",
" {\n",
" \"type\": \"image_url\",\n",
" \"image_url\": {\n",
" \"url\": f\"data:image/png;base64,{base64_image}\"\n",
" }\n",
" }]}]\n",
" \n",
" # setup connection to the LLM\n",
" client = openai.OpenAI(base_url=\"https://llm.scads.ai/v1\",\n",
" api_key=os.environ.get('SCADSAI_API_KEY'))\n",
" \n",
" # submit prompt\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=message\n",
" )\n",
" \n",
" # extract answer\n",
" return response.choices[0].message.content"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c3e7b402-b6d9-4768-be7c-29cecf7ec6c3",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/markdown": [
"The image appears to be a microscopic view of biological cells, possibly analyzed using fluorescence microscopy techniques. The cells are stained to highlight different structures:\n",
"\n",
"1. **Blue Areas**: These represent the nuclei of the cells, typically stained with a dye like DAPI or Hoechst that binds to DNA.\n",
"2. **Green Structures**: These are likely the plasma membranes of the cells, as visualized by a fluorescence stain like FM1-43, which binds to lipid-bilayer membranes给人。\n",
"3. **Red Dots**: These are clusters or puncta that might represent specific signaling molecules or proteins, possibly related to cytokines or receptors participating in cell signaling pathways"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"res = prompt_qwen(\"what's in this image?\", hela_cells)\n",
"\n",
"display(Markdown(res))"
]
},
{
"cell_type": "markdown",
"id": "90928c47-0ef1-44df-a2a0-636145942940",
"metadata": {},
"source": [
"## Exercise\n",
"Ask the model to specifically describe what is in a selected colour channel. Repeat this exercise with a natural image such as \"real_cat.png\"."
]
},
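{
"cell_type": "markdown",
"id": "c4d5e6f7-0a1b-4c2d-8e3f-5a6b7c8d9e0f",
"metadata": {},
"source": [
"A possible starting point, as a sketch: a single colour channel can be extracted with numpy indexing and passed to `prompt_qwen`. Repeating the exercise with \"real_cat.png\" works the same way after loading that image with `imread`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5e6f708-1b2c-4d3e-9f4a-6b7c8d9e0f1a",
"metadata": {},
"outputs": [],
"source": [
"# sketch: select the green channel (index 1) and ask the model about it\n",
"green_channel = hela_cells[:, :, 1]\n",
"\n",
"res = prompt_qwen(\"Describe what is visible in this image.\", green_channel)\n",
"display(Markdown(res))"
]
},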
{
"cell_type": "code",
"execution_count": null,
"id": "3a5850f3-f85b-4ccb-80cb-43d7a71d1bf5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}