Generative Artificial Intelligence Notebooks

Vision models with structured output


In this notebook, we ask a vision model to produce structured output, so that we can sort the image into categories or use the extracted information to write Python code for analysing the image.

import openai
from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream
import base64
from stackview._image_widget import _img_to_rgb
from IPython.display import Markdown

hela = imread("data/hela-cells-8bit.tif")
stackview.insight(hela)

shape	(512, 672, 3)
dtype	uint8
size	1008.0 kB
min	0
max	255

def prompt_chatGPT(prompt:str, image, model="gpt-4o"):
    """A prompt helper function that sends a text prompt together with
    an image to OpenAI and returns only the text response.
    """
    rgb_image = _img_to_rgb(image)
    byte_stream = numpy_to_bytestream(rgb_image)
    base64_image = base64.b64encode(byte_stream).decode('utf-8')

    message = [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}"
            }
        }
    ]}]

    # set up the connection to the LLM
    client = openai.OpenAI()
    
    # submit prompt
    response = client.chat.completions.create(
        model=model,
        messages=message
    )
    
    # extract answer
    return response.choices[0].message.content
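
The message payload built inside the helper above can also be assembled separately, which makes its structure easier to inspect. The following is a minimal sketch that uses only the standard library. The function name `build_vision_message` and the `mime_type` parameter are assumptions made for illustration and are not part of the notebook. The media type in the data URL is meant to describe how the bytes are encoded, so it is worth checking which format `numpy_to_bytestream` actually writes.

```python
import base64

def build_vision_message(prompt, image_bytes, mime_type="image/png"):
    """Pack a text prompt and encoded image bytes into a chat message list.

    Hypothetical helper for illustration; `mime_type` should match the
    actual encoding of `image_bytes` (e.g. PNG or JPEG).
    """
    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{base64_image}"},
            },
        ],
    }]

# Usage with placeholder bytes (not a real image), just to show the structure
message = build_vision_message("Describe this image.", b"\x89PNG placeholder")
url = message[0]["content"][1]["image_url"]["url"]
print(url.split(",")[0])  # data:image/png;base64
```

The resulting list has the same shape as the `message` variable in `prompt_chatGPT`, so it could be passed as `messages=` to `client.chat.completions.create`.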

result = prompt_chatGPT("""You are a highly experienced biologist with advanced microscopy skills.

# Task
Name the content of this image. Answer for each channel independently. 

# Options
The following structures could be in the image:
* Nuclei
* Membranes
* Cytoplasm
* Cytoskeleton
* Extra-cellular structure
* Other sub-cellular structures

# Output format
* Red channel: <structure>
* Green channel: <structure>
* Blue channel: <structure>

Keep your answer as short as possible. 
Only respond with the structures for the three channels in the format shown above.
""", hela)

Markdown(result)

Red channel: Other sub-cellular structures
Green channel: Cytoskeleton
Blue channel: Nuclei
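
Because the answer follows a fixed format, it can be turned into a Python dictionary and checked against the options offered in the prompt. The following is a minimal sketch, assuming the model keeps to the `<Colour> channel: <structure>` pattern. The names `parse_channel_response` and `unexpected_structures` are hypothetical helpers, not part of the notebook, and a model may deviate from the requested format, in which case lines are silently skipped.

```python
import re

# Options copied from the prompt; adjust this list if the prompt changes
ALLOWED_STRUCTURES = [
    "Nuclei", "Membranes", "Cytoplasm", "Cytoskeleton",
    "Extra-cellular structure", "Other sub-cellular structures",
]

# Matches lines such as "* Red channel: Nuclei" or "Red channel: Nuclei"
CHANNEL_LINE = re.compile(
    r"^\W*(red|green|blue)\s+channel\s*:\s*(.+?)\s*$", re.IGNORECASE
)

def parse_channel_response(text):
    """Turn the model's answer into a {channel: structure} dictionary."""
    channels = {}
    for line in text.splitlines():
        match = CHANNEL_LINE.match(line)
        if match:
            channels[match.group(1).lower()] = match.group(2)
    return channels

def unexpected_structures(channels):
    """Return the entries whose structure is not one of the offered options."""
    allowed = {name.lower() for name in ALLOWED_STRUCTURES}
    return {c: s for c, s in channels.items() if s.lower() not in allowed}

# The answer shown above, used here as a fixed example string
example = """Red channel: Other sub-cellular structures
Green channel: Cytoskeleton
Blue channel: Nuclei"""

channels = parse_channel_response(example)
print(channels["blue"])                 # Nuclei
print(unexpected_structures(channels))  # {}
```

In the notebook, `parse_channel_response(result)` could be applied directly to the string returned by `prompt_chatGPT`. A non-empty result from `unexpected_structures` would indicate that the model answered with something outside the given options.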

Exercise

Ask the vision model to generate Python code for segmenting the image. Try using llava for the same task.
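
When a model is asked for Python code, it often wraps the code in Markdown fences and adds explanatory text around it. A small helper like the following might be useful for pulling out just the code before inspecting or running it. This is a sketch under the assumption that the response uses triple-backtick fences. The name `extract_python_code` is hypothetical, and generated code should be reviewed before it is executed.

```python
import re

# The fence marker is built programmatically so this example is easy to embed
FENCE = "`" * 3

# An opening fence with an optional language tag, then the code, then a closing fence
CODE_BLOCK = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)

def extract_python_code(response_text):
    """Return the contents of all fenced code blocks in a model response.

    If no fenced block is found, the stripped response is returned as-is.
    """
    blocks = [block.strip() for block in CODE_BLOCK.findall(response_text)]
    if blocks:
        return "\n\n".join(blocks)
    return response_text.strip()

# Usage with a made-up response string (not real model output)
fake_response = (
    "Here is a possible approach:\n"
    + FENCE + "python\n"
    + "threshold = 128\n"
    + "binary = image > threshold\n"
    + FENCE + "\n"
    + "Let me know if you need more detail."
)
print(extract_python_code(fake_response))
```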

By Robert Haase

Last updated on 2025-08-07.

Copyright: Licensed CC-BY 4.0 unless mentioned otherwise. Contributions and feedback are welcome.