Generative Artificial Intelligence Notebooks

VLMs on Kiara

In this notebook we will use vision language models on a still-experimental LLM infrastructure. To use it, you must set two environment variables: KIARA_API_KEY and KIARA_LLM_SERVER.
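The helper function defined later in this notebook reads both variables with os.environ.get, so a missing variable only shows up as a hard-to-read error when the request is built. A minimal sketch of an upfront check could look like the following. The function name read_kiara_settings is hypothetical and not part of the notebook, and the trailing-slash handling is an assumption about how KIARA_LLM_SERVER is written.

```python
import os


def read_kiara_settings(environ=os.environ):
    """Return (base_url, api_key) or raise a readable error.

    Hypothetical helper for illustration, not part of the notebook.
    """
    required = ("KIARA_API_KEY", "KIARA_LLM_SERVER")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Please set the environment variable(s): " + ", ".join(missing)
        )

    # Assumption: the notebook appends "api/" to the server address,
    # so we make sure exactly one slash separates the two parts.
    base_url = environ["KIARA_LLM_SERVER"].rstrip("/") + "/api/"
    return base_url, environ["KIARA_API_KEY"]
```

Calling read_kiara_settings() once at the top of the notebook would then fail early with a message naming the variable that is not set.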

from skimage.io import imread
import stackview
from image_utilities import numpy_to_bytestream
import base64
from stackview._image_widget import _img_to_rgb

Example images

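First we define a helper function that sends a prompt together with an image to a vision language model. It uses the OpenAI-compatible API of the Kiara LLM server; the default model is Llama 4 Scout. Afterwards we load a natural image and ask the model about it.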

def prompt_kiara(prompt:str, image, model="vllm-llama-4-scout-17b-16e-instruct"):
    """A prompt helper function that sends a text prompt and an image to the
    LLM service provider and returns only the text response.
    """
    import os
    import openai
    
    # convert the image to RGB, serialize it and encode it as base64 text
    rgb_image = _img_to_rgb(image)
    byte_stream = numpy_to_bytestream(rgb_image)
    base64_image = base64.b64encode(byte_stream).decode('utf-8')

    message = [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}"
            }
        }
    ]}]
        
    # setup connection to the LLM
    client = openai.OpenAI(
        base_url=os.environ.get('KIARA_LLM_SERVER') + "api/",
        api_key=os.environ.get('KIARA_API_KEY')
    )
    
    # submit prompt
    response = client.chat.completions.create(
        model=model,
        messages=message
    )
    
    # extract answer
    return response.choices[0].message.content
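The message built inside prompt_kiara follows the OpenAI-style chat format, in which the image travels as a base64 data URL next to the text. The sketch below isolates that step using only the standard library, so the structure can be inspected without a server. The names build_image_message and mime_type are hypothetical, and the byte string merely stands in for real image data.

```python
import base64


def build_image_message(prompt, image_bytes, mime_type="image/jpeg"):
    """Build a single-turn chat message holding a text and an image part.

    Hypothetical helper for illustration, not part of the notebook.
    """
    # base64 turns the raw bytes into plain text that fits into a URL
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }]


# placeholder bytes; a real call would pass an encoded image file
message = build_image_message("what's in this image?", b"not a real image")
print(message[0]["content"][1]["image_url"]["url"][:40])
```

The mime_type should match the format in which the bytes were actually encoded, which depends on how the byte stream was produced.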

image = imread("data/real_cat.png")
stackview.insight(image)

shape	(512, 512, 3)
dtype	uint8
size	768.0 kB
min	0
max	255

prompt_kiara("what's in this image?", image)

'The image shows a black and white cat standing on a table next to a white device that appears to be a digital microscope. The cat is facing to the right, with its tail extending behind it. The device has a white body with black knobs and a small camera on top. It sits on a light-colored wooden table, with a red fabric object partially visible to the right. The background features a white wall with a textured surface.'

Exercise

Load the MRI dataset and ask the model about the image.


By Robert Haase

Last updated on 2025-10-11.

Copyright: Licensed CC-BY 4.0 unless mentioned otherwise. Contributions and feedback are welcome.