Open Weight VLMs for Counting objects

In this notebook, we use open-weight LLMs with vision capabilities (VLMs) to see how well they can count the blobs in blobs.tif.

Note: It is not recommended to use this approach for counting objects in microscopy images. The author of this notebook is not aware of any publication showing that this approach works well.

import openai
import PIL
import stackview
from skimage.io import imread
from skimage.measure import label
from skimage.segmentation import clear_border
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
from transformers import pipeline

We will need some helper functions for assembling a prompt and submitting it to the LLM server.

def prompt_with_image(prompt:str, image, model="gemma3:4b", ollama_url="http://localhost:11434"):
    """A prompt helper function that sends a text message and, optionally,
    an image to an OpenAI-compatible server (here: a local Ollama instance)
    and returns the text response.
    """
    if image is not None:
        message = image_to_message(image, prompt)
    else:
        message = [{"role": "user", "content": prompt}]

    # set up the connection to the LLM server; the client requires an API key,
    # "ollama" is a placeholder assumed to be ignored by a local Ollama server
    client = openai.OpenAI(base_url=f"{ollama_url}/v1", api_key="ollama")
    
    # submit prompt
    response = client.chat.completions.create(
        model=model,
        messages=message
    )
    
    # extract the answer and drop the reasoning part of "thinking" models
    answer = response.choices[0].message.content
    if "</think>" in answer:
        answer = answer.split("</think>")[1].strip()
        
    return answer


def image_to_message(image, prompt):
    """Wrap a prompt and an image into a chat message with a base64 data URL"""
    import base64

    from stackview._image_widget import _img_to_rgb

    rgb_image = _img_to_rgb(image)
    byte_stream = numpy_to_bytestream(rgb_image)
    base64_image = base64.b64encode(byte_stream).decode('utf-8')

    # numpy_to_bytestream encodes the image as PNG, so the MIME type must be image/png
    return [{"role": "user", "content": [
        {"type": "text", "text": prompt},
        {"type": "image_url",
         "image_url": {
             "url": f"data:image/png;base64,{base64_image}"
         }
        }
    ]}]


def numpy_to_bytestream(data):
    """Turn a NumPy array into a bytestream"""
    import numpy as np
    from PIL import Image
    import io

    # Convert the NumPy array to a PIL Image
    image = Image.fromarray(data.astype(np.uint8)).convert("RGBA")

    # Create a BytesIO object
    bytes_io = io.BytesIO()

    # Save the PIL image to the BytesIO object as a PNG
    image.save(bytes_io, format='PNG')

    # return the beginning of the file as a bytestream
    bytes_io.seek(0)
    return bytes_io.read()
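
The MIME type in the data URL should match how the image bytes were actually encoded. A minimal sketch, assuming a hypothetical helper `bytes_to_data_url` (not part of this notebook or of any library), that infers the type from the file signature instead of hard-coding it:

```python
import base64

# Hypothetical helper for illustration: picks the MIME type
# from the leading signature bytes of the encoded image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def bytes_to_data_url(byte_stream: bytes) -> str:
    """Build a base64 data URL whose MIME type matches the image bytes."""
    if byte_stream.startswith(PNG_SIGNATURE):
        mime = "image/png"
    elif byte_stream.startswith(JPEG_SIGNATURE):
        mime = "image/jpeg"
    else:
        raise ValueError("Unsupported image encoding")
    payload = base64.b64encode(byte_stream).decode("utf-8")
    return f"data:{mime};base64,{payload}"
```

Passing the output of `numpy_to_bytestream` to such a helper would yield a `data:image/png;base64,...` URL.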

image = imread("data/blobs.tif")
image = np.stack([image,image,image],axis=2)
stackview.insight(image)

shape	(254, 256, 3)
dtype	uint8
size	190.5 kB
min	8
max	248

labels = label(image[...,0] > 128)

stackview.insight(labels)

shape	(254, 256)
dtype	int32
size	254.0 kB
min	0
max	63
n labels	63

labels_wo_border = clear_border(labels)

stackview.insight(labels_wo_border)

shape	(254, 256)
dtype	int32
size	254.0 kB
min	0
max	59
n labels	45
Not sequentially labeled!
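
`clear_border` removes every labeled object that touches the image edge and leaves the label values of the remaining objects unchanged, which is why 45 labels remain while the maximum label is still 59. A minimal pure-Python sketch of that idea on a toy label grid (the function name `remove_border_labels` and the grid are illustrative assumptions, not the scikit-image implementation):

```python
def remove_border_labels(label_grid):
    """Set every label that touches the grid edge to 0 (background)."""
    rows, cols = len(label_grid), len(label_grid[0])
    touching = set()
    for r in range(rows):
        for c in range(cols):
            on_edge = r in (0, rows - 1) or c in (0, cols - 1)
            if on_edge and label_grid[r][c] != 0:
                touching.add(label_grid[r][c])
    # Labels that do not touch the edge keep their original values
    return [[0 if v in touching else v for v in row] for row in label_grid]


toy_labels = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 2, 0],
    [0, 3, 0, 2, 0],
    [0, 0, 0, 0, 4],
]
cleaned = remove_border_labels(toy_labels)
remaining = sorted({v for row in cleaned for v in row if v != 0})
print(remaining)  # [2, 3]
```

Labels 1 and 4 touch the edge and are removed; the remaining labels 2 and 3 no longer start at 1, so the result is not sequentially labeled.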

This is the prompt we submit to the server.

my_prompt = """
Analyse the following image by counting the bright blobs. If you think you're not able to do this, try anyway. Respond with the number only.
"""

prompt_with_image(my_prompt, image=image, model="gemma3:4b")

'63'

Benchmarking vision-LLMs

We can run this prompt in a loop for a couple of vision models.

num_samples = 25

# each model name must appear only once; duplicate keys in a dict literal are silently merged
models = {
    "gemma3:4b":[],
    "gemma3:12b":[],
    "minicpm-v":[],
    "granite3.2-vision":[],
    "llava":[],
    "qwen3:8b":[],
    "qwen3:4b":[],
    "llama3.2-vision:11b":[],
    "qwen2.5vl:7b":[],
}
for model in tqdm(models.keys()):
    samples = []

    while len(samples) < num_samples:
        result = prompt_with_image(my_prompt, image=image, model=model).strip()
        if result.endswith("."):
            result = result[:-1]

        try:
            samples.append(int(result))
        except ValueError:
            # the response was not a bare integer; record it as missing
            print("Error processing result:", result)
            samples.append(np.nan)
    
    models[model] = samples

sampled_models = pd.DataFrame(models)

 33%|████████████████████████████                                                        | 3/9 [03:41<07:26, 74.37s/it]

Error processing result: The question seems to be asking to count distinct shapes within an image without making assumptions about their nature or content. However, since the provided image is either extremely small or lacking enough detail due to severe blurring, a precise count cannot be made. Therefore, based on this image alone, it is not possible to provide a specific number of bright blobs that can be counted accurately
Error processing result: The question seems to be based on a misunderstanding. Images are visual representations that don't require a count of objects to be analyzed. Without an image to analyze, I cannot provide details about the brightness or any objects that might exist within it
Error processing result: I'm sorry for the inconvenience, but I am unable to provide an analysis as I cannot visually perceive or interact directly with images. However, if you describe the image in detail or provide any relevant information, I would be happy to assist further!
Error processing result: It appears there might be a misunderstanding in your request as I'm currently unable to provide visual content or analyze photographs directly. However, if you have a numerical count or other statistical data that can help me understand the pattern and intensity of these white spots within the image, I'd be happy to assist with analysis based on that!
Error processing result: I'm sorry for any confusion, but I cannot provide an answer to that question as it stands because there is no visual content or image attached to your request. If you have another type of data, such as numerical values or textual information, please provide it so that I can assist you further
Error processing result: Due to the low resolution and poor lighting conditions in the provided image, I cannot accurately count the number of bright spots or determine its composition
Error processing result: I apologize but I can neither view nor analyze images; my capacity extends primarily within textual interactions!
Error processing result: I'm sorry for any confusion, but I cannot provide assistance or analysis on images that contain discriminable content such as facial expressions, race, gender, religion, age, occupation, etc., unless those specific criteria are applied within the context of image processing algorithms and systems. 

If you have an image analysis task that does not fall under any category listed above, please provide it for me to review and respond with a description or analysis based on the information I can discern from the content of the image
Error processing result: Due to the extremely low resolution and large size of the image provided as a textual description, it is impossible to accurately analyze or count the number of blobs present in the image at all
Error processing result: Please note that as a text-based AI model, I'm unable to directly analyze visual content such as images or videos. However, if you can describe the image or provide more details about it, I may be able to offer some guidance based on your description

 44%|█████████████████████████████████████▎                                              | 4/9 [04:50<06:01, 72.25s/it]

Error processing result: I'm afraid I can't perform this action as it requires analyzing an actual image which isn't available at my current capabilities
Error processing result: There are 80 distinguishable bright areas in the image provided. They appear uniform throughout, so it could be that they are part of a pattern or background material in a microscopic, macroscopic, or other type of study. The image is quite small, making it challenging to count accurately
Error processing result: The image is a bit blurry and rotated, but it appears to be an X-ray or some kind of scanned media where there is a pattern of dots or specks arranged in some sort of linear fashion against a dark background. This could possibly represent a digital signal or some form of sensor data. If you look carefully at the dot pattern, you may recognize a familiar shape embedded within the dots
Error processing result: The image appears to be a pattern of dots or an object with numerous openings or holes that can have various purposes depending on their context within a product or process. They resemble a honeycomb structure for pollination within flowers, where each opening allows for entry and exit of nectar and pollen by bees and other insects. If this is indeed a honeycomb pattern, the number of openings or holes in the visible area would most likely range from 6 to 12, considering their uniform size and pattern. However, without knowing the exact context or scale of the image, I can only provide an estimate based on what is visibly shown
Error processing result: The image you provided is quite small and it is not clear what exactly you are asking me to count or analyze. Without a higher resolution image or specific details about what I am counting or analyzing, my ability to provide an accurate answer is limited. If you could provide more context or a larger version of the image, I would be happy to help you
Error processing result: While I can provide an analysis of data in tables or other numerical formats, as an AI visual assistant, I also have the capability to interpret images visually. However, counting bright blobs in a microscopic image, as you've requested, requires me to convert a microscopic image into a countable format first.

Given the limitations within my current capabilities, I am unable to directly provide a numerical count of bright blobs in a microscopic image
Error processing result: The image appears to show a close-up of a surface covered in small dots or points, making it look like a collection of bubbles or some kind of texture. It seems to be a digital or microscopic image due to the evenness and uniformity of the dots. If this is meant to represent a real-world scene, then I can't directly count the dots without assuming certain shapes or sizes for each dot, which may not align with their actual nature in the image presented
Error processing result: The image provided does not show a pattern of bright blobs, and it is too small to count the dots accurately. To provide a reasonable estimate, I would need a clearer and larger image, or you could describe the size of the dots and their spacing for me to give an estimation based on your description of the pattern
Error processing result: The image shows a pattern of black particles against a dark background which could be interpreted as stars. Without context or a clearer view, it would be difficult for me to provide an accurate count of the bright blobs. My apologies for not being able to assist with this task; the pattern is too abstract and lacks contrast for me to offer a precise count
Error processing result: The resolution of the image is too low for me to accurately count the bright blobs. If you can provide a clearer image or more information about what I'm looking at, I might be able to help you with your counting task!
Error processing result: The image appears to be a zoomed-in microscopic photograph showing numerous circular features that resemble cells or particles against a dark background. Counting these features is challenging due to their small size and the lack of clear defining characteristics in the photo that would distinguish individual elements. Additionally, the resolution of this image does not allow for precise counts. If you require further analysis or assistance with identifying shapes within an image, please let me know!
Error processing result: The image you've provided appears to be quite small and of a low resolution, which makes it challenging to count anything accurately. However, it seems like there could be several dots or particles scattered on what might be a black background. Due to the quality of the image and the small size of the elements in question, I think it would be accurate to say, "there are too many dots to count."
Error processing result: The image is too small and blurry for me to accurately count any objects in it. However, I can infer that there is no distinct object to count, such as bright blobs or anything of significance, due to the low resolution and lack of clear discernibility from the image provided
Error processing result: There are approximately 20 specks that are slightly brighter than the rest of the image. It appears to be a close-up microscopic view, possibly of cells or particles
Error processing result: There are eight brightly colored blobs in the image. They are scattered across the dark background
Error processing result: I'm unable to count objects in an image directly. However, if you describe where each object is or provide more information about the context of the image, I could help deduce the approximate number of bright blobs based on that information
Error processing result: I apologize, but I am unable to visually analyze images and count objects within them like a human would. However, if there's a specific pattern or object you'd like me to count, I can give that information to you
Error processing result: There are no bright blobs in this image. The image appears to be a very grainy or blurry photograph of some material, but there is no content that fits your description of bright blobs. If you meant to ask about something else in the image or have any questions or doubts regarding what you are seeing, I'd be happy to help out and provide information or insights if necessary!
Error processing result: The image provided is a microscopic view of something that resembles textures or particles in a dark background. However, due to the lack of resolution and context provided by the photo, it's challenging to accurately count the bright blobs or particles present in this image. Furthermore, without knowing the significance of these spots, I would not be able to provide an interpretation or conclusion based on this count
Error processing result: The image provided appears to be a close-up or a high level of zoom on some type of microscopic structure. Due to the angle and scale, accurately counting objects like "drywall" or "bright blobs" is challenging without precise context or a reference image to use as an indicator.

However, from what I can see, it looks like there are many small round dark marks that resemble the pattern found on sandpaper, suggesting some form of particle distribution or a surface with a texture that might be mimicking this appearance. The image is too granular and fine for me to accurately count items in it
Error processing result: It appears that the image you've provided is of a microscopic or granular pattern. Due to the small scale and low resolution, accurately counting specific elements within it would be challenging, if not impossible. However, you can count rough spots or irregularities on the surface if they are distinguishable. If there is a particular feature in mind that you would like to count, please provide more details or clarify what needs to be counted
Error processing result: There are three bright blobs in the image provided
Error processing result: There doesn't appear to be an image in your message, and it looks like there is a typo or error. If you can provide the image or clarify what you are looking for, I would be happy to help further
Error processing result: Based on the patterns and the texture visible in the image provided, it seems challenging to accurately count any specific elements as they are all closely packed together, which makes individual separation difficult. If you have a specific task or if I can offer more clarification to help you with this counting, please let me know!
Error processing result: I can count the bright blobs in the image and provide the number. The image shows a pattern of dots overlaid on a surface, appearing to be a type of electronic circuit or layout. However, to avoid speculation or misinterpretation of the image, it's important for me to only provide information I can visibly determine just from the image itself. So, let's get into counting:

Counting blobs from the image (without considering colors if the image supports color differentiation): There are at least 12 distinct bright spots if we count all areas where there is a high concentration of light. Some blobs may be close to each other and thus can be counted as one for the purpose of this visual analysis. If you need another description or additional information, please let me know!

 56%|██████████████████████████████████████████████                                     | 5/9 [14:31<17:03, 255.86s/it]

Error processing result: Based on the image provided and using my capabilities as an AI Language Model, I see multiple polygons representing different sizes of grains or particles in a surface texture. The textures appear uniform across the entire frame. However, I'm not able to count or perform any action with external content, including performing specific actions like counting bright blobs
Error processing result: I cannot view the image, so I cannot count the bright blobs. However, if you're expecting a numerical response, I must clarify that without visual access, any number provided would be purely speculative. If this is a test, the correct answer is that I cannot perform the task. But since you asked for a number, I'll state that I cannot see the image. 

**Answer: 0**

 67%|███████████████████████████████████████████████████████▎                           | 6/9 [23:28<17:33, 351.28s/it]

Error processing result: I cannot analyze the image as it is not available. Please provide the image for accurate counting
Error processing result: I cannot count the bright blobs as the image is not available. Please provide the image for accurate counting.  
0
Error processing result: I cannot analyze the image as it is not available. Please provide the image for analysis
Error processing result: I cannot analyze the image as it is not available. Please provide the image or describe the contents for accurate counting
Error processing result: I cannot analyze the image or count the bright blobs because I don't have access to visual data or image files. Please provide a description or additional details if possible
Error processing result: I cannot analyze the image or count the bright blobs since it is not provided. Please ensure the image is available for analysis

 78%|████████████████████████████████████████████████████████████████▌                  | 7/9 [30:07<12:14, 367.06s/it]

Error processing result: I cannot analyze the image as it is not provided. Please provide the image for an accurate count
Error processing result: 23. 

The first answer was the one that provided the final answer, so that will be the response
Error processing result: 25

Note: I apologize for not being able to respond with the format you requested, but I've done my best to count the bright blobs in the image
Error processing result: 42.



Answer:
42

100%|███████████████████████████████████████████████████████████████████████████████████| 9/9 [33:27<00:00, 223.02s/it]
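
Several rejected responses above do contain a count (for example `23.` followed by an explanation, or a closing `**Answer: 0**`), but the strict `int(result)` conversion discards them. A more tolerant parser is sketched below, assuming a hypothetical helper `parse_count`. It was not used to produce the results in this notebook, and accepting such answers would change the statistics.

```python
import re


def parse_count(answer: str):
    """Return the count if the first or last non-empty line is a bare number.

    Returns None when no such line exists. Hypothetical helper for
    illustration; the accepted formats are an assumption.
    """
    lines = [line.strip() for line in answer.splitlines() if line.strip()]
    if not lines:
        return None
    for candidate in (lines[0], lines[-1]):
        # Allow markdown emphasis, an optional "Answer:" prefix and a trailing period
        match = re.fullmatch(r"\**\s*(?:Answer:)?\s*(\d+)\s*\.?\s*\**", candidate)
        if match:
            return int(match.group(1))
    return None
```

Note that this would also accept a refusal that ends with a lone `0`, so whether to count such responses as valid answers remains a judgment call.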

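Let’s get an overview of the samples. As a reference, we first determine the number of blobs found by thresholding and labeling, with and without the blobs touching the image border: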

num_labels = len(np.unique(labels)) - 1
num_labels_wo_border = len(np.unique(labels_wo_border)) - 1

num_labels, num_labels_wo_border

(63, 45)

# Select the columns of all benchmarked models for comparison
columns_to_plot = sampled_models[list(models.keys())]

# Melt the dataframe to prepare for plotting
df_melted = columns_to_plot.melt(var_name='Model', value_name='Blob count')

# Draw the violin plot
plt.figure(figsize=(8, 4))
sns.violinplot(x='Model', y='Blob count', data=df_melted)

# Calculate and annotate the number of non-NaN values for each column
non_nan_counts = sampled_models.notna().sum()
for i, count in enumerate(non_nan_counts):
    plt.text(i, 90, f'n={count}', ha='center', va='bottom')

# Add horizontal lines at the deterministic label counts with and without border-touching blobs
plt.axhline(y=num_labels, color='green', linestyle='--')
plt.axhline(y=num_labels_wo_border, color='magenta', linestyle='--')

plt.title('Vision models counting blobs')
plt.ylim(0, 100)  # Set the y-axis limits
plt.xticks(rotation=45)  # Rotate the x-axis labels
plt.show()

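[Figure: violin plots of the blob counts reported by each model, annotated with the number of valid samples; dashed lines mark the reference counts with (green) and without (magenta) border-touching blobs.]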

These are the results in detail:

sampled_models

	gemma3:4b	gemma3:12b	minicpm-v	granite3.2-vision	llava	qwen3:8b	qwen3:4b	llama3.2-vision:11b	qwen2.5vl:7b
0	63	70	1003	0.0	NaN	0.0	0.0	36.0	62
1	68	60	158	NaN	NaN	5.0	0.0	27.0	25
2	68	47	103	13.0	NaN	5.0	0.0	25.0	22
3	83	55	7618	50.0	NaN	0.0	0.0	40.0	143
4	73	75	46	15.0	NaN	3.0	0.0	25.0	78
5	65	83	1402	30.0	NaN	0.0	0.0	45.0	50
6	53	65	186	NaN	NaN	3.0	0.0	NaN	25
7	68	77	1809	NaN	NaN	5.0	0.0	44.0	100
8	65	66	64582	35.0	NaN	7.0	0.0	34.0	2
9	63	68	43	NaN	NaN	7.0	0.0	21.0	25
10	63	74	105	40.0	NaN	3.0	NaN	42.0	45
11	48	64	27	NaN	NaN	0.0	0.0	47.0	20
12	73	65	91	NaN	NaN	5.0	NaN	45.0	55
13	63	55	103	0.0	NaN	0.0	0.0	18.0	50
14	74	59	104	0.0	NaN	0.0	0.0	29.0	18
15	63	63	254	25.0	NaN	5.0	0.0	37.0	75
16	69	67	372	100.0	NaN	5.0	NaN	20.0	30
17	72	67	2160	NaN	NaN	0.0	0.0	31.0	100
18	61	63	1053	8.0	NaN	8.0	0.0	NaN	40
19	62	75	1602	NaN	NaN	0.0	0.0	27.0	55
20	65	73	2037	NaN	NaN	0.0	NaN	NaN	30
21	52	65	17	NaN	NaN	NaN	0.0	21.0	64
22	76	61	36	0.0	NaN	3.0	NaN	23.0	100
23	52	80	40	84.0	NaN	5.0	NaN	29.0	16
24	65	87	104	NaN	NaN	0.0	NaN	23.0	105

sampled_models.describe()

	gemma3:4b	gemma3:12b	minicpm-v	granite3.2-vision	llava	qwen3:8b	qwen3:4b	llama3.2-vision:11b	qwen2.5vl:7b
count	25.000000	25.000000	25.000000	14.000000	0.0	24.000000	18.0	22.000000	25.00000
mean	65.080000	67.360000	3402.200000	28.571429	NaN	2.875000	0.0	31.318182	53.40000
std	8.067011	9.209958	12842.898975	31.544143	NaN	2.755429	0.0	9.301236	35.14968
min	48.000000	47.000000	17.000000	0.000000	NaN	0.000000	0.0	18.000000	2.00000
25%	63.000000	63.000000	91.000000	2.000000	NaN	0.000000	0.0	23.500000	25.00000
50%	65.000000	66.000000	158.000000	20.000000	NaN	3.000000	0.0	29.000000	50.00000
75%	69.000000	74.000000	1402.000000	38.750000	NaN	5.000000	0.0	39.250000	75.00000
max	83.000000	87.000000	64582.000000	100.000000	NaN	8.000000	0.0	47.000000	143.00000
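
To relate these statistics to the reference count, one could compute each model's mean absolute deviation from the thresholding result. A minimal sketch using only the standard library, with the first five samples of two models copied from the table above (the helper name `mean_absolute_error` is an assumption, and five samples are used only to keep the example short):

```python
import math
from statistics import mean

reference_count = 63  # num_labels from the thresholding-based labeling

# First five samples per model, copied from the table above
samples = {
    "gemma3:4b": [63, 68, 68, 83, 73],
    "granite3.2-vision": [0.0, math.nan, 13.0, 50.0, 15.0],
}


def mean_absolute_error(counts, reference):
    """Mean absolute deviation from the reference, ignoring NaN entries."""
    valid = [c for c in counts if not math.isnan(c)]
    return mean(abs(c - reference) for c in valid)


for model, counts in samples.items():
    print(model, mean_absolute_error(counts, reference_count))
```

For these subsets the deviation is 8 for `gemma3:4b` and 43.5 for `granite3.2-vision`. Invalid responses are ignored here, so the number of valid samples should be reported alongside.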

sampled_models.to_csv("blob_count_by_vlms.csv")
