Cluster plots

Using stackview.clusterplot we can visualize the contents of a pandas DataFrame and the corresponding segmented objects in an image side-by-side. In such a plot you can select objects and visualize the selection. This might be useful for exploring feature extraction parameter spaces.

import pandas as pd
import numpy as np
import stackview
from skimage.io import imread, imsave
from skimage.measure import regionprops_table
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from umap import UMAP
import pyclesperanto as cle

To demonstrate this, we need an image, a segmentation and a table of extracted features.

image = imread('../02_bia_bob/data/lund.tif')
stackview.insight(image)

shape	(116, 636, 354)
dtype	uint8
size	24.9 MB
min	0
max	255

labeled_image = imread('../02_bia_bob/data/lund_labels.tif')

# for reproducibility purposes, the segmented image was made using pyclesperanto:
#background_sub_image = cle.top_hat_box(image, radius_x=15, radius_y=15, radius_z=0)
#labeled_image = cle.voronoi_otsu_labeling(background_sub_image, spot_sigma=1, outline_sigma=0.5)
#labeled_image = np.asarray(cle.exclude_labels_outside_size_range(labeled_image, minimum_size=10, maximum_size=10000000))

stackview.insight(labeled_image)

shape	(116, 636, 354)
dtype	uint32
size	99.6 MB
min	0
max	1306
n labels	1306

# Measure intensity, position, size and shape of every labeled object
properties = regionprops_table(labeled_image, intensity_image=image, properties=[
    'mean_intensity', 'std_intensity',
    'centroid', 'area', 'feret_diameter_max',
    'minor_axis_length', 'major_axis_length'])

df = pd.DataFrame(properties)

# Select numeric columns
numeric_cols = df.select_dtypes(include=[np.number]).columns

# Scale the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[numeric_cols])

# Create UMAP embedding
umap = UMAP(n_components=2, random_state=42) 
umap_coords = umap.fit_transform(scaled_data)

# Add UMAP coordinates to dataframe 
df['UMAP1'] = umap_coords[:, 0]
df['UMAP2'] = umap_coords[:, 1]

df.head()

	mean_intensity	std_intensity	centroid-0	centroid-1	centroid-2	area	feret_diameter_max	minor_axis_length	major_axis_length	UMAP1	UMAP2
0	43.500000	3.727312	0.250000	35.142857	252.964286	28.0	7.280110	1.926058	7.071772	-2.623879	2.828131
1	41.200000	1.681269	0.266667	48.200000	253.466667	30.0	6.324555	1.963177	5.986220	-2.604730	2.823892
2	83.397590	6.051864	0.602410	199.795181	64.156627	83.0	9.539392	3.029975	9.910506	3.148098	2.772093
3	103.492308	2.826376	1.069231	211.869231	91.392308	130.0	9.643651	4.389510	8.818684	3.238054	2.795944
4	101.157576	4.142831	1.357576	214.418182	84.763636	165.0	9.899495	5.211984	8.928511	3.224564	2.796298
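
As a rough illustration of what the scaling step does before the embedding: StandardScaler with default settings centers each column to zero mean and divides it by its standard deviation, so that features with a large numeric range (such as area) do not dominate features with a small range. The sketch below reproduces that arithmetic with plain NumPy on a small made-up table; the name `toy` and its values are hypothetical and only serve the demonstration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy measurements; not taken from the dataset above
toy = pd.DataFrame({
    "area": [28.0, 30.0, 83.0, 130.0, 165.0],
    "mean_intensity": [43.5, 41.2, 83.4, 103.5, 101.2],
})

values = toy.to_numpy()

# Per-column z-score: subtract the mean, then divide by the
# population standard deviation (ddof=0)
toy_scaled = (values - values.mean(axis=0)) / values.std(axis=0)

print(toy_scaled.mean(axis=0))  # close to 0 for every column
print(toy_scaled.std(axis=0))   # close to 1 for every column
```

After this step all columns are on a comparable scale, which matters for distance-based methods such as UMAP.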

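# Pre-select the first half of the objects (1 = selected, 0 = not selected)
num_objects = df.shape[0]
pre_selection = np.zeros(num_objects)
pre_selection[:num_objects // 2] = 1

# Store the pre-selection in the "selection" column of the table
df["selection"] = pre_selection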

Interaction

Using some more involved code we can also draw the image and the scatter plot side-by-side and make them interact. You can select data points in the plot on the right and the visualization on the left will be updated accordingly.

stackview.clusterplot(image=image,
                     labels=labeled_image,
                     df=df,
                     column_x="UMAP1",
                     column_y="UMAP2",
                     zoom_factor=0.75,
                     markersize=15,
                     alpha=0.6)
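
To make the link between table rows and image objects more concrete, here is a minimal NumPy sketch of how a per-object selection can be painted back into a label image. It is not stackview's implementation. It assumes that labels are numbered sequentially starting at 1 and that row i of the table describes label i + 1, which is the order regionprops_table produces for sequentially labeled images. The arrays `toy_labels` and `toy_selection` are made up for illustration.

```python
import numpy as np

# Hypothetical 2D label image with three objects (0 is background)
toy_labels = np.array([
    [0, 1, 1, 0],
    [2, 2, 0, 3],
    [2, 0, 3, 3],
])

# Hypothetical selection, one entry per object: labels 1 and 3 are selected
toy_selection = np.array([1, 0, 1])

# Lookup table indexed by label value; index 0 (background) stays unselected
lookup = np.concatenate([[0], toy_selection])

# Replace every label by its selection value to obtain a selection image
selection_image = lookup[toy_labels]

print(selection_image)
```

The lookup-table approach avoids looping over objects, which keeps it fast even for large 3D label images.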

Every time the user selects different data points, the selection in our dataframe is updated.

df["selection"]

0       1.0
1       1.0
2       1.0
3       1.0
4       1.0
       ... 
1301    0.0
1302    0.0
1303    0.0
1304    0.0
1305    0.0
Name: selection, Length: 1306, dtype: float64
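
Because the selection lives in an ordinary DataFrame column, it can be used like any other column, for example to compare measurements of selected and unselected objects. The following sketch uses a small hypothetical table (`toy_df`) in place of `df`; with the real data the same lines should work on `df` directly, assuming the `selection` column holds 1 for selected objects and 0 otherwise.

```python
import pandas as pd

# Hypothetical stand-in for df, including a selection column
toy_df = pd.DataFrame({
    "area": [28.0, 30.0, 83.0, 130.0, 165.0],
    "mean_intensity": [43.5, 41.2, 83.4, 103.5, 101.2],
    "selection": [1.0, 1.0, 0.0, 0.0, 1.0],
})

# Rows of the currently selected objects
selected = toy_df[toy_df["selection"] == 1]

# Average measurements for unselected (0.0) and selected (1.0) objects
summary = toy_df.groupby("selection")[["area", "mean_intensity"]].mean()

print(len(selected), "objects selected")
print(summary)
```

Re-running these lines after changing the selection in the plot gives the statistics for the new selection.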
