Tiled image processing#
In this notebook we will process a big dataset that has been saved in zarr format to count cells in individual tiles using dask and zarr. The underlying principles will be explained in the next sections.
import zarr
import dask.array as da
import numpy as np
import stackview
For demonstration purposes, we use a dataset that is provided by Theresa Suckert, OncoRay, University Hospital Carl Gustav Carus, TU Dresden. The dataset is licensed License: CC-BY 4.0. We are using a cropped version here that was resaved a 8-bit image to be able to provide it with the notebook. You find the full size 16-bit image in CZI file format online. The biological background is explained in Suckert et al. 2020, where we also applied a similar workflow.
When working with big data, you will likely have an image stored in the right format to begin with. For demonstration purposes, we save here a test image into the zarr format, which is commonly used for handling big image data.
Loading the zarr-backed image data#
For processing images tile-by-tile, the data must be stored in a compatible file format in advance, for example as zarr file.
zarr_image = zarr.open("data/P1_H_C3H_M004_17-cropped.zarr", mode="r")
zarr_image.info
Type : Array
Zarr format : 3
Data type : UInt8()
Fill value : 0
Shape : (2000, 5000)
Chunk shape : (100, 100)
Order : C
Read-only : True
Store type : LocalStore
Filters : ()
Serializer : BytesCodec(endian=None)
Compressors : (ZstdCodec(level=0, checksum=False),)
No. bytes : 10000000 (9.5M)
Such a zarr dataset can be converted to a Dask stack:
da.from_zarr(zarr_image)
|
We can also apply image processing to this tiled dataset directly.
Counting nuclei#
For counting the nuclei, we setup a simple image processing workflow. It returns an image with a single pixel containing the number of nuclei in the given input image. These single pixels will be assembled to a pixel count map; an image with much less pixels than the original image, but with the advantage that we can look at it - it’s no big data anymore.cle.exclude_labels_with_map_values_within_range
import numpy as np
from skimage import filters, segmentation, measure, morphology
from scipy import ndimage
def count_nuclei(image, threshold=15):
"""
Label objects in a binary image and produce a pixel-count-map image.
"""
#print("processing image of shape", image.shape)
# Count nuclei including those which touch the image border
# Replace voronoi_otsu_labeling with equivalent scikit-image approach
blurred = filters.gaussian(np.asarray(image), sigma=3.5, preserve_range=True)
binary = blurred > threshold
# Generate seeds for watershed
distance = ndimage.distance_transform_edt(binary)
local_maxima = morphology.local_maxima(distance, footprint=morphology.disk(1))
seeds = measure.label(local_maxima)
# Perform watershed segmentation
labels = segmentation.watershed(-distance, seeds, mask=binary)
# Relabel to ensure sequential numbering
nuclei_count = labels.max()
# Count nuclei excluding those which touch the image border
labels_without_borders = segmentation.clear_border(labels.copy())
# Relabel again
labels_without_borders = measure.label(labels_without_borders > 0)
nuclei_count_excluding_borders = labels_without_borders.max()
# Both nuclei-count including and excluding nuclei at image borders
# are no good approximation. We should exclude the nuclei only on
# half of the borders to get a good estimate.
# Alternatively, we just take the average of both counts.
result = np.asarray([[(nuclei_count + nuclei_count_excluding_borders) / 2]])
return result
For testing, we apply the function to a subset of the data.
image_crop = np.asarray(zarr_image[1000:1500, 1000:1500])
stackview.insight(image_crop)
|
|
count_nuclei(image_crop)
array([[172.]])
Next, we setup a delayed result by mapping a function on blocks of the data.
tile_map = da.map_blocks(count_nuclei, da.from_zarr(zarr_image))
tile_map
|
As this result image is orders of magnitude smaller then the original, we can compute the whole result map. This is because the count nuclei function reduces the data massively. Each block is turned into a single number.
result = tile_map.compute()
result.shape
(20, 50)
The sum of intensity of all pixels in this image corresponds to the counted number of nuclei in the original image.
result.sum()
np.float64(5545.0)
Again, as the result map is small, we can just visualize it.
stackview.insight(result)
|
|
Sanity check#
Applying the count nuclei
function to the whole zarr image allows us to check for the error of the computation above.
Note that this would not work with big data.
count_nuclei(np.asarray(zarr_image))
array([[5423.5]])
Exercise#
Use the function da.from_zarr(zarr_image).rechunk((500, 500))
to apply cound_nuclei
to differently chunked data. Depending on the chunk size, the error of the method seems different. why?