# Tiled image file formats: zarr

When working with big image data, special file formats such as the [zarr](https://zarr.readthedocs.io/en/stable/) format are commonly used. Zarr stores image data in chunks. Instead of loading a huge image dataset from disk and then tiling it, it is possible to load individual zarr tiles, process them and save the result back to disk. That way, one can process big images without ever loading the whole image into memory.

Using these formats brings additional challenges. For example, re-saving the big image into small zarr-based tiles must happen on a computer that is capable of opening the big image to begin with. This notebook shows how to do this in a slightly unrealistic scenario: we load the whole dataset first to resave it as tiles, and at the end we load these tiles from disk and visualize them. In a realistic scenario, the image would be too big for either of these two steps, and both would have to be adapted to the situation at hand.

In [1]:
import zarr
import dask.array as da
import numpy as np
from skimage.io import imread
from numcodecs import Blosc
import stackview

For demonstration purposes, we use a dataset that is provided by Theresa Suckert, OncoRay, University Hospital Carl Gustav Carus, TU Dresden. The dataset is licensed under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). We are using a cropped version here that was resaved as an 8-bit image to be able to provide it with the notebook. You find the full size 16-bit image in CZI file format [online](https://zenodo.org/record/4276076#.YX1F-55BxaQ).

In [2]:
image = imread('data/P1_H_C3H_M004_17-cropped.tif')[1]

# For testing purposes, the image can be cropped even more.
# Uncomment the following line to run on a 500x500 crop instead of the whole 5000x2000 pixels.
#image = image[1000:1500, 1000:1500]

image.shape

(2000, 5000)

In [3]:
stackview.insight(image)

| Property | Value |
|---|---|
| shape | (2000, 5000) |
| dtype | uint8 |
| size | 9.5 MB |
| min | 0 |
| max | 255 |


## Saving as zarr
We will now resave our big image in the [zarr](https://zarr.readthedocs.io/en/stable/) file format.

In [4]:
z = zarr.open("data/P1_H_C3H_M004_17-cropped.zarr", mode="w", chunks=(100, 100), shape=image.shape, dtype=image.dtype)
z[:] = image

You will then see that a folder with the given name has been created. This folder contains many files, each of which corresponds to an image tile.

## Loading zarr
Just for demonstration purposes, we will load the zarr-backed tiled image and visualize it. When working with big data, this step might not be possible.

In [5]:
zarr_result = zarr.open("data/P1_H_C3H_M004_17-cropped.zarr", mode="r")

zarr_result.info

Type               : Array
Zarr format        : 3
Data type          : UInt8()
Fill value         : 0
Shape              : (2000, 5000)
Chunk shape        : (100, 100)
Order              : C
Read-only          : True
Store type         : LocalStore
Filters            : ()
Serializer         : BytesCodec(endian=None)
Compressors        : (ZstdCodec(level=0, checksum=False),)
No. bytes          : 10000000 (9.5M)

In [6]:
da.from_zarr(zarr_result)

| | Array | Chunk |
|---|---|---|
| Bytes | 9.54 MiB | 9.77 kiB |
| Shape | (2000, 5000) | (100, 100) |
| Dask graph | 1000 chunks in 2 graph layers | |
| Data type | uint8 numpy.ndarray | |
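The numbers reported above follow directly from the image shape, the chunk shape and the data type. As a small worked example using only the values shown in this notebook (`chunk_grid` is a hypothetical helper name):

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each axis; partial edge chunks count as one chunk."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


shape = (2000, 5000)
chunks = (100, 100)
itemsize = 1  # bytes per pixel for uint8

grid = chunk_grid(shape, chunks)
n_chunks = math.prod(grid)
chunk_kib = math.prod(chunks) * itemsize / 1024
array_mib = math.prod(shape) * itemsize / 1024**2

print(grid, n_chunks)       # (20, 50) 1000
print(round(chunk_kib, 2))  # 9.77
print(round(array_mib, 2))  # 9.54
```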


In [7]:
# note: this might not work with big data
stackview.insight(zarr_result)

| Property | Value |
|---|---|
| shape | (2000, 5000) |
| dtype | uint8 |
| size | 9.5 MB |
| min | 0 |
| max | 255 |
