Processing files and folders

Python lets you process files and folders on your operating system. In this notebook, we start by reading files and their properties from a folder using the built-in module os.

For more information on file handling also see:

import os

path = "data_banana/"

A simple and straightforward way to read the contents of a directory is os.listdir().

It returns a list containing the names (strings) of the entries in the given directory.

file_list = os.listdir(path)

print(type(file_list[0]))

file_list
<class 'str'>
['banana0016.tif',
 'banana0002.tif',
 'banana0003.tif',
 'banana0017.tif',
 'banana0015.tif',
 'subfolder',
 'banana0014.tif',
 'banana0004.tif',
 'banana0010.tif',
 'banana0011.tif',
 'banana0005.tif',
 'banana0013.tif',
 'banana0007.tif',
 'banana0006.tif',
 'banana0012.tif',
 'image_source.txt',
 'banana0023.tif',
 'banana0022.tif',
 'banana0008.tif',
 'banana0020.tif',
 'banana0021.tif',
 'banana0009.tif',
 'banana0025.tif',
 'banana0019.tif',
 'banana0018.tif',
 'banana0024.tif',
 'banana0026.tif']

Attention: Like most such functions, listdir returns entries in arbitrary order!

But we can sort them alphabetically:

file_list_sorted = sorted(file_list)

file_list_sorted
['banana0002.tif',
 'banana0003.tif',
 'banana0004.tif',
 'banana0005.tif',
 'banana0006.tif',
 'banana0007.tif',
 'banana0008.tif',
 'banana0009.tif',
 'banana0010.tif',
 'banana0011.tif',
 'banana0012.tif',
 'banana0013.tif',
 'banana0014.tif',
 'banana0015.tif',
 'banana0016.tif',
 'banana0017.tif',
 'banana0018.tif',
 'banana0019.tif',
 'banana0020.tif',
 'banana0021.tif',
 'banana0022.tif',
 'banana0023.tif',
 'banana0024.tif',
 'banana0025.tif',
 'banana0026.tif',
 'image_source.txt',
 'subfolder']
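Because listdir returns names only, the sorted list still mixes files and folders (such as 'subfolder'). The following is a minimal sketch of how the two could be separated with os.path.isdir() and os.path.isfile(). The helper name split_files_and_folders and the temporary demo folder are assumptions made for illustration; they are not part of the notebook's data.

```python
import os
import tempfile


def split_files_and_folders(folder):
    """Return (files, folders) as two alphabetically sorted lists of names."""
    files, folders = [], []
    for name in sorted(os.listdir(folder)):
        # listdir returns bare names, so join them with the folder first
        full_path = os.path.join(folder, name)
        if os.path.isdir(full_path):
            folders.append(name)
        elif os.path.isfile(full_path):
            files.append(name)
    return files, folders


# Demo on a throwaway folder (hypothetical content) so the example runs anywhere
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ["banana0003.tif", "banana0002.tif", "image_source.txt"]:
        open(os.path.join(demo_folder, name), "w").close()
    os.mkdir(os.path.join(demo_folder, "subfolder"))

    files, folders = split_files_and_folders(demo_folder)
    print(files)    # ['banana0002.tif', 'banana0003.tif', 'image_source.txt']
    print(folders)  # ['subfolder']
```

The same function could be called with the notebook's own path variable, for example split_files_and_folders(path).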

As an alternative, os.scandir() yields os.DirEntry objects, which also carry information about the file type and file attributes.

Attention: scandir yields entries in arbitrary order!

with os.scandir(path) as entries:
    for entry in entries:
        if entry.is_file():
            print(f"File:\t{entry.path} - {entry.stat().st_size} bytes")
        if entry.is_dir():
            print(f"Dir:\t{entry.name}/ - {entry.stat().st_size} bytes")
File:	data_banana/banana0016.tif - 307382 bytes
File:	data_banana/banana0002.tif - 307384 bytes
File:	data_banana/banana0003.tif - 307386 bytes
File:	data_banana/banana0017.tif - 307386 bytes
File:	data_banana/banana0015.tif - 307390 bytes
Dir:	subfolder/ - 96 bytes
File:	data_banana/banana0014.tif - 307386 bytes
File:	data_banana/banana0004.tif - 307378 bytes
File:	data_banana/banana0010.tif - 307382 bytes
File:	data_banana/banana0011.tif - 307386 bytes
File:	data_banana/banana0005.tif - 307382 bytes
File:	data_banana/banana0013.tif - 307382 bytes
File:	data_banana/banana0007.tif - 307382 bytes
File:	data_banana/banana0006.tif - 307386 bytes
File:	data_banana/banana0012.tif - 307386 bytes
File:	data_banana/image_source.txt - 54 bytes
File:	data_banana/banana0023.tif - 307386 bytes
File:	data_banana/banana0022.tif - 307374 bytes
File:	data_banana/banana0008.tif - 307386 bytes
File:	data_banana/banana0020.tif - 307386 bytes
File:	data_banana/banana0021.tif - 307386 bytes
File:	data_banana/banana0009.tif - 307386 bytes
File:	data_banana/banana0025.tif - 307390 bytes
File:	data_banana/banana0019.tif - 307382 bytes
File:	data_banana/banana0018.tif - 307386 bytes
File:	data_banana/banana0024.tif - 307382 bytes
File:	data_banana/banana0026.tif - 307390 bytes
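Since scandir yields entries in arbitrary order, one option is to collect the information you need while the iterator is open and sort it afterwards. The sketch below assumes a hypothetical helper called scan_sorted and uses a temporary demo folder with made-up file names so that it runs without the banana data.

```python
import os
import tempfile


def scan_sorted(folder):
    """Return (name, is_dir, size_in_bytes) tuples sorted by entry name."""
    with os.scandir(folder) as entries:
        # Read everything we need while the scandir iterator is still open
        records = [(e.name, e.is_dir(), e.stat().st_size) for e in entries]
    return sorted(records, key=lambda record: record[0])


# Demo on a throwaway folder (hypothetical content)
with tempfile.TemporaryDirectory() as demo_folder:
    for name, content in [("b.tif", b"12345"), ("a.tif", b"123")]:
        with open(os.path.join(demo_folder, name), "wb") as f:
            f.write(content)
    os.mkdir(os.path.join(demo_folder, "subfolder"))

    records = scan_sorted(demo_folder)
    for name, is_dir, size in records:
        kind = "Dir:" if is_dir else "File:"
        print(f"{kind}\t{name} - {size} bytes")
```

The size reported for a folder depends on the operating system and file system, so only the file sizes are comparable across machines.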

We can also filter our file list so that we are left with image files only. This can be done with a list comprehension:

image_file_list = [file for file in file_list_sorted if file.endswith(".tif")]

image_file_list
['banana0002.tif',
 'banana0003.tif',
 'banana0004.tif',
 'banana0005.tif',
 'banana0006.tif',
 'banana0007.tif',
 'banana0008.tif',
 'banana0009.tif',
 'banana0010.tif',
 'banana0011.tif',
 'banana0012.tif',
 'banana0013.tif',
 'banana0014.tif',
 'banana0015.tif',
 'banana0016.tif',
 'banana0017.tif',
 'banana0018.tif',
 'banana0019.tif',
 'banana0020.tif',
 'banana0021.tif',
 'banana0022.tif',
 'banana0023.tif',
 'banana0024.tif',
 'banana0025.tif',
 'banana0026.tif']

We can also show our images using additional libraries. In this case, we use skimage to read the actual image data from the file and stackview to display the image.

from skimage.io import imread
import stackview
# Let's display 5 of the images
for file in image_file_list[10:15]:
    # os.path.join inserts the path separator if it is missing
    image = imread(os.path.join(path, file))
    stackview.imshow(image)