Masking NumPy arrays

Masking refers to selecting entries of an array via conditional (logical) queries on the data. To do this, we need to use NumPy arrays. NumPy is a package for scientific computing and efficient numerical operations in Python. NumPy arrays (multidimensional, homogeneous arrays of fixed-size items) are its basic data structure. Homogeneous means that all elements of an array share a single data type (dtype); this is typically numerical, although Boolean and other types are possible as well.

First, we define a NumPy array:

import numpy
measurements = numpy.asarray([1, 17, 25, 3, 5, 25, 12])
measurements
array([ 1, 17, 25,  3,  5, 25, 12])
type(measurements)
numpy.ndarray
measurements.dtype
dtype('int64')

Next, we create the mask, i.e., a Boolean array that marks which entries satisfy a condition. Here, the condition is that a measurement is above a given threshold:

mask = measurements > 10
mask
array([False,  True,  True, False, False,  True,  True])
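
Because the mask is itself a Boolean NumPy array, it can be inspected like any other array. The following minimal sketch reuses the measurements from above; the names n_above and inverted are only illustrative. It counts the matching entries and flips the mask with the ~ operator.

```python
import numpy

measurements = numpy.asarray([1, 17, 25, 3, 5, 25, 12])
mask = measurements > 10

# The mask has the same shape as the data and a Boolean dtype
print(mask.shape == measurements.shape)  # True
print(mask.dtype)                        # bool

# count_nonzero counts the True entries, i.e. the number of matches
n_above = numpy.count_nonzero(mask)
print(n_above)                           # 4

# ~ flips every entry, marking the measurements that are NOT above 10
inverted = ~mask
print(inverted)
```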

We can now apply that mask to our data via the [] operator to retrieve a new array that contains only the values for which the mask is True.

measurements[mask]
array([17, 25, 25, 12])
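
Two consequences of this are worth a quick sketch: indexing with a Boolean mask returns a copy of the selected values, and the same mask can be used on the left-hand side of an assignment to change the selected entries. The names selected, clipped and positions, and the replacement value 10, are assumptions made for illustration.

```python
import numpy

measurements = numpy.asarray([1, 17, 25, 3, 5, 25, 12])
mask = measurements > 10

# Boolean indexing returns a copy: changing it leaves the original untouched
selected = measurements[mask]
selected[0] = -1
print(measurements)  # [ 1 17 25  3  5 25 12]

# Assigning through the mask modifies the chosen entries of that array
clipped = measurements.copy()
clipped[mask] = 10
print(clipped)       # [ 1 10 10  3  5 10 10]

# numpy.flatnonzero gives the positions where the mask is True
positions = numpy.flatnonzero(mask)
print(positions)     # [1 2 5 6]
```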

All this can also be done in one line. In addition, more complex conditions can be defined with the element-wise logical operators & (and) and | (or). Each individual condition must be enclosed in parentheses (), because & and | bind more tightly than comparison operators such as > and <.

measurements[(measurements > 10) & (measurements < 20)]
array([17, 12])
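
As a hedged sketch of some alternatives: the function forms numpy.logical_and and numpy.logical_or build the same masks as the operators, and the Python keywords and / or cannot be used in their place because they are not applied element-wise. The thresholds below are arbitrary and chosen only for illustration.

```python
import numpy

measurements = numpy.asarray([1, 17, 25, 3, 5, 25, 12])

# Operator form and function form build the same mask
in_range = (measurements > 10) & (measurements < 20)
same = numpy.logical_and(measurements > 10, measurements < 20)
print(numpy.array_equal(in_range, same))  # True

# | selects entries that satisfy at least one of the conditions
outside = measurements[(measurements < 3) | (measurements > 20)]
print(outside)                            # [ 1 25 25]

# The keyword `and` asks for the truth value of a whole array,
# which is ambiguous for more than one element and raises ValueError
try:
    measurements[(measurements > 10) and (measurements < 20)]
except ValueError as error:
    print("ValueError:", error)
```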

Exercises

Use masking to select all measurements equal to 25.

Use masking to select all measurements that are less than or equal to 5, or above 20.