Masking NumPy arrays
Masking refers to selecting entries in an array via conditional (logical) queries on the data. To do this, we use NumPy arrays. NumPy is a package for scientific computing and efficient numerical operations in Python. Its basic data structure is the NumPy array: a multidimensional, homogeneous array of fixed-size items. All elements of an array share a single data type (dtype), which is typically numerical.
First, we define a NumPy array:
import numpy
measurements = numpy.asarray([1, 17, 25, 3, 5, 25, 12])
measurements
array([ 1, 17, 25, 3, 5, 25, 12])
type(measurements)
numpy.ndarray
measurements.dtype
dtype('int64')
Next, we create the mask, i.e., a condition that is evaluated for every element of the array. Here, it marks all measurements that are above a given threshold:
mask = measurements > 10
mask
array([False, True, True, False, False, True, True])
We can now apply that mask to our data via the [] operator to retrieve a new array that contains only the values for which the mask is True.
measurements[mask]
array([17, 25, 25, 12])
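The steps above can be inspected a little further. The following is a minimal sketch, not part of the original notebook: it checks the data type of the mask, counts the selected entries, and shows that Boolean masking returns a copy rather than a view. The names `n_selected` and `selected` are introduced here for illustration only.

```python
import numpy

measurements = numpy.asarray([1, 17, 25, 3, 5, 25, 12])
mask = measurements > 10

# The mask is a Boolean array with the same shape as the data.
print(mask.dtype)                          # bool
print(mask.shape == measurements.shape)    # True

# Counting the True entries tells us how many values will be selected.
n_selected = numpy.count_nonzero(mask)
print(n_selected)                          # 4

# Masking returns a new array (a copy), so modifying the result
# leaves the original measurements untouched.
selected = measurements[mask]
selected[0] = -1
print(measurements[1])                     # 17
```

Because the result is a copy, it is safe to process the selected values further without altering the original data.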
All this can also be done in one line. In addition, more complex conditions can be defined with the element-wise logical operators & (and) and | (or). Each individual condition must be enclosed in parentheses (), because & and | take precedence over comparison operators such as > and <.
measurements[(measurements > 10) & (measurements < 20)]
array([17, 12])
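As a hedged illustration of why the parentheses matter, the sketch below stores each condition in its own named mask before combining them, and then tries the same expression without parentheses. The variable names (`above_10`, `below_20`, `in_range`, `raised`) are hypothetical, and the `~` operator (element-wise not) is an addition that the text above does not cover.

```python
import numpy

measurements = numpy.asarray([1, 17, 25, 3, 5, 25, 12])

# Naming each condition keeps combined masks readable.
above_10 = measurements > 10
below_20 = measurements < 20

in_range = above_10 & below_20
print(measurements[in_range])     # [17 12]

# ~ inverts a mask: all values that are NOT in the range.
print(measurements[~in_range])    # [ 1 25  3  5 25]

# Without parentheses, & is evaluated before the comparisons.
# The expression then becomes a chained comparison of arrays,
# which NumPy rejects with a ValueError.
try:
    measurements[measurements > 10 & measurements < 20]
    raised = False
except ValueError:
    raised = True
print(raised)                     # True
```

Named masks are also convenient when the same condition is reused in several selections.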
Exercises
Use masking to select all measurements equal to 25.
Use masking to select all measurements that are less than or equal to 5, or above 20.