Selecting data from DataFrames#
Like with Python sequences, we may want to get specific parts of the data in a DataFrame. And like with Python sequences, we can use indexing, slicing, and masking on DataFrames, too.
See also:
We start by reading our statistics CSV file into a DataFrame again
import pandas as pd
df = pd.read_csv('../data/blobs_statistics.csv', index_col=0)
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 61 entries, 0 to 60
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 area 61 non-null int64
1 mean_intensity 61 non-null float64
2 minor_axis_length 61 non-null float64
3 major_axis_length 61 non-null float64
4 eccentricity 61 non-null float64
5 extent 61 non-null float64
6 feret_diameter_max 61 non-null float64
7 equivalent_diameter_area 61 non-null float64
8 bbox-0 61 non-null int64
9 bbox-1 61 non-null int64
10 bbox-2 61 non-null int64
11 bbox-3 61 non-null int64
dtypes: float64(7), int64(5)
memory usage: 6.2 KB
Pandas provides methods to display just the first n
rows or last n
rows of a DataFrame - head()
and tail()
df.head(10)
area | mean_intensity | minor_axis_length | major_axis_length | eccentricity | extent | feret_diameter_max | equivalent_diameter_area | bbox-0 | bbox-1 | bbox-2 | bbox-3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 422 | 192.379147 | 16.488550 | 34.566789 | 0.878900 | 0.586111 | 35.227830 | 23.179885 | 0 | 11 | 30 | 35 |
1 | 182 | 180.131868 | 11.736074 | 20.802697 | 0.825665 | 0.787879 | 21.377558 | 15.222667 | 0 | 53 | 11 | 74 |
2 | 661 | 205.216339 | 28.409502 | 30.208433 | 0.339934 | 0.874339 | 32.756679 | 29.010538 | 0 | 95 | 28 | 122 |
3 | 437 | 216.585812 | 23.143996 | 24.606130 | 0.339576 | 0.826087 | 26.925824 | 23.588253 | 0 | 144 | 23 | 167 |
4 | 476 | 212.302521 | 19.852882 | 31.075106 | 0.769317 | 0.863884 | 31.384710 | 24.618327 | 0 | 237 | 29 | 256 |
5 | 277 | 206.469314 | 17.523565 | 20.151370 | 0.493763 | 0.769444 | 20.880613 | 18.779972 | 6 | 189 | 26 | 207 |
6 | 259 | 178.007722 | 15.309080 | 21.761302 | 0.710695 | 0.735795 | 22.803509 | 18.159544 | 17 | 211 | 39 | 227 |
7 | 219 | 191.598174 | 15.745458 | 17.729017 | 0.459616 | 0.760417 | 18.681542 | 16.698487 | 18 | 37 | 36 | 53 |
8 | 67 | 167.522388 | 8.731268 | 9.761752 | 0.447195 | 0.827160 | 10.295630 | 9.236182 | 18 | 133 | 27 | 142 |
9 | 19 | 155.368421 | 1.967201 | 11.358057 | 0.984887 | 0.863636 | 11.000000 | 4.918491 | 21 | 0 | 32 | 2 |
We can also get the column names via columns
. This will provide a so-called Index
object with the column names and some metadata
df.columns
Index(['area', 'mean_intensity', 'minor_axis_length', 'major_axis_length',
'eccentricity', 'extent', 'feret_diameter_max',
'equivalent_diameter_area', 'bbox-0', 'bbox-1', 'bbox-2', 'bbox-3'],
dtype='object')
Selecting rows and columns#
Let’s get the DataFrames first row
df[0]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/Documents/Miniconda/envs/devbio-napari-env/lib/python3.9/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
3804 try:
-> 3805 return self._engine.get_loc(casted_key)
3806 except KeyError as err:
File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[5], line 1
----> 1 df[0]
File ~/Documents/Miniconda/envs/devbio-napari-env/lib/python3.9/site-packages/pandas/core/frame.py:4090, in DataFrame.__getitem__(self, key)
4088 if self.columns.nlevels > 1:
4089 return self._getitem_multilevel(key)
-> 4090 indexer = self.columns.get_loc(key)
4091 if is_integer(indexer):
4092 indexer = [indexer]
File ~/Documents/Miniconda/envs/devbio-napari-env/lib/python3.9/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
3807 if isinstance(casted_key, slice) or (
3808 isinstance(casted_key, abc.Iterable)
3809 and any(isinstance(x, slice) for x in casted_key)
3810 ):
3811 raise InvalidIndexError(key)
-> 3812 raise KeyError(key) from err
3813 except TypeError:
3814 # If we have a listlike key, _check_indexing_error will raise
3815 # InvalidIndexError. Otherwise we fall through and re-raise
3816 # the TypeError.
3817 self._check_indexing_error(key)
KeyError: 0
Ooops…this raised a KeyError
. For DataFrames, the []
operator expects labels only, and works on the columns, not the rows. We can use it to get a subset of the DataFrame with the specified column labels:
df['area']
0 422
1 182
2 661
3 437
4 476
...
56 211
57 78
58 86
59 51
60 46
Name: area, Length: 61, dtype: int64
Notice that the resulting data structure has the column data and an index, too. That’s because Pandas DataFrames use Pandas Series for the columns, which are one-dimensional arrays with an index for the rows and a label for the column data. It can be seen as a single-column table.
We can get more columns by passing their names as a list. Furthermore, we can store this “sub-dataframe” in a new variable.
df_subset = df[['area', 'mean_intensity']]
df_subset
area | mean_intensity | |
---|---|---|
0 | 422 | 192.379147 |
1 | 182 | 180.131868 |
2 | 661 | 205.216339 |
3 | 437 | 216.585812 |
4 | 476 | 212.302521 |
... | ... | ... |
56 | 211 | 185.061611 |
57 | 78 | 185.230769 |
58 | 86 | 183.720930 |
59 | 51 | 190.431373 |
60 | 46 | 175.304348 |
61 rows × 2 columns
Label-location based selection#
If we want to select specific rows, we need to use the method loc[]
. Note, that loc[]
also works with labels only. Let’s get the row with label 0 (its index label).
df.loc[0]
area 422.000000
mean_intensity 192.379147
minor_axis_length 16.488550
major_axis_length 34.566789
eccentricity 0.878900
extent 0.586111
feret_diameter_max 35.227830
equivalent_diameter_area 23.179885
bbox-0 0.000000
bbox-1 11.000000
bbox-2 30.000000
bbox-3 35.000000
Name: 0, dtype: float64
We can also select more than one row by providing the row labels as a list. In addition, we can select specific columns, too:
df.loc[[0,2], ['area', 'mean_intensity']]
area | mean_intensity | |
---|---|---|
0 | 422 | 192.379147 |
2 | 661 | 205.216339 |
We can also use slicing via [from:to]
to get data starting from
a specific label up to
a specific label.
df.loc[0:2, 'area':'extent']
area | mean_intensity | minor_axis_length | major_axis_length | eccentricity | extent | |
---|---|---|---|---|---|---|
0 | 422 | 192.379147 | 16.488550 | 34.566789 | 0.878900 | 0.586111 |
1 | 182 | 180.131868 | 11.736074 | 20.802697 | 0.825665 | 0.787879 |
2 | 661 | 205.216339 | 28.409502 | 30.208433 | 0.339934 | 0.874339 |
Attention: Slicing on label-location requires that the order of the labels in the row and column index is known and taken into account:
df.loc[0:2, 'extent':'area']
0 |
---|
1 |
2 |
Integer-location based selection#
If we want to select data based on the indices’ integer positions, we need to use iloc[]
. Let’s get the value from the first row and the first column:
df.iloc[0, 0]
422
We can use slicing to get specific parts of the data. Let’s get the first 10 rows and the first 5 columns:
df.iloc[0:10, 0:5]
area | mean_intensity | minor_axis_length | major_axis_length | eccentricity | |
---|---|---|---|---|---|
0 | 422 | 192.379147 | 16.488550 | 34.566789 | 0.878900 |
1 | 182 | 180.131868 | 11.736074 | 20.802697 | 0.825665 |
2 | 661 | 205.216339 | 28.409502 | 30.208433 | 0.339934 |
3 | 437 | 216.585812 | 23.143996 | 24.606130 | 0.339576 |
4 | 476 | 212.302521 | 19.852882 | 31.075106 | 0.769317 |
5 | 277 | 206.469314 | 17.523565 | 20.151370 | 0.493763 |
6 | 259 | 178.007722 | 15.309080 | 21.761302 | 0.710695 |
7 | 219 | 191.598174 | 15.745458 | 17.729017 | 0.459616 |
8 | 67 | 167.522388 | 8.731268 | 9.761752 | 0.447195 |
9 | 19 | 155.368421 | 1.967201 | 11.358057 | 0.984887 |
Let’s get everything starting from the 10th row and starting from the 5th column:
df.iloc[10:, 5:]
extent | feret_diameter_max | equivalent_diameter_area | bbox-0 | bbox-1 | bbox-2 | bbox-3 | |
---|---|---|---|---|---|---|---|
10 | 0.747692 | 26.925824 | 24.875579 | 21 | 162 | 47 | 187 |
11 | 0.725806 | 33.837849 | 28.322092 | 26 | 59 | 57 | 87 |
12 | 0.767361 | 18.973666 | 16.774562 | 39 | 225 | 57 | 241 |
13 | 0.787879 | 11.704700 | 9.965575 | 40 | 4 | 51 | 13 |
14 | 0.784965 | 26.925824 | 23.909926 | 42 | 128 | 68 | 150 |
15 | 0.767857 | 28.861739 | 25.631847 | 44 | 17 | 72 | 41 |
16 | 0.599078 | 34.014703 | 22.283703 | 44 | 184 | 75 | 205 |
17 | 0.775926 | 28.160256 | 23.097346 | 60 | 89 | 87 | 109 |
18 | 0.747899 | 22.472205 | 18.437868 | 60 | 206 | 81 | 223 |
19 | 0.800454 | 24.186773 | 21.200320 | 63 | 235 | 84 | 256 |
20 | 0.774359 | 16.155494 | 13.865755 | 66 | 162 | 81 | 175 |
21 | 0.790514 | 24.698178 | 22.567583 | 73 | 53 | 96 | 75 |
22 | 0.676190 | 31.953091 | 23.289484 | 75 | 117 | 96 | 147 |
23 | 0.809211 | 19.924859 | 17.697936 | 77 | 26 | 96 | 42 |
24 | 0.827303 | 32.756679 | 25.306906 | 82 | 0 | 114 | 19 |
25 | 0.812865 | 21.023796 | 18.813840 | 86 | 214 | 105 | 232 |
26 | 0.770362 | 35.468296 | 29.446156 | 88 | 157 | 114 | 191 |
27 | 0.800000 | 20.880613 | 14.969641 | 95 | 245 | 115 | 256 |
28 | 0.819222 | 24.351591 | 21.349936 | 102 | 92 | 121 | 115 |
29 | 0.755556 | 30.594117 | 26.318099 | 108 | 25 | 138 | 49 |
30 | 0.552778 | 44.721360 | 27.570347 | 110 | 113 | 150 | 140 |
31 | 0.754167 | 17.088007 | 15.180789 | 115 | 56 | 131 | 71 |
32 | 0.619704 | 38.587563 | 28.299606 | 115 | 151 | 150 | 180 |
33 | 0.790451 | 29.832868 | 27.547246 | 115 | 210 | 141 | 239 |
34 | 0.833333 | 3.162278 | 2.523133 | 125 | 252 | 127 | 255 |
35 | 0.821875 | 21.189620 | 18.299235 | 130 | 0 | 150 | 16 |
36 | 0.665926 | 52.201533 | 33.832563 | 137 | 75 | 187 | 102 |
37 | 0.689855 | 34.985711 | 24.618327 | 137 | 233 | 167 | 256 |
38 | 0.766447 | 19.924859 | 17.223960 | 138 | 59 | 157 | 75 |
39 | 0.788462 | 17.464249 | 14.450304 | 149 | 182 | 165 | 195 |
40 | 0.778656 | 26.627054 | 22.397687 | 153 | 125 | 175 | 148 |
41 | 0.778409 | 26.925824 | 22.875783 | 157 | 5 | 181 | 27 |
42 | 0.815972 | 19.313208 | 17.297725 | 158 | 206 | 176 | 222 |
43 | 0.789474 | 25.495098 | 21.850969 | 160 | 35 | 185 | 54 |
44 | 0.807407 | 32.310989 | 28.856519 | 166 | 167 | 196 | 194 |
45 | 0.776860 | 24.596748 | 21.880084 | 174 | 222 | 196 | 244 |
46 | 0.795330 | 29.154759 | 27.151532 | 185 | 116 | 211 | 144 |
47 | 0.800000 | 16.031220 | 9.027033 | 193 | 251 | 209 | 256 |
48 | 0.774038 | 17.088007 | 14.317527 | 195 | 200 | 211 | 213 |
49 | 0.692424 | 34.205263 | 24.121991 | 199 | 17 | 232 | 37 |
50 | 0.718391 | 32.756679 | 28.209479 | 199 | 89 | 229 | 118 |
51 | 0.775362 | 30.594117 | 26.099486 | 203 | 51 | 233 | 74 |
52 | 0.745455 | 25.179357 | 16.155931 | 214 | 0 | 239 | 11 |
53 | 0.771978 | 28.861739 | 26.749965 | 214 | 220 | 240 | 248 |
54 | 0.782407 | 40.792156 | 32.800723 | 217 | 161 | 244 | 201 |
55 | 0.748663 | 22.360680 | 18.881395 | 223 | 130 | 245 | 147 |
56 | 0.781481 | 18.973666 | 16.390654 | 232 | 39 | 250 | 54 |
57 | 0.722222 | 18.027756 | 9.965575 | 248 | 170 | 254 | 188 |
58 | 0.781818 | 22.000000 | 10.464158 | 249 | 117 | 254 | 139 |
59 | 0.728571 | 14.035669 | 8.058239 | 249 | 228 | 254 | 242 |
60 | 0.766667 | 15.033296 | 7.653040 | 250 | 67 | 254 | 82 |
To clarify the difference between label-location and integer-location again, let’s try both approaches again on a sorted dataset:
df_sorted = df.sort_values(by='area', ascending=False)
df_sorted
area | mean_intensity | minor_axis_length | major_axis_length | eccentricity | extent | feret_diameter_max | equivalent_diameter_area | bbox-0 | bbox-1 | bbox-2 | bbox-3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
36 | 899 | 198.291435 | 21.753901 | 54.500296 | 0.916885 | 0.665926 | 52.201533 | 33.832563 | 137 | 75 | 187 | 102 |
54 | 845 | 198.295858 | 26.478859 | 40.977288 | 0.763182 | 0.782407 | 40.792156 | 32.800723 | 217 | 161 | 244 | 201 |
26 | 681 | 198.308370 | 24.756299 | 36.310074 | 0.731536 | 0.770362 | 35.468296 | 29.446156 | 88 | 157 | 114 | 191 |
2 | 661 | 205.216339 | 28.409502 | 30.208433 | 0.339934 | 0.874339 | 32.756679 | 29.010538 | 0 | 95 | 28 | 122 |
44 | 654 | 199.706422 | 27.176616 | 30.914646 | 0.476664 | 0.807407 | 32.310989 | 28.856519 | 166 | 167 | 196 | 194 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
47 | 64 | 190.250000 | 5.354655 | 15.925594 | 0.941780 | 0.800000 | 16.031220 | 9.027033 | 193 | 251 | 209 | 256 |
59 | 51 | 190.431373 | 5.032414 | 13.742079 | 0.930534 | 0.728571 | 14.035669 | 8.058239 | 249 | 228 | 254 | 242 |
60 | 46 | 175.304348 | 3.803982 | 15.948714 | 0.971139 | 0.766667 | 15.033296 | 7.653040 | 250 | 67 | 254 | 82 |
9 | 19 | 155.368421 | 1.967201 | 11.358057 | 0.984887 | 0.863636 | 11.000000 | 4.918491 | 21 | 0 | 32 | 2 |
34 | 5 | 161.600000 | 1.788854 | 3.098387 | 0.816497 | 0.833333 | 3.162278 | 2.523133 | 125 | 252 | 127 | 255 |
61 rows × 12 columns
As we can see above, the index labels are not in order anymore. Let’s apply loc[]
and iloc[]
again, to get the “first three rows”:
# label-location
df_sorted.loc[[0,1,2]]
area | mean_intensity | minor_axis_length | major_axis_length | eccentricity | extent | feret_diameter_max | equivalent_diameter_area | bbox-0 | bbox-1 | bbox-2 | bbox-3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 422 | 192.379147 | 16.488550 | 34.566789 | 0.878900 | 0.586111 | 35.227830 | 23.179885 | 0 | 11 | 30 | 35 |
1 | 182 | 180.131868 | 11.736074 | 20.802697 | 0.825665 | 0.787879 | 21.377558 | 15.222667 | 0 | 53 | 11 | 74 |
2 | 661 | 205.216339 | 28.409502 | 30.208433 | 0.339934 | 0.874339 | 32.756679 | 29.010538 | 0 | 95 | 28 | 122 |
# integer-location
df_sorted.iloc[0:3]
area | mean_intensity | minor_axis_length | major_axis_length | eccentricity | extent | feret_diameter_max | equivalent_diameter_area | bbox-0 | bbox-1 | bbox-2 | bbox-3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
36 | 899 | 198.291435 | 21.753901 | 54.500296 | 0.916885 | 0.665926 | 52.201533 | 33.832563 | 137 | 75 | 187 | 102 |
54 | 845 | 198.295858 | 26.478859 | 40.977288 | 0.763182 | 0.782407 | 40.792156 | 32.800723 | 217 | 161 | 244 | 201 |
26 | 681 | 198.308370 | 24.756299 | 36.310074 | 0.731536 | 0.770362 | 35.468296 | 29.446156 | 88 | 157 | 114 | 191 |
Boolean indexing#
With boolean indexing (or masking), we can select specific parts of the data based on conditional / logical queries. Suppose we want to get entries for which “area” > 500, we can use this condition in the []
operator.
mask = df["area"] > 500
mask
0 False
1 False
2 True
3 False
4 False
...
56 False
57 False
58 False
59 False
60 False
Name: area, Length: 61, dtype: bool
# apply the mask to the DataFrame: df[mask]
# Or as one-liner:
df[df["area"] > 500]
area | mean_intensity | minor_axis_length | major_axis_length | eccentricity | extent | feret_diameter_max | equivalent_diameter_area | bbox-0 | bbox-1 | bbox-2 | bbox-3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 661 | 205.216339 | 28.409502 | 30.208433 | 0.339934 | 0.874339 | 32.756679 | 29.010538 | 0 | 95 | 28 | 122 |
11 | 630 | 173.600000 | 24.460212 | 33.020881 | 0.671781 | 0.725806 | 33.837849 | 28.322092 | 26 | 59 | 57 | 87 |
15 | 516 | 194.403101 | 23.585468 | 27.907939 | 0.534582 | 0.767857 | 28.861739 | 25.631847 | 44 | 17 | 72 | 41 |
24 | 503 | 198.648111 | 19.812612 | 33.323794 | 0.804060 | 0.827303 | 32.756679 | 25.306906 | 82 | 0 | 114 | 19 |
26 | 681 | 198.308370 | 24.756299 | 36.310074 | 0.731536 | 0.770362 | 35.468296 | 29.446156 | 88 | 157 | 114 | 191 |
29 | 544 | 198.455882 | 23.888675 | 29.100741 | 0.571078 | 0.755556 | 30.594117 | 26.318099 | 108 | 25 | 138 | 49 |
30 | 597 | 190.954774 | 17.078429 | 47.451008 | 0.932984 | 0.552778 | 44.721360 | 27.570347 | 110 | 113 | 150 | 140 |
32 | 629 | 193.666137 | 20.719000 | 40.268495 | 0.857478 | 0.619704 | 38.587563 | 28.299606 | 115 | 151 | 150 | 180 |
33 | 596 | 210.067114 | 26.320789 | 28.931798 | 0.415150 | 0.790451 | 29.832868 | 27.547246 | 115 | 210 | 141 | 239 |
36 | 899 | 198.291435 | 21.753901 | 54.500296 | 0.916885 | 0.665926 | 52.201533 | 33.832563 | 137 | 75 | 187 | 102 |
44 | 654 | 199.706422 | 27.176616 | 30.914646 | 0.476664 | 0.807407 | 32.310989 | 28.856519 | 166 | 167 | 196 | 194 |
46 | 579 | 200.649396 | 26.484866 | 27.884001 | 0.312788 | 0.795330 | 29.154759 | 27.151532 | 185 | 116 | 211 | 144 |
50 | 625 | 217.894400 | 24.624993 | 32.379307 | 0.649319 | 0.718391 | 32.756679 | 28.209479 | 199 | 89 | 229 | 118 |
51 | 535 | 189.936449 | 22.795105 | 29.898229 | 0.647079 | 0.775362 | 30.594117 | 26.099486 | 203 | 51 | 233 | 74 |
53 | 562 | 215.928826 | 26.012538 | 27.607523 | 0.334977 | 0.771978 | 28.861739 | 26.749965 | 214 | 220 | 240 | 248 |
54 | 845 | 198.295858 | 26.478859 | 40.977288 | 0.763182 | 0.782407 | 40.792156 | 32.800723 | 217 | 161 | 244 | 201 |
We can also use this syntax with loc[]
to return specific columns only, e.g., “area” and “mean_intensity”.
df.loc[(df["area"] > 500), ['area', 'mean_intensity']]
area | mean_intensity | |
---|---|---|
2 | 661 | 205.216339 |
11 | 630 | 173.600000 |
15 | 516 | 194.403101 |
24 | 503 | 198.648111 |
26 | 681 | 198.308370 |
29 | 544 | 198.455882 |
30 | 597 | 190.954774 |
32 | 629 | 193.666137 |
33 | 596 | 210.067114 |
36 | 899 | 198.291435 |
44 | 654 | 199.706422 |
46 | 579 | 200.649396 |
50 | 625 | 217.894400 |
51 | 535 | 189.936449 |
53 | 562 | 215.928826 |
54 | 845 | 198.295858 |
Get scalar values#
If we only want to access a single scalar value, the fastest way is to use the at
and iat
methods.
df.at[2, "area"]
661
df.iat[0, 0]
422
Math with DataFrames#
Now that we know how to select specific parts of a DataFrame, we want to do some math with it. Mathematical operations can be executed directly on a DataFrame or selected parts of it. We can also easily add new columns to the DataFrame to store computed results.
Let’s assume we want to compute the ratio of the two axes by dividing the major axis by the minor axis and store this result in a new colum named axis_ratio. We can just define that new column and use the basic mathematical operator on the existing columns to fill it:
df['axis_ratio'] = df['major_axis_length'] / df['minor_axis_length']
df['axis_ratio']
0 2.096412
1 1.772543
2 1.063321
3 1.063176
4 1.565269
...
56 1.273114
57 2.916048
58 3.917806
59 2.730713
60 4.192637
Name: axis_ratio, Length: 61, dtype: float64
A DataFrame and the underlying Series objects holding the column data also provide lots of built-in methods for mathematical operations and basic statistics via the .
operator.
See also:
# Compute the mean for all columns in the DataFrame
df.mean()
area 358.426230
mean_intensity 191.966564
minor_axis_length 17.127032
major_axis_length 24.796851
eccentricity 0.657902
extent 0.764751
feret_diameter_max 25.323368
equivalent_diameter_area 20.201434
bbox-0 112.786885
bbox-1 122.245902
bbox-2 134.901639
bbox-3 142.131148
axis_ratio 1.650854
dtype: float64
# Get the max value from a column
df['area'].max()
899
Exercise#
From the loaded CSV file, create a table that only contains these columns:
minor_axis_length
major_axis_length
mean_intensity
Also, select all rows where value mean_intensity
is smaller than the mean of the column mean_intensity
.
df_loaded = pd.read_csv('../data/blobs_statistics.csv', index_col=0)