Introduction to Seaborn#

The definition on seaborn’s website is so concise that we reproduce it here:

“Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.”

That’s it! Its main benefit is that it is a higher-level library, which means we can produce sophisticated plots with far fewer lines of code. Most axes styling is handled automatically, and seaborn can add summary statistics to plots on its own.

import pandas as pd
import seaborn as sns

Load data#

Let’s first load a dataframe. The table contains continuous measurements from two images, which are identified by the categorical column file_name (the last column).

df = pd.read_csv("data/BBBC007_analysis.csv")
df.head()
   area  intensity_mean  major_axis_length  minor_axis_length  aspect_ratio           file_name
0   139       96.546763          17.504104          10.292770      1.700621  20P1_POS0010_D_1UL
1   360       86.613889          35.746808          14.983124      2.385805  20P1_POS0010_D_1UL
2    43       91.488372          12.967884           4.351573      2.980045  20P1_POS0010_D_1UL
3   140       73.742857          18.940508          10.314404      1.836316  20P1_POS0010_D_1UL
4   144       89.375000          13.639308          13.458532      1.013432  20P1_POS0010_D_1UL
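
Before plotting, it can be useful to check how many objects each image contributes. The sketch below does this on a small stand-in table, since the CSV file is not reproduced here: rows 0 to 3 are rounded from the df.head() output above, while the two 20P1_POS0007_D_1UL rows are invented for illustration. With the real data, the same calls can be applied to df directly.

```python
import pandas as pd

# Stand-in for the real table: rows 0-3 are rounded from df.head() above,
# the two 20P1_POS0007_D_1UL rows are made up for illustration only.
toy = pd.DataFrame({
    "area": [139, 360, 43, 140, 150, 80],
    "intensity_mean": [96.5, 86.6, 91.5, 73.7, 101.2, 88.0],
    "major_axis_length": [17.5, 35.7, 13.0, 18.9, 16.0, 12.5],
    "minor_axis_length": [10.3, 15.0, 4.4, 10.3, 12.0, 8.2],
    "aspect_ratio": [1.70, 2.39, 2.98, 1.84, 1.33, 1.52],
    "file_name": ["20P1_POS0010_D_1UL"] * 4 + ["20P1_POS0007_D_1UL"] * 2,
})

# Number of measured objects per image
counts = toy["file_name"].value_counts()
print(counts)

# Mean of every numeric column, computed separately for each image
means = toy.groupby("file_name").mean(numeric_only=True)
print(means)
```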

The two images originate from the Broad Bioimage Benchmark Collection and are fluorescence microscopy images of Drosophila Kc167 cells.

[Images: 20P1_POS0007_D_1UL (image007) and 20P1_POS0010_D_1UL (image010)]

Relational plots with seaborn#

We will apply seaborn’s default theme. Other themes can be selected through the arguments of sns.set_theme(), for example style.

sns.set_theme()
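
As a sketch of how another theme could be selected: sns.set_theme() accepts, among others, a style and a palette argument. The values "whitegrid" and "colorblind" below are just one possible choice, and the final call restores the defaults so that the following plots look as described in this section.

```python
import seaborn as sns

# Switch to a white background with grid lines and a colorblind-friendly palette
sns.set_theme(style="whitegrid", palette="colorblind")

# axes_style() without arguments returns the style parameters currently in use
whitegrid_style = sns.axes_style()
print(whitegrid_style["axes.facecolor"], whitegrid_style["axes.grid"])

# Restore the seaborn defaults used in the rest of this section
sns.set_theme()
```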

Let’s start with a scatter plot of aspect_ratio vs. intensity_mean.

Note: scatter is the default kind of relational plot in sns.relplot(), so we don’t need to pass kind="scatter".

sns.relplot(data=df,
            x="aspect_ratio",
            y="intensity_mean");

We can already encode another feature by passing an extra argument: the size of the dots should scale with the variable area.

Note: Seaborn automatically adds a legend.

sns.relplot(data=df,
            x="aspect_ratio",
            y="intensity_mean",
            size="area");
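
The range of dot sizes can also be controlled. As a minimal sketch, the sizes parameter of relplot takes a (min, max) tuple to which the smallest and largest area values are mapped. The values (20, 200) and the small stand-in table are assumptions for illustration; only its first four rows are rounded from the df.head() output above.

```python
import pandas as pd
import seaborn as sns

# Stand-in table: the last two rows are made up for illustration only
toy = pd.DataFrame({
    "area": [139, 360, 43, 140, 150, 80],
    "intensity_mean": [96.5, 86.6, 91.5, 73.7, 101.2, 88.0],
    "aspect_ratio": [1.70, 2.39, 2.98, 1.84, 1.33, 1.52],
})

# Map the smallest area to a marker size of 20 and the largest to 200
g = sns.relplot(data=toy,
                x="aspect_ratio",
                y="intensity_mean",
                size="area",
                sizes=(20, 200))

# The marker sizes that were actually drawn on the (single) subplot
drawn_sizes = g.ax.collections[0].get_sizes()
print(drawn_sizes.min(), drawn_sizes.max())
```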

We can encode one more feature by passing another argument. Now the variable major_axis_length is represented by a continuous color gradient.

sns.relplot(data=df,
            x="aspect_ratio",
            y="intensity_mean",
            size="area",
            hue="major_axis_length");

We can even visualize an additional feature by encoding it in the marker type (a dot or a triangle), depending on the file_name variable. This lets us represent five variables in a single 2D diagram! However, although this many visual encodings are possible, the result may not be very clear or aesthetically pleasing.

Note: the markers list must contain at least as many entries as there are categories in file_name.

sns.relplot(data=df,
            x="aspect_ratio",
            y="intensity_mean",
            size="area",
            hue="major_axis_length",          
            style="file_name",      
            markers=["o", "^"]);  
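
Instead of relying on the order of a list, markers can also be given as a dictionary that maps each category to a marker, which makes the assignment explicit. A minimal sketch on a small stand-in table (the 20P1_POS0007_D_1UL rows are made up; only the first four rows are rounded from the df.head() output above):

```python
import pandas as pd
import seaborn as sns

# Stand-in table: the 20P1_POS0007_D_1UL rows are invented for illustration
toy = pd.DataFrame({
    "intensity_mean": [96.5, 86.6, 91.5, 73.7, 101.2, 88.0],
    "aspect_ratio": [1.70, 2.39, 2.98, 1.84, 1.33, 1.52],
    "file_name": ["20P1_POS0010_D_1UL"] * 4 + ["20P1_POS0007_D_1UL"] * 2,
})

# Explicitly assign one marker to each category of file_name
marker_map = {"20P1_POS0007_D_1UL": "o", "20P1_POS0010_D_1UL": "^"}

g = sns.relplot(data=toy,
                x="aspect_ratio",
                y="intensity_mean",
                style="file_name",
                markers=marker_map)

# Entries shown in the automatically generated legend
legend_labels = [text.get_text() for text in g.legend.get_texts()]
print(legend_labels)
```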

Define subplots#

Since our plot is now overcrowded, we should find another way to show which image each data point comes from. Instead of encoding the variable file_name as the marker type, we will pass it to the col parameter of relplot.

Seaborn then automatically divides the plot into two subplots, each representing one of the file names, and adds the corresponding title to each plot. This approach simplifies the visualization and makes it easier to interpret.

sns.relplot(data=df,
            x="aspect_ratio",
            y="intensity_mean",
            size="area",
            hue="major_axis_length",
            col="file_name");          
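
The object returned by relplot gives access to the individual subplots. The sketch below, on a stand-in table with invented 20P1_POS0007_D_1UL rows, inspects the grid layout and the automatically generated titles; col_order is an optional parameter used here only to fix the left-to-right order of the subplots.

```python
import pandas as pd
import seaborn as sns

# Stand-in table: the 20P1_POS0007_D_1UL rows are invented for illustration
toy = pd.DataFrame({
    "intensity_mean": [96.5, 86.6, 91.5, 73.7, 101.2, 88.0],
    "aspect_ratio": [1.70, 2.39, 2.98, 1.84, 1.33, 1.52],
    "file_name": ["20P1_POS0010_D_1UL"] * 4 + ["20P1_POS0007_D_1UL"] * 2,
})

g = sns.relplot(data=toy,
                x="aspect_ratio",
                y="intensity_mean",
                col="file_name",
                col_order=["20P1_POS0007_D_1UL", "20P1_POS0010_D_1UL"])

# g.axes is a 2D array of matplotlib axes: one row, one column per file name
print(g.axes.shape)

# Each subplot gets a title built from the column variable and its value
titles = [ax.get_title() for ax in g.axes.flat]
print(titles)
```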

Control the appearance of your plot#

That’s already quite neat, but a few things are missing. Let’s say we would like to reduce the height and make the plots wider, and the dots should have more contrast. For this, we will pass further arguments to the relplot function.

sns.relplot(data=df,
            x="aspect_ratio",
            y="intensity_mean",
            size="area",
            hue="major_axis_length",
            col="file_name",
            height=3.5,         # Height of each subplot in inches
            aspect=1.3,         # Width/height ratio of each subplot
            palette="magma");   # Use the "magma" color palette instead of the default
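
Since height and aspect apply to each subplot, the overall figure size follows from them: each subplot is height * aspect inches wide, so two columns give 2 * 3.5 * 1.3 = 9.1 inches in width, before any extra room seaborn adds for a legend. A minimal check on a stand-in table, with no hue or size so that no legend is drawn (the 20P1_POS0007_D_1UL rows are invented, and g.figure assumes seaborn 0.11.2 or later):

```python
import pandas as pd
import seaborn as sns

# Stand-in table: the 20P1_POS0007_D_1UL rows are invented for illustration
toy = pd.DataFrame({
    "intensity_mean": [96.5, 86.6, 91.5, 73.7, 101.2, 88.0],
    "aspect_ratio": [1.70, 2.39, 2.98, 1.84, 1.33, 1.52],
    "file_name": ["20P1_POS0010_D_1UL"] * 4 + ["20P1_POS0007_D_1UL"] * 2,
})

g = sns.relplot(data=toy,
                x="aspect_ratio",
                y="intensity_mean",
                col="file_name",
                height=3.5,
                aspect=1.3)

# Figure size in inches: (number of columns * height * aspect, height)
fig_width, fig_height = g.figure.get_size_inches()
print(fig_width, fig_height)
```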

Further, we want to change the labels of the x and y axes and give a title to the whole figure. sns.relplot() returns a FacetGrid object, which we store in the variable g. This object lets us control the appearance and layout of the plot.

g = sns.relplot(data=df,                             # Store the returned FacetGrid as 'g'
                x="aspect_ratio",
                y="intensity_mean",
                size="area",
                hue="major_axis_length",
                col="file_name",
                height=3.5,
                aspect=1.3,
                palette="magma");


g.set_xlabels("Aspect Ratio")                        # Use 'g' to set the labels for the x axis
g.set_ylabels("Mean Intensity")                      # Use 'g' to set the labels for the y axis

g.fig.suptitle("Aspect Ratio vs. Mean Intensity",    # Add a title
               fontsize=16, fontweight="semibold",   # Set the font for the title
               x=0.45, y=1.06);                      # Give its position in relation to the figure coordinates
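
The FacetGrid also offers set_axis_labels() to set both axis labels in one call and set_titles() to change the subplot titles. A sketch on a stand-in table with invented 20P1_POS0007_D_1UL rows; the shortened title template is just one possible choice.

```python
import pandas as pd
import seaborn as sns

# Stand-in table: the 20P1_POS0007_D_1UL rows are invented for illustration
toy = pd.DataFrame({
    "intensity_mean": [96.5, 86.6, 91.5, 73.7, 101.2, 88.0],
    "aspect_ratio": [1.70, 2.39, 2.98, 1.84, 1.33, 1.52],
    "file_name": ["20P1_POS0010_D_1UL"] * 4 + ["20P1_POS0007_D_1UL"] * 2,
})

g = sns.relplot(data=toy,
                x="aspect_ratio",
                y="intensity_mean",
                col="file_name",
                col_order=["20P1_POS0007_D_1UL", "20P1_POS0010_D_1UL"])

# Set the x and y axis labels in a single call
g.set_axis_labels("Aspect Ratio", "Mean Intensity")

# Show only the file name in each subplot title instead of "file_name = ..."
g.set_titles(col_template="{col_name}")

first_ax = g.axes[0, 0]
short_titles = [ax.get_title() for ax in g.axes.flat]
print(first_ax.get_xlabel(), first_ax.get_ylabel(), short_titles)
```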

Plotting a linear regression model#

With sns.lmplot(), you can draw a scatter plot together with a fitted linear regression model. For instance, let’s look at the relationship between major_axis_length and minor_axis_length in each image.

Note: Seaborn automatically adds a 95% confidence interval for the regression estimate. The confidence level can be adjusted with the ci parameter.

sns.lmplot(data=df,
           x="major_axis_length",
           y="minor_axis_length",
           col="file_name");
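
As a sketch of adjusting the confidence level: the ci parameter of lmplot takes a percentage, or None to omit the band entirely. The data below are randomly generated stand-ins (the column names match the table, but the values and their assignment to the two file names are assumptions), not the BBBC007 measurements.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Randomly generated stand-in data, NOT the real measurements
rng = np.random.default_rng(0)
n = 30  # points per image
major = rng.uniform(10, 40, size=2 * n)
minor = 0.4 * major + rng.normal(0, 1.5, size=2 * n)
toy = pd.DataFrame({
    "major_axis_length": major,
    "minor_axis_length": minor,
    "file_name": ["20P1_POS0007_D_1UL"] * n + ["20P1_POS0010_D_1UL"] * n,
})

# Widen the band to a 99% confidence interval
g_ci99 = sns.lmplot(data=toy,
                    x="major_axis_length",
                    y="minor_axis_length",
                    col="file_name",
                    ci=99)

# Draw only the points and the regression line, without a confidence band
g_no_ci = sns.lmplot(data=toy,
                     x="major_axis_length",
                     y="minor_axis_length",
                     col="file_name",
                     ci=None)

# The shaded band is an extra collection on each subplot
n_with_band = len(g_ci99.axes.flat[0].collections)
n_without_band = len(g_no_ci.axes.flat[0].collections)
print(n_with_band, n_without_band)
```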

Exercise#

Plot the same linear regression model as above, but in a single plot, with the points and regression lines colored differently according to file_name.

# Your code here