This lesson introduces data visualization in Python using Matplotlib and Seaborn, showing how to build scatterplots, adjust aesthetics, and customize labels to create clear figures.
Learning Objectives
In this lesson, we will:
Explain the concept of layering in plotting and how to build a plot step by step.
Create a scatterplot using Matplotlib and customize its aesthetics with Seaborn.
Apply different themes to a plot and adjust axis labels and titles.
Overview of lesson
Plots are one of the best ways to communicate and summarize results to others. With Matplotlib and Seaborn, you can create customizable visualizations from your data. Data scientists and researchers use these tools every day to explore trends and create publication-ready figures for presentations and manuscripts. In this lesson, you will learn the basics of building a scatterplot and adjusting its aesthetics, which gives you the foundation for creating other plots in the future.
Plotting Basics
Matplotlib is one of the most widely used plotting packages in Python. With it, we can create many different types of plots, including scatterplots, line plots, bar plots, boxplots, and more. The important thing to remember is that you can build up your plot incrementally, adding layers of information to make it more informative and visually appealing. There is no need to create a perfect plot in one step!
We will start with drawing a simple x-y scatterplot of mean_expression versus age_in_days from new_metadata.
Initialize a Plot with Matplotlib
First, we will import the Matplotlib library and load new_metadata, which we created in the previous lesson:
import matplotlib.pyplot as plt
import pandas as pd

# Load the new metadata data frame that we created in the previous lesson
new_metadata = pd.read_csv("data/new_metadata.csv", index_col=0)
new_metadata
Table 1: DataFrame containing updated metadata for each of our 12 samples.

| | genotype | celltype | replicate | mean_expression | age_in_days |
|---|---|---|---|---|---|
| sample1 | Wt | typeA | 1 | 10.266102 | 40 |
| sample2 | Wt | typeA | 2 | 10.849759 | 32 |
| sample3 | Wt | typeA | 3 | 9.452517 | 38 |
| sample4 | KO | typeA | 1 | 15.833872 | 35 |
| sample5 | KO | typeA | 2 | 15.590184 | 41 |
| sample6 | KO | typeA | 3 | 15.551529 | 32 |
| sample7 | Wt | typeB | 1 | 15.522219 | 34 |
| sample8 | Wt | typeB | 2 | 13.808281 | 26 |
| sample9 | Wt | typeB | 3 | 14.108399 | 28 |
| sample10 | KO | typeB | 1 | 10.743292 | 28 |
| sample11 | KO | typeB | 2 | 10.778318 | 30 |
| sample12 | KO | typeB | 3 | 9.754733 | 32 |
We will first initialize a plot using the figure() function from Matplotlib. Let us look at some of the arguments we can use with the help() function:
# Look at the help for the figure function
help(plt.figure)
So first, let’s create an empty plot of size 8 inches by 6 inches:
# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))
Figure 1: An empty plot initialized with Matplotlib.
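To make the initialization step more concrete, the following is a minimal sketch that keeps a handle to the object returned by plt.figure() and inspects it. The variable names (fig, width, height) and the choice of the non-interactive "Agg" backend are assumptions made so the snippet runs outside a notebook; they are not part of the original lesson.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# plt.figure() returns a Figure object, so we can keep a handle to it
fig = plt.figure(figsize=(8, 6))

# figsize is given as (width, height) in inches
width, height = fig.get_size_inches()
print(width, height)

# A freshly initialized figure has no axes (plotting panels) yet;
# they are created when the first layer is added
print(len(fig.axes))
```

Keeping the returned Figure is optional in this lesson, since the plt functions always act on the current figure, but it can be handy when you want to check or change the figure later.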
Adding a Scatterplot Layer
We will once again initialize the plot with figure(), and then add a scatterplot layer with the scatter() function. We have to specify the x and y values for our scatterplot, which will be age_in_days and mean_expression, respectively.
The plt.figure and plt.scatter calls are connected because Matplotlib commands build upon each other. The plt.figure command initializes the plot and sets its size, while the plt.scatter command adds the scatterplot layer to the existing plot. By calling these functions sequentially, we can build up our plot layer by layer.
# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot
plt.scatter(x=new_metadata["age_in_days"], y=new_metadata["mean_expression"])
Figure 2: Scatterplot of age in days vs. mean expression.
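One way to see that the two calls really act on the same plot is to inspect the current axes after plotting. The sketch below is illustrative only: it uses a hand-typed subset of Table 1 (the first four samples) in a data frame called demo, a hypothetical name, and the "Agg" backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First four samples from Table 1, typed in by hand so the sketch is self-contained
demo = pd.DataFrame(
    {
        "age_in_days": [40, 32, 38, 35],
        "mean_expression": [10.266102, 10.849759, 9.452517, 15.833872],
    },
    index=["sample1", "sample2", "sample3", "sample4"],
)

# Step 1: initialize an empty figure
fig = plt.figure(figsize=(8, 6))

# Step 2: add the scatterplot layer; it is drawn on the figure created above
points = plt.scatter(x=demo["age_in_days"], y=demo["mean_expression"])

# The current axes belong to the figure we initialized in step 1
ax = plt.gca()
print(ax.figure is fig)

# The axes now hold one collection of points, with one point per sample
print(len(ax.collections))
print(len(points.get_offsets()))
```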
Now that we have the required fundamentals, let’s add some extras like color to the plot. We can color the points on the plot based on the genotype column with the c argument.
# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
plt.scatter(data=new_metadata, x="age_in_days", y="mean_expression", c="genotype")
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not sample1 Wt
sample2 Wt
sample3 Wt
sample4 KO
sample5 KO
sample6 KO
sample7 Wt
sample8 Wt
sample9 Wt
sample10 KO
sample11 KO
sample12 KO
Name: genotype, dtype: str
Figure 3: Initial attempt to color the scatterplot of age in days vs. mean expression by genotype, which results in an error.
We are getting an error from trying to set the color. As the message says, the c argument in scatter() expects a color, a sequence of colors, or a sequence of numbers, but we are providing it with the categorical labels (Wt and KO) from the genotype column, which are none of these.
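For completeness, it is possible to stay within Matplotlib by translating each category into a color before plotting. The sketch below is one way to do that, not the approach this lesson takes: the palette dictionary, its color names, and the demo data frame (a hand-typed subset of Table 1) are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Four samples from Table 1, typed in by hand so the sketch is self-contained
demo = pd.DataFrame(
    {
        "genotype": ["Wt", "Wt", "KO", "KO"],
        "age_in_days": [40, 32, 35, 41],
        "mean_expression": [10.266102, 10.849759, 15.833872, 15.590184],
    },
    index=["sample1", "sample2", "sample4", "sample5"],
)

# Hypothetical color choices; any valid Matplotlib color names would work
palette = {"Wt": "tab:blue", "KO": "tab:orange"}

# Translate each genotype label into a color, giving one color per point
point_colors = demo["genotype"].map(palette).tolist()

plt.figure(figsize=(8, 6))
points = plt.scatter(x=demo["age_in_days"], y=demo["mean_expression"], c=point_colors)
```

Note that this manual route does not draw a legend for us, which is one reason a higher-level plotting function is convenient for categorical coloring.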
Changing Aesthetics
To get around the error from plt.scatter, we can instead use the seaborn package’s scatterplot() function with the hue argument instead of c. The hue argument allows us to specify a categorical variable for coloring the points. The documentation for seaborn.scatterplot() is quite extensive and can be found on the official seaborn website (https://seaborn.pydata.org/generated/seaborn.scatterplot.html).
You will notice that seaborn applies a default set of colors, so we do not have to specify them ourselves. The legend has also been plotted for us automatically!
import seaborn as sns

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
sns.scatterplot(data=new_metadata, x="age_in_days", y="mean_expression", hue="genotype")
Figure 4: Scatterplot of age in days vs. mean expression, colored by genotype.
seaborn is built on top of Matplotlib and makes it easy to map columns of a data frame to aesthetics such as color and shape. Oftentimes you will find yourself using a blend of both packages to create the plot you want.
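Blending the two packages works because sns.scatterplot() draws onto a Matplotlib Axes and returns it. The following sketch illustrates this under a few assumptions: the demo data frame is a hand-typed subset of Table 1, the axis limits passed to set_xlim() are arbitrary values chosen for illustration, and the "Agg" backend is used so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes

# Four samples from Table 1, typed in by hand so the sketch is self-contained
demo = pd.DataFrame(
    {
        "genotype": ["Wt", "Wt", "KO", "KO"],
        "age_in_days": [40, 32, 35, 41],
        "mean_expression": [10.266102, 10.849759, 15.833872, 15.590184],
    },
    index=["sample1", "sample2", "sample4", "sample5"],
)

# Matplotlib sets up the figure...
plt.figure(figsize=(8, 6))

# ...seaborn draws the colored points and the legend, and returns the Axes...
ax = sns.scatterplot(data=demo, x="age_in_days", y="mean_expression", hue="genotype")

# ...and Matplotlib methods can then adjust that same Axes (arbitrary limits here)
ax.set_xlim(20, 45)

print(isinstance(ax, Axes))
print(ax.get_xlabel())  # seaborn labels the axis with the column name

# Collect the legend entries that seaborn created automatically
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```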
Let’s try to have both celltype and genotype represented on the plot. To do this, we can assign the celltype column to the style argument in scatterplot(), so that each celltype is plotted with a differently shaped data point.
# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
# and shaping them by celltype
sns.scatterplot(data=new_metadata, x="age_in_days", y="mean_expression", hue="genotype", style="celltype")
Figure 5: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype.
The data points are quite small. We can adjust the s (size) of the data points within the scatterplot() function. Since we do not want the size of the data points to be scaled according to a column in new_metadata, we can just specify a number for this argument.
# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype,
# shaping them by celltype, and setting a fixed point size
sns.scatterplot(data=new_metadata, x="age_in_days", y="mean_expression", hue="genotype", style="celltype", s=50)
Figure 6: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype with adjusted size.
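If you would rather choose the shapes yourself than accept the defaults, scatterplot() also accepts a markers argument alongside style. This goes slightly beyond what the lesson uses, so treat it as an optional sketch: the marker choices ("o" for typeA, "s" for typeB) and the demo data frame (a hand-typed subset of Table 1) are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Four samples from Table 1, typed in by hand so the sketch is self-contained
demo = pd.DataFrame(
    {
        "genotype": ["Wt", "KO", "Wt", "KO"],
        "celltype": ["typeA", "typeA", "typeB", "typeB"],
        "age_in_days": [40, 35, 34, 28],
        "mean_expression": [10.266102, 15.833872, 15.522219, 10.743292],
    },
    index=["sample1", "sample4", "sample7", "sample10"],
)

plt.figure(figsize=(8, 6))

# hue controls color, style controls shape, s sets one fixed size for all points;
# markers (hypothetical choices here) pins each celltype to a specific shape
ax = sns.scatterplot(
    data=demo,
    x="age_in_days",
    y="mean_expression",
    hue="genotype",
    style="celltype",
    markers={"typeA": "o", "typeB": "s"},
    s=50,
)

# The legend now describes both the color and the shape mapping
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```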
Themes
There are a variety of themes that you can apply to your plot to change the background and gridlines. seaborn’s built-in styles are darkgrid, whitegrid, dark, white, and ticks, with darkgrid being the style that sns.set_theme() applies by default. You can choose a style with the set_style() function from seaborn.
# Set the theme to "whitegrid"
sns.set_style("whitegrid")

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
sns.scatterplot(data=new_metadata, x="age_in_days", y="mean_expression", hue="genotype", style="celltype", s=50)
Figure 7: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype with adjusted size and a different theme.
Customizing themes
You can also customize these further with the rc argument of set_style() if you want to adjust specific elements of the theme. The documentation for set_style() can be found on the official seaborn website (https://seaborn.pydata.org/generated/seaborn.set_style.html).
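As a rough sketch of what such a customization might look like, the snippet below passes a dictionary of overrides as the second argument (rc) to set_style() and then reads the active settings back with sns.axes_style(). The dashed gridline override is only an illustrative choice, not something the lesson prescribes.

```python
import seaborn as sns

# Start from the "whitegrid" style and override one element:
# draw the gridlines dashed instead of solid (illustrative choice)
sns.set_style("whitegrid", {"grid.linestyle": "--"})

# axes_style() with no arguments returns the style parameters currently in effect
current = sns.axes_style()
print(current["grid.linestyle"])
print(current["axes.facecolor"])
```

Calling sns.axes_style() with no arguments is also a convenient way to list which elements of a style can be overridden.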
Changing Labels
Do the axis labels or the tick labels get any larger by changing themes?
No, they don’t. But we can change both the text and the size of the x-axis label with the plt.xlabel() function from matplotlib. Since we will be adding this layer “on top”, that is, after sns.set_style(), any features we change will override what is set by the sns.set_style() layer.
Let’s increase the font size of the x-axis label to 20.
# Set the theme to "whitegrid"
sns.set_style("whitegrid")

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
sns.scatterplot(data=new_metadata, x="age_in_days", y="mean_expression", hue="genotype", style="celltype", s=50)

# Change the text and size of the x-axis label
plt.xlabel("Age in Days", fontsize=20)
Figure 8: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype with adjusted size, a different theme, and larger axis labels.
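The tick labels can be enlarged in a similar layered fashion. The sketch below uses plt.tick_params(), which is not covered in the lesson itself, so consider it an optional extra: the label size of 14 and the demo data frame (a hand-typed subset of Table 1) are arbitrary, illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First four samples from Table 1, typed in by hand so the sketch is self-contained
demo = pd.DataFrame(
    {
        "age_in_days": [40, 32, 38, 35],
        "mean_expression": [10.266102, 10.849759, 9.452517, 15.833872],
    },
    index=["sample1", "sample2", "sample3", "sample4"],
)

plt.figure(figsize=(8, 6))
plt.scatter(x=demo["age_in_days"], y=demo["mean_expression"])

# Layer 1: set the x-axis label text and its font size
plt.xlabel("Age in Days", fontsize=20)

# Layer 2: enlarge the tick labels on both axes (14 is an arbitrary choice)
plt.tick_params(axis="both", labelsize=14)

# Read the sizes back from the current axes to confirm both layers were applied
ax = plt.gca()
xlabel_size = ax.xaxis.label.get_size()
tick_sizes = {label.get_size() for label in ax.get_xticklabels()}
print(xlabel_size, tick_sizes)
```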
Exercise 1
1. The current axis label text defaults to what we gave as input to sns.scatterplot() (i.e., the column headers). We can change this by adding additional layers called xlabel() and ylabel() for the x- and y-axis, respectively. Add these layers to the current plot such that the x-axis is labeled “Age (days)” and the y-axis is labeled “Mean expression”.
2. Use the plt.title() layer to add a plot title of your choice.
3. What changes when you add the argument loc="center" to the plt.title() function?
4. Try adding the layer plt.legend(loc="center right") to the end of your code. What does this do? How many layers can be added to a plot, in your estimation?
Title: Plotting Basics with Matplotlib and Seaborn
Authors: Noor Sohail, Will Gammerdinger
Date: 2026-03-16
Categories: Python programming, Data visualization, Matplotlib, Seaborn
Keywords: Figures, Scatterplots, Aesthetics, Themes, Plot customization
License: CC-BY-4.0