Plotting Basics with Matplotlib and Seaborn

Python programming
Data visualization
Matplotlib
Seaborn

This lesson introduces data visualization in Python using Matplotlib and Seaborn, showing how to build scatterplots, adjust aesthetics, and customize labels to create clear figures.

Authors

Noor Sohail

Will Gammerdinger

Published

March 16, 2026

Keywords

Figures, Scatterplots, Aesthetics, Themes, Plot customization

Approximate time: XX minutes

Learning Objectives

In this lesson, we will:

  • Explain the concept of layering in plotting and how to build a plot step by step.
  • Create a scatterplot using Matplotlib and customize its aesthetics with Seaborn.
  • Apply different themes to a plot and adjust axis labels and titles.

Overview of lesson

Plots are one of the best ways to communicate and summarize results to others. With Matplotlib and Seaborn, you can create customizable visualizations from your data. Data scientists and researchers use these tools every day to explore trends and create publication-ready figures for presentations and manuscripts. In this lesson, you will learn the basics of building a scatterplot and adjusting its aesthetics, giving you the foundation for creating other plots in the future.

Plotting Basics

Matplotlib is one of the most widely used plotting packages in Python. With it, we can create many different types of plots, including scatterplots, line plots, bar plots, boxplots, and more. The important thing to remember is that you can build your plot gradually, adding layers of information to make it more informative and visually appealing. So there is no need to create a perfect plot in one step!

We will start with drawing a simple x-y scatterplot of mean_expression versus age_in_days from new_metadata.

Initialize a Plot with Matplotlib

First, we will import the Matplotlib library and load new_metadata, which we created in the previous lesson:

import matplotlib.pyplot as plt
import pandas as pd 

# Load the new metadata data frame that we created in the previous lesson
new_metadata = pd.read_csv("data/new_metadata.csv", index_col=0)  
new_metadata
Table 1: DataFrame containing updated metadata for each of our 12 samples.
          genotype  celltype  replicate  mean_expression  age_in_days
sample1   Wt        typeA     1          10.266102        40
sample2   Wt        typeA     2          10.849759        32
sample3   Wt        typeA     3          9.452517         38
sample4   KO        typeA     1          15.833872        35
sample5   KO        typeA     2          15.590184        41
sample6   KO        typeA     3          15.551529        32
sample7   Wt        typeB     1          15.522219        34
sample8   Wt        typeB     2          13.808281        26
sample9   Wt        typeB     3          14.108399        28
sample10  KO        typeB     1          10.743292        28
sample11  KO        typeB     2          10.778318        30
sample12  KO        typeB     3          9.754733         32

We will first initialize a plot using the figure() function from Matplotlib. Let us look at some of the arguments we can use with the help() function:

# Look at the help for the figure function
help(plt.figure)

So first, let’s create an empty plot of size 8 inches by 6 inches:

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))
Figure 1: An empty plot initialized with Matplotlib.

Adding a Scatterplot Layer

We are once again going to initialize the plot with figure(), and then add the scatterplot layer using the scatter() function. We have to specify the x and y values for our scatterplot, which will be age_in_days and mean_expression, respectively.

The plt.figure and plt.scatter calls are connected because Matplotlib commands build upon each other. The plt.figure command initializes the plot and sets the size, while the plt.scatter command adds the scatterplot layer to the existing plot. By calling these functions sequentially, we can build up our plot layer by layer.

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot
plt.scatter(x=new_metadata["age_in_days"], 
            y=new_metadata["mean_expression"])
Figure 2: Scatterplot of age in days vs. mean expression.
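
One way to see that the two calls act on the same plot is to ask Matplotlib which figure and axes are currently active. The sketch below is only an illustration: it uses a few stand-in values copied from the first rows of Table 1 instead of loading new_metadata, and the variable names fig, points, and ax are our own.

```python
import matplotlib.pyplot as plt

# Stand-in values copied from the first four rows of Table 1
age_in_days = [40, 32, 38, 35]
mean_expression = [10.266102, 10.849759, 9.452517, 15.833872]

# Step 1: create the figure; it becomes the "current" figure
fig = plt.figure(figsize=(8, 6))

# Step 2: scatter() draws on the current figure, creating axes if none exist
points = plt.scatter(x=age_in_days, y=mean_expression)

# Ask Matplotlib for the current figure and axes
ax = plt.gca()
print(plt.gcf() is fig)             # is the current figure the one we created?
print(ax.collections[0] is points)  # is the scatter layer stored on those axes?
print(len(fig.axes))                # how many axes does the figure hold?
```

Each later plt call keeps adding to this same figure until a new one is created with plt.figure().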

Now that we have the required fundamentals, let’s add some extras like color to the plot. We can color the points on the plot based on the genotype column with the c argument.

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
plt.scatter(data = new_metadata,
            x="age_in_days", 
            y="mean_expression",
            c="genotype")
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not sample1     Wt
sample2     Wt
sample3     Wt
sample4     KO
sample5     KO
sample6     KO
sample7     Wt
sample8     Wt
sample9     Wt
sample10    KO
sample11    KO
sample12    KO
Name: genotype, dtype: str
Figure 3: Initial attempt to color the scatterplot of age in days vs. mean expression by genotype, which results in an error.

We get an error when we try to set the color. The c argument in scatter() expects color values (or numbers to be mapped onto a colormap), but we gave it the genotype column, whose labels "Wt" and "KO" are not colors that Matplotlib recognizes.
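
If we wanted to stay with plain Matplotlib, one possible workaround is to translate each genotype label into a color ourselves before calling scatter(). The sketch below is an illustration under assumptions, not the approach this lesson follows: it uses a four-row stand-in for new_metadata built from Table 1, and the dictionary genotype_colors and its color choices are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Four-row stand-in for new_metadata, with values copied from Table 1
metadata_subset = pd.DataFrame(
    {
        "genotype": ["Wt", "KO", "Wt", "KO"],
        "mean_expression": [10.266102, 15.833872, 15.522219, 10.743292],
        "age_in_days": [40, 35, 34, 28],
    },
    index=["sample1", "sample4", "sample7", "sample10"],
)

# Hypothetical color choices, one per genotype
genotype_colors = {"Wt": "tab:blue", "KO": "tab:orange"}

# Translate each genotype label into a color name that Matplotlib understands
point_colors = metadata_subset["genotype"].map(genotype_colors).tolist()

plt.figure(figsize=(8, 6))
plt.scatter(x=metadata_subset["age_in_days"],
            y=metadata_subset["mean_expression"],
            c=point_colors)
```

This works, but we would still have to build the legend by hand, which is part of why the next section switches to Seaborn.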

Changing Aesthetics

To get around the error from plt.scatter, we can instead use the seaborn package’s scatterplot() function and use the hue argument instead of c, which allows us to specify a categorical variable for coloring the points. The documentation for seaborn.scatterplot() is quite extensive and can be found on their official website.

You will notice that a default set of colors is used, so we do not have to specify them. The legend has also been plotted for us automatically!

import seaborn as sns
# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
sns.scatterplot(data = new_metadata,
                x="age_in_days", 
                y="mean_expression",
                hue="genotype")
Figure 4: Scatterplot of age in days vs. mean expression, colored by genotype.
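
If the default colors do not suit your figure, scatterplot() also accepts a palette argument. The sketch below shows one way to use it, assuming a four-row stand-in for new_metadata built from Table 1; the dictionary genotype_palette and its colors are hypothetical choices for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Four-row stand-in for new_metadata, with values copied from Table 1
metadata_subset = pd.DataFrame(
    {
        "genotype": ["Wt", "KO", "Wt", "KO"],
        "mean_expression": [10.266102, 15.833872, 15.522219, 10.743292],
        "age_in_days": [40, 35, 34, 28],
    },
    index=["sample1", "sample4", "sample7", "sample10"],
)

# Hypothetical palette: one color per level of the hue variable
genotype_palette = {"Wt": "tab:blue", "KO": "tab:orange"}

plt.figure(figsize=(8, 6))

# scatterplot() returns the Matplotlib axes it drew on
ax = sns.scatterplot(data=metadata_subset,
                     x="age_in_days",
                     y="mean_expression",
                     hue="genotype",
                     palette=genotype_palette)
```

Using a dictionary keeps each genotype tied to the same color even if the order of the rows changes.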

Seaborn is built on top of Matplotlib and allows us to easily add more aesthetics to our plot. Oftentimes you will find yourself using a blend of both packages to create the plot you want.

Let’s try to have both celltype and genotype represented on the plot. To do this we can assign the celltype column to the style argument in scatterplot(), so that each celltype is plotted with a differently shaped data point.

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer, coloring points by genotype and shaping them by celltype
sns.scatterplot(data=new_metadata,
                x="age_in_days", 
                y="mean_expression",
                hue="genotype",
                style="celltype")
Figure 5: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype.

The data points are quite small. We can adjust the s (size) of the data points within the scatterplot() function. Since we do not want the size of the data points to be scaled according to a column in new_metadata, we can just specify a number for this argument.

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer, coloring by genotype, shaping by celltype, with larger points
sns.scatterplot(data = new_metadata,
                x="age_in_days", 
                y="mean_expression",
                hue="genotype",
                style="celltype",
                s=50)
Figure 6: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype with adjusted size.

Themes

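There are a variety of themes that you can apply to your plot to change the background and gridlines. Seaborn's default theme is darkgrid (the style applied by sns.set_theme()), but you can choose a different one with the set_style() function from seaborn. The built-in options are darkgrid, whitegrid, dark, white, and ticks.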

# Set the theme to "whitegrid"
sns.set_style("whitegrid")

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer, coloring by genotype, shaping by celltype, with larger points
sns.scatterplot(data = new_metadata,
                x="age_in_days", 
                y="mean_expression",
                hue="genotype",
                style="celltype",
                s=50)
Figure 7: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype with adjusted size and a different theme.
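
To compare the built-in styles side by side, one option is to draw the same small plot once per style. The sketch below assumes a few stand-in values from Table 1 rather than the full new_metadata, and it uses sns.axes_style() as a context manager so that each style applies only to the figure created inside its block.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Stand-in values copied from the first four rows of Table 1
age_in_days = [40, 32, 38, 35]
mean_expression = [10.266102, 10.849759, 9.452517, 15.833872]

# The five built-in seaborn style names
style_names = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

figures = []
for style_name in style_names:
    # Inside the "with" block the style is active; afterwards it is restored
    with sns.axes_style(style_name):
        fig = plt.figure(figsize=(4, 3))
        plt.scatter(x=age_in_days, y=mean_expression)
        plt.title(style_name)
    figures.append(fig)
```

Unlike sns.set_style(), which changes the style for every plot that follows, the context manager leaves your session settings untouched once the block ends.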

Customizing themes

You can also customize these further with the rc argument of set_style() if you want to adjust specific elements of the theme. The documentation for set_style() can be found on the official seaborn website.
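
As a rough illustration of this kind of customization, the sketch below passes a dictionary of overrides through the rc argument of sns.set_style(). The two settings chosen here (dashed gridlines and a black axes border) are arbitrary examples, not recommendations from this lesson.

```python
import matplotlib as mpl
import seaborn as sns

# Start from the "whitegrid" style and override two of its elements
sns.set_style("whitegrid",
              rc={"grid.linestyle": "--",       # dashed gridlines
                  "axes.edgecolor": "black"})   # black border around the plot

# axes_style() with no arguments returns the style settings currently in use
current_style = sns.axes_style()
print(current_style["grid.linestyle"])
print(current_style["axes.edgecolor"])
```

Only parameters that are part of seaborn's style definition are updated this way; plots created after this call pick up the new settings.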

Changing Labels

Do the axis labels or the tick labels get any larger by changing themes?

No, they don’t. But we can change both the text and the font size of the axis labels with the plt.xlabel() and plt.ylabel() functions from Matplotlib. Since we add this layer “on top”, after sns.set_style() and the scatterplot, any features we change will override what the theme set.

Let’s increase the font size of the x-axis label to 20.

# Set the theme to "whitegrid"
sns.set_style("whitegrid")

# Initialize a plot with a specific size
plt.figure(figsize=(8, 6))

# Add a scatterplot layer, coloring by genotype, shaping by celltype, with larger points
sns.scatterplot(data = new_metadata,
                x="age_in_days", 
                y="mean_expression",
                hue="genotype",
                style="celltype",
                s=50)

# Set the x-axis label text and increase its font size
plt.xlabel("Age in Days", fontsize=20)
Figure 8: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype with adjusted size, a different theme, and larger axis labels.
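
The tick labels can be enlarged in a similar way. The sketch below is one possible approach, using plt.tick_params() with a few stand-in values from Table 1; the label size of 14 is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

# Stand-in values copied from the first four rows of Table 1
age_in_days = [40, 32, 38, 35]
mean_expression = [10.266102, 10.849759, 9.452517, 15.833872]

plt.figure(figsize=(8, 6))
plt.scatter(x=age_in_days, y=mean_expression)

# Set the x-axis label text and font size, as in the lesson
x_label = plt.xlabel("Age in Days", fontsize=20)

# Enlarge the tick labels on both axes (14 is an arbitrary example size)
plt.tick_params(axis="both", labelsize=14)
```

Because tick_params() is called after the scatterplot is drawn, it modifies the axes that already exist, in the same layered fashion as xlabel().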

  1. The current axis label text defaults to what we gave as input to sns.scatterplot() (i.e., the column headers). We can change this by adding layers with plt.xlabel() and plt.ylabel() for the x- and y-axis, respectively. Add these layers to the current plot such that the x-axis is labeled “Age (days)” and the y-axis is labeled “Mean expression”.

  2. Use the plt.title() layer to add a plot title of your choice.

  3. Add the argument loc="center" to the plt.title() function. What does it change?

  4. Try adding the layer plt.legend(loc="center right") to the end of your code. What does this do? How many layers can be added to a plot, in your estimation?


Reuse

CC-BY-4.0