Plotting Basics with Matplotlib and Seaborn

Python programming
Data visualization
Matplotlib
Seaborn

This lesson introduces data visualization in Python using Matplotlib and Seaborn, showing how to build scatterplots, adjust aesthetics and customize labels to create clear figures.

Authors

Noor Sohail

Will Gammerdinger

Published

March 16, 2026

Keywords

Figures, Scatterplots, Aesthetics, Themes, Plot customization

Approximate time: 40 minutes

Learning objectives

In this lesson, we will:

  • Explain the concept of layering in plotting and how to build a plot step-by-step
  • Create a scatterplot using Matplotlib and customize its aesthetics with Seaborn
  • Apply different themes to a plot and adjust axis labels and titles

Overview of lesson

Plots are one of the best ways to communicate and summarize results to others. With Matplotlib and Seaborn, you can create customizable visualizations from your data. Data scientists and researchers use these tools every day to explore trends and create publication-ready figures for presentations and manuscripts. In this lesson, you will learn the basics of building a scatterplot and adjusting its aesthetics, giving you the foundation for creating any plot you may want to generate in the future.

Plotting basics

Matplotlib is one of the most widely used plotting packages in Python. With it, we can create many different types of plots, including scatterplots, line plots, bar plots, boxplots and more. The important thing to remember is that you can build up your plot gradually, adding layers of information to make it more informative and visually appealing. So there is no need to create a perfect plot in one step!

We will start with drawing a simple x-y scatterplot of mean_expression versus age_in_days from new_metadata.

Initialize a plot with Matplotlib

First, we will import the Matplotlib and Pandas libraries and load the new_metadata data frame that we created in the previous lesson:

# Import libraries
import matplotlib.pyplot as plt
import pandas as pd 

# Load the new metadata data frame that we created in the previous lesson
new_metadata = pd.read_csv("data/new_metadata.csv", index_col=0)  

# Print out new_metadata
new_metadata
Table 1: DataFrame containing updated metadata for each of our 12 samples.
          genotype  celltype  replicate  mean_expression  age_in_days
sample1         Wt     typeA          1        10.266102           40
sample2         Wt     typeA          2        10.849759           32
sample3         Wt     typeA          3         9.452517           38
sample4         KO     typeA          1        15.833872           35
sample5         KO     typeA          2        15.590184           41
sample6         KO     typeA          3        15.551529           32
sample7         Wt     typeB          1        15.522219           34
sample8         Wt     typeB          2        13.808281           26
sample9         Wt     typeB          3        14.108399           28
sample10        KO     typeB          1        10.743292           28
sample11        KO     typeB          2        10.778318           30
sample12        KO     typeB          3         9.754733           32
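
If the CSV file from the previous lesson is not at hand, the same data frame can be rebuilt by hand. The following is a minimal sketch: the values are copied from Table 1, while constructing the frame directly (rather than reading data/new_metadata.csv) is an assumption made only so that the examples can be run standalone.

```python
import pandas as pd

# Values copied from Table 1; building the frame by hand is a stand-in
# for pd.read_csv("data/new_metadata.csv", index_col=0)
new_metadata = pd.DataFrame(
    {
        "genotype": ["Wt"] * 3 + ["KO"] * 3 + ["Wt"] * 3 + ["KO"] * 3,
        "celltype": ["typeA"] * 6 + ["typeB"] * 6,
        "replicate": [1, 2, 3] * 4,
        "mean_expression": [10.266102, 10.849759, 9.452517,
                            15.833872, 15.590184, 15.551529,
                            15.522219, 13.808281, 14.108399,
                            10.743292, 10.778318, 9.754733],
        "age_in_days": [40, 32, 38, 35, 41, 32, 34, 26, 28, 28, 30, 32],
    },
    index=[f"sample{i}" for i in range(1, 13)],
)

# Confirm the dimensions: 12 samples (rows) and 5 columns
print(new_metadata.shape)
```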

We will first initialize a plot by using the figure() function from Matplotlib. Let us use the help() function to look at some of the arguments that figure() accepts:

# Look at the help for the figure function
help(plt.figure)

So first, we’ll create an empty plot of size 8 inches by 6 inches:

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Render the plot
plt.show()
<Figure size 768x576 with 0 Axes>
Figure 1: An empty plot initialized with Matplotlib.
Note

As we go through the lesson, you may notice that most plots render just fine without plt.show(). Even so, calling plt.show() is good practice because it states explicitly where in the code you would like the figure to be rendered. Some computing set-ups require plt.show() in order to display an image at all, while others may get confused and overlay multiple plots on one another without it.
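
The text `<Figure size 768x576 with 0 Axes>` printed for the empty plot reports the figure size in pixels, which is the size in inches multiplied by the figure's DPI (768 / 8 = 96 in that session). The sketch below shows the relationship. Note that the DPI depends on your set-up (Matplotlib's own default is 100), so the pixel values you see may differ. plt.show() is left out here so that the snippet can also run without a display.

```python
import matplotlib.pyplot as plt

# Create an empty 8 x 6 inch figure
fig = plt.figure(figsize=(8, 6))

# Size in inches, as requested with figsize
width_in, height_in = fig.get_size_inches()

# Size in pixels = inches * dots per inch (DPI varies between set-ups)
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi

print(f"{width_in:.0f} x {height_in:.0f} inches at {fig.dpi:.0f} DPI "
      f"= {width_px:.0f} x {height_px:.0f} pixels")

# Close the figure so it does not linger in memory
plt.close(fig)
```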

Adding a scatterplot layer

We will once again first initialize the plot with figure() and then add the scatterplot layer to the plot with scatter(). We need to specify where we are pulling the data to plot from, which in this case will be the new_metadata DataFrame, and the x and y values for our scatterplot, which in this case will be age_in_days for x and mean_expression for y.

The plt.figure() and plt.scatter() calls are connected because Matplotlib commands build upon each other. The plt.figure() command initializes the plot and sets the size, while the plt.scatter() command adds the scatterplot layer to the existing plot. When we call these functions sequentially, we are building our plot layer by layer.

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot
plt.scatter(data = new_metadata,
            x = "age_in_days", 
            y = "mean_expression")

# Render the plot
plt.show()
Figure 2: Scatterplot of age in days vs. mean expression.
Alternative

Instead of providing the data argument, you could specify the x and y axes as the given columns from new_metadata.

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot
plt.scatter(x = new_metadata["age_in_days"], 
            y = new_metadata["mean_expression"])
            
# Render the plot
plt.show()
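
The layering described above works because pyplot keeps track of a "current" figure and a "current" set of axes. The following minimal sketch uses a few values from Table 1 together with the pyplot helpers plt.gcf() and plt.gca() (which are not otherwise used in this lesson) to check that plt.scatter() draws onto the figure that plt.figure() just created.

```python
import matplotlib.pyplot as plt

# A few values taken from Table 1 (sample1 to sample4)
age_in_days = [40, 32, 38, 35]
mean_expression = [10.266102, 10.849759, 9.452517, 15.833872]

# Initialize the figure, then add the scatterplot layer
fig = plt.figure(figsize=(8, 6))
points = plt.scatter(x=age_in_days, y=mean_expression)

# plt.gcf() returns the current figure; it is the one we just created
same_figure = plt.gcf() is fig

# plt.gca() returns the current axes; the scatter layer was added to them
ax = plt.gca()
on_current_axes = points in ax.collections

print(same_figure, on_current_axes, len(fig.axes))

plt.close(fig)
```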

Now that we have the required fundamentals, let’s add some extra details like color to the plot. We can color the points on the plot based on the genotype column with the c argument.

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
plt.scatter(data = new_metadata,
            x = "age_in_days", 
            y = "mean_expression",
            c = "genotype")

# Render the plot
plt.show()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/axes/_axes.py:4761, in Axes._parse_scatter_color_args(c, edgecolors, kwargs, xsize, get_next_color_func)
   4760 try:  # Is 'c' acceptable as PathCollection facecolors?
-> 4761     colors = mcolors.to_rgba_array(c)
   4762 except (TypeError, ValueError) as err:

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/colors.py:515, in to_rgba_array(c, alpha)
    514 else:
--> 515     rgba = np.array([to_rgba(cc) for cc in c])
    517 if alpha is not None:

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/colors.py:317, in to_rgba(c, alpha)
    316 if rgba is None:  # Suppress exception chaining of cache lookup failure.
--> 317     rgba = _to_rgba_no_colorcycle(c, alpha)
    318     try:

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/colors.py:394, in _to_rgba_no_colorcycle(c, alpha)
    393         return c, c, c, alpha if alpha is not None else 1.
--> 394     raise ValueError(f"Invalid RGBA argument: {orig_c!r}")
    395 # turn 2-D array into 1-D array

ValueError: Invalid RGBA argument: 'Wt'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In[4], line 5
      2 plt.figure(figsize = (8, 6))
      4 # Add a scatterplot layer to the plot, coloring points by genotype
----> 5 plt.scatter(data = new_metadata,
      6             x = "age_in_days", 
      7             y = "mean_expression",
      8             c = "genotype")
     10 # Render the plot
     11 plt.show()

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/_api/deprecation.py:453, in make_keyword_only.<locals>.wrapper(*args, **kwargs)
    447 if len(args) > name_idx:
    448     warn_deprecated(
    449         since, message="Passing the %(name)s %(obj_type)s "
    450         "positionally is deprecated since Matplotlib %(since)s; the "
    451         "parameter will become keyword-only in %(removal)s.",
    452         name=name, obj_type=f"parameter of {func.__name__}()")
--> 453 return func(*args, **kwargs)

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/pyplot.py:3948, in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, colorizer, plotnonfinite, data, **kwargs)
   3928 @_copy_docstring_and_deprecators(Axes.scatter)
   3929 def scatter(
   3930     x: float | ArrayLike,
   (...)   3946     **kwargs,
   3947 ) -> PathCollection:
-> 3948     __ret = gca().scatter(
   3949         x,
   3950         y,
   3951         s=s,
   3952         c=c,
   3953         marker=marker,
   3954         cmap=cmap,
   3955         norm=norm,
   3956         vmin=vmin,
   3957         vmax=vmax,
   3958         alpha=alpha,
   3959         linewidths=linewidths,
   3960         edgecolors=edgecolors,
   3961         colorizer=colorizer,
   3962         plotnonfinite=plotnonfinite,
   3963         **({"data": data} if data is not None else {}),
   3964         **kwargs,
   3965     )
   3966     sci(__ret)
   3967     return __ret

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/_api/deprecation.py:453, in make_keyword_only.<locals>.wrapper(*args, **kwargs)
    447 if len(args) > name_idx:
    448     warn_deprecated(
    449         since, message="Passing the %(name)s %(obj_type)s "
    450         "positionally is deprecated since Matplotlib %(since)s; the "
    451         "parameter will become keyword-only in %(removal)s.",
    452         name=name, obj_type=f"parameter of {func.__name__}()")
--> 453 return func(*args, **kwargs)

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/__init__.py:1553, in _preprocess_data.<locals>.inner(ax, data, *args, **kwargs)
   1549 if label_namer and "label" not in args_and_kwargs:
   1550     new_kwargs["label"] = _label_from_arg(
   1551         args_and_kwargs.get(label_namer), auto_label)
-> 1553 return func(*new_args, **new_kwargs)

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/axes/_axes.py:4954, in Axes.scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, edgecolors, colorizer, plotnonfinite, **kwargs)
   4951 if edgecolors is None:
   4952     orig_edgecolor = kwargs.get('edgecolor', None)
   4953 c, colors, edgecolors = \
-> 4954     self._parse_scatter_color_args(
   4955         c, edgecolors, kwargs, x.size,
   4956         get_next_color_func=self._get_patches_for_fill.get_next_color)
   4958 if plotnonfinite and colors is None:
   4959     c = np.ma.masked_invalid(c)

File /opt/anaconda3/lib/python3.13/site-packages/matplotlib/axes/_axes.py:4770, in Axes._parse_scatter_color_args(c, edgecolors, kwargs, xsize, get_next_color_func)
   4767             raise invalid_shape_exception(c.size, xsize) from err
   4768         # Both the mapping *and* the RGBA conversion failed: pretty
   4769         # severe failure => one may appreciate a verbose feedback.
-> 4770         raise ValueError(
   4771             f"'c' argument must be a color, a sequence of colors, "
   4772             f"or a sequence of numbers, not {c!r}") from err
   4773 else:
   4774     if len(colors) not in (0, 1, xsize):
   4775         # NB: remember that a single color is also acceptable.
   4776         # Besides *colors* will be an empty array if c == 'none'.

ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not sample1     Wt
sample2     Wt
sample3     Wt
sample4     KO
sample5     KO
sample6     KO
sample7     Wt
sample8     Wt
sample9     Wt
sample10    KO
sample11    KO
sample12    KO
Name: genotype, dtype: object
Figure 3: Initial attempt to color the scatterplot of age in days vs. mean expression by genotype, which results in an error.

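We are getting an error from trying to set the color. As the final line of the error message states, the c argument in scatter() expects a color, a sequence of colors, or a sequence of numbers, but we are providing it with the categorical values from the genotype column. The strings 'Wt' and 'KO' are not color names that Matplotlib recognizes.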
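
For reference, one Matplotlib-only way around this error is to translate each genotype into a valid color before plotting. This is only a sketch and not the route the lesson takes next: the palette dictionary and its color choices are hypothetical, and the data frame is rebuilt from Table 1 so that the snippet is self-contained.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Columns needed for this plot, copied from Table 1
new_metadata = pd.DataFrame(
    {
        "genotype": ["Wt"] * 3 + ["KO"] * 3 + ["Wt"] * 3 + ["KO"] * 3,
        "mean_expression": [10.266102, 10.849759, 9.452517,
                            15.833872, 15.590184, 15.551529,
                            15.522219, 13.808281, 14.108399,
                            10.743292, 10.778318, 9.754733],
        "age_in_days": [40, 32, 38, 35, 41, 32, 34, 26, 28, 28, 30, 32],
    },
    index=[f"sample{i}" for i in range(1, 13)],
)

# Hypothetical palette: one valid Matplotlib color name per genotype
palette = {"Wt": "tab:blue", "KO": "tab:orange"}

# Translate each genotype label into its color
point_colors = new_metadata["genotype"].map(palette).tolist()

fig = plt.figure(figsize=(8, 6))
points = plt.scatter(x=new_metadata["age_in_days"],
                     y=new_metadata["mean_expression"],
                     c=point_colors)

plt.close(fig)
```

With this approach no legend is drawn automatically, so the mapping from color to genotype would still have to be added by hand.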

Changing aesthetics

To work around the error from plt.scatter, we will instead use the seaborn package’s scatterplot() function and use the hue argument instead of c, which allows us to specify a categorical variable in order to color the plot points. The documentation for seaborn.scatterplot() is quite extensive and can be found on their official website.

You will notice that there is a default set of colors that we can use, so we do not have to specify a color. The legend and axis labels have also been automatically plotted for us!

# Import library
import seaborn as sns

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype
sns.scatterplot(data = new_metadata,
                x = "age_in_days", 
                y = "mean_expression",
                hue = "genotype")

# Render the plot
plt.show()
Figure 4: Scatterplot of age in days vs. mean expression, colored by genotype.

Seaborn builds on Matplotlib and allows us to easily add more aesthetics to our plot. You will oftentimes find yourself using a blend of both packages together to create the plot you want.

Let’s try to have both celltype and genotype represented on the plot. We can assign the celltype column to the style argument in scatterplot(), so each celltype is plotted with a different shaped data point.

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype and shaping them by celltype
sns.scatterplot(data = new_metadata,
                x = "age_in_days", 
                y = "mean_expression",
                hue = "genotype",
                style = "celltype")

# Render the plot
plt.show()
Figure 5: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype.
Note

You may have noticed that the figure legend moved when we added our style argument. This is because there is an argument (loc) within plt.legend() which controls the placement of the legend. It can take one of nine named locations ('upper left', 'upper right', 'lower left', 'lower right', 'upper center', 'lower center', 'center left', 'center right', 'center'). However, the default value is 'best', which selects whichever of those nine locations minimizes the overlap of the legend with the data. Adding our style argument made the legend longer, so a different location in the plot now gives less overlap with the data points.

The data points are quite small. We can also adjust the s (size) of the data points within the scatterplot() function. Since we do not want the size of the data points to be scaled according to a column in new_metadata, we will just specify a number for this argument.

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype and shaping them by celltype
sns.scatterplot(data = new_metadata,
                x = "age_in_days", 
                y = "mean_expression",
                hue = "genotype",
                style = "celltype",
                s = 50)

# Render the plot
plt.show()
Figure 6: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype, with adjusted size.
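
For contrast, point size can also be tied to a column rather than fixed at one number. The sketch below uses the size and sizes arguments of sns.scatterplot() for this. Mapping size to replicate is purely an illustrative assumption (it is not something the lesson recommends), and the data frame is rebuilt from Table 1 to keep the snippet self-contained.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Data copied from Table 1
new_metadata = pd.DataFrame(
    {
        "genotype": ["Wt"] * 3 + ["KO"] * 3 + ["Wt"] * 3 + ["KO"] * 3,
        "celltype": ["typeA"] * 6 + ["typeB"] * 6,
        "replicate": [1, 2, 3] * 4,
        "mean_expression": [10.266102, 10.849759, 9.452517,
                            15.833872, 15.590184, 15.551529,
                            15.522219, 13.808281, 14.108399,
                            10.743292, 10.778318, 9.754733],
        "age_in_days": [40, 32, 38, 35, 41, 32, 34, 26, 28, 28, 30, 32],
    },
    index=[f"sample{i}" for i in range(1, 13)],
)

fig = plt.figure(figsize=(8, 6))

# size maps a column to point size; sizes sets the (smallest, largest) point size
ax = sns.scatterplot(data=new_metadata,
                     x="age_in_days",
                     y="mean_expression",
                     hue="genotype",
                     style="celltype",
                     size="replicate",
                     sizes=(40, 200))

plt.close(fig)
```

Here s is dropped because size now controls the point size; use s with a single number when every point should be the same size.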

Themes

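There are a variety of themes that you can apply to your plot to change the background and gridlines. Seaborn provides five built-in styles: darkgrid, whitegrid, dark, white and ticks. Seaborn does not change the look of your plots until you ask it to, so the plots so far have used Matplotlib's default appearance. You can choose a style with the set_style() function from seaborn.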

# Set the theme to "whitegrid"
sns.set_style(style = "whitegrid")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype and shaping them by celltype
sns.scatterplot(data = new_metadata,
                x = "age_in_days", 
                y = "mean_expression",
                hue = "genotype",
                style = "celltype",
                s = 50)

# Render the plot
plt.show()
Figure 7: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype, with adjusted size and a different theme.
Customizing themes

You can also customize a theme further by passing a dictionary of parameters to the rc argument of set_style() when you want to adjust specific elements of the theme. The documentation for set_style() can be found on the seaborn official website.
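
As a minimal sketch of such a customization, the snippet below starts from the whitegrid style and overrides two grid parameters through the rc argument. The particular values (dashed, light gray gridlines) are arbitrary choices for illustration. sns.axes_style() is then called without arguments to inspect the style settings currently in effect.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Start from "whitegrid" and override individual style parameters
# (the values chosen here are only examples)
sns.set_style(style="whitegrid",
              rc={"grid.linestyle": "--",
                  "grid.color": "0.85"})

# With no arguments, axes_style() returns the style parameters now in effect
current_style = sns.axes_style()

print(current_style["grid.linestyle"], current_style["axes.grid"])
```

Style changes apply to every plot drawn afterwards; sns.reset_orig() restores the original settings.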

Changing labels

The axis labels and tick labels don’t get any larger by changing themes. We can, however, change both the text and the font size of the x-axis label with the plt.xlabel() function from Matplotlib. Since we will be adding this layer “on top” of, or after, sns.set_style(), any features we change will override what is set by sns.set_style().

Let’s increase the size of the x-axis title to be 20.

# Set the theme to "whitegrid"
sns.set_style(style = "whitegrid")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype and shaping them by celltype
sns.scatterplot(data = new_metadata,
                x = "age_in_days", 
                y = "mean_expression",
                hue = "genotype",
                style = "celltype",
                s = 50)

# Change the size and text of the axis label
plt.xlabel(xlabel = "Age in Days",
           fontsize = 20)

# Render the plot
plt.show()
Figure 8: Scatterplot of age in days vs. mean expression, colored by genotype and shaped by celltype, with adjusted size, a different theme and larger x-axis title.
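
The tick labels can be enlarged in a similar way. The sketch below uses plt.tick_params(), a Matplotlib function not otherwise covered in this lesson, on a small Matplotlib-only scatterplot built from a few Table 1 values. The label size of 14 is an arbitrary choice.

```python
import matplotlib.pyplot as plt

# A few values taken from Table 1 (sample1 to sample4)
age_in_days = [40, 32, 38, 35]
mean_expression = [10.266102, 10.849759, 9.452517, 15.833872]

fig = plt.figure(figsize=(8, 6))
plt.scatter(x=age_in_days, y=mean_expression)

# Change the size and text of the x-axis label, as in the lesson
plt.xlabel(xlabel="Age in Days", fontsize=20)

# Enlarge the tick labels on both axes (14 is an example value)
plt.tick_params(axis="both", labelsize=14)

# Keep a handle on the axes so the settings can be inspected
ax = plt.gca()

plt.close(fig)
```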

Saving plots

If you want to save this plot, you can use the savefig() function from Matplotlib and specify the file name and format you want to save it in. This function saves the current figure, which is the last plot generated in a given code block, so make sure to call savefig() in the same code block as the plot you want to save, after you have generated it. For example, to save the plot as a PNG file, you can use:

# Set the theme to "whitegrid"
sns.set_style(style = "whitegrid")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype and shaping them by celltype
sns.scatterplot(data = new_metadata,
                x = "age_in_days", 
                y = "mean_expression",
                hue = "genotype",
                style = "celltype",
                s = 50)

# Change the size and text of the axis label
plt.xlabel(xlabel = "Age in Days",
           fontsize = 20)

# Save the plot as a PNG file
plt.savefig(fname = "figures/scatterplot.png",
            format = "png")

If you want to specify the resolution (DPI) of the saved figure, you can include the dpi argument in the savefig() function. The size of the figure in inches is still set by the figsize argument of plt.figure(). For example, to save the plot as a PNG file with a resolution of 300 DPI we can use:

# Set the theme to "whitegrid"
sns.set_style(style = "whitegrid")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a scatterplot layer to the plot, coloring points by genotype and shaping them by celltype
sns.scatterplot(data = new_metadata,
                x = "age_in_days", 
                y = "mean_expression",
                hue = "genotype",
                style = "celltype",
                s = 50)

# Change the size and text of the axis label
plt.xlabel(xlabel = "Age in Days",
           fontsize = 20)

# Save the plot as a PNG file with a specific DPI
plt.savefig(fname = "figures/scatterplot_dpi.png",
            format = "png",
            dpi = 300)
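
Two practical details are worth checking when saving: savefig() raises an error if the target folder does not exist, and the pixel dimensions of the saved image are the figure size in inches multiplied by the DPI. The sketch below illustrates both with a small Matplotlib-only plot. The file name scatterplot_dpi_check.png is hypothetical, and creating the figures folder with pathlib is an assumption about how you want to manage the output folder.

```python
from pathlib import Path

import matplotlib.pyplot as plt

# Make sure the output folder exists before saving
output_dir = Path("figures")
output_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical file name used only for this check
output_path = output_dir / "scatterplot_dpi_check.png"

# A small 8 x 6 inch plot using a few values from Table 1
fig = plt.figure(figsize=(8, 6))
plt.scatter(x=[40, 32, 38, 35],
            y=[10.266102, 10.849759, 9.452517, 15.833872])

# Save at 300 DPI, then close the figure
plt.savefig(fname=output_path, format="png", dpi=300)
plt.close(fig)

# Read the image back: 8 in * 300 DPI wide, 6 in * 300 DPI tall
height_px, width_px = plt.imread(str(output_path)).shape[:2]
print(width_px, height_px)
```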

Exercises

  1. Add a plt.ylabel() layer to the current plot such that the y-axis is labeled “Mean expression”.

  2. Use the plt.title() layer to add a plot title of your choice.

  3. When you add the argument loc = "right" to the plt.title() function, what does it change?

  4. Let’s remove the loc = "right" argument from plt.title(). Try adding the layer plt.legend(loc = "center right") to the end of your code. What does this do? How many layers can be added to a plot, in your estimation?



Reuse

CC-BY-4.0