Boxplots

Python programming
Data visualization
Boxplots
Matplotlib
Seaborn

This lesson shows how to create and customize boxplots in Python with Matplotlib and Seaborn to visualize distributions, identify outliers, and adjust colors and styles.

Authors

Noor Sohail

Will Gammerdinger

Published

March 16, 2026

Keywords

Aesthetics, Plot styling, Hex colors

Approximate time: XX minutes

Learning Objectives

In this lesson, we will:

  • Generate a boxplot with Python
  • Customize the aesthetics of a boxplot
  • Find hexadecimal codes for colors and use them to change the colors of a boxplot

Overview of lesson

Boxplots are a great way to visualize the distribution of your data when you are comparing different groups, such as expression levels across conditions. They can help you identify differences between conditions while also representing outliers and variability within your data. Using Matplotlib and Seaborn, we are going to build upon the foundation of plotting to create even more custom plots. In this lesson, you will practice creating boxplots and modifying their appearance to your liking.

Boxplots

A boxplot provides a graphical view of the distribution of data based on a five-number summary:

  • The bottom and top of the box represent the (1) first and (2) third quartiles (25th and 75th percentiles, respectively).
  • The line inside the box represents the (3) median (50th percentile).
  • The whiskers extending above and below the box represent the (4) maximum and (5) minimum of the data set, excluding outliers. In other words, the whiskers reach the largest and smallest values that are not outliers.

Outliers in boxplots

In this case, outliers are determined using the interquartile range (IQR), which is defined as Q3 - Q1. Any value that lies more than 1.5 x IQR below Q1 or above Q3 is considered an outlier and is drawn as an individual point beyond the whiskers.
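The outlier rule above can be sketched with NumPy. This is a minimal illustration, not part of the lesson's dataset: the numbers in `values` are made up, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical values, made up for illustration only
values = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])

# First and third quartiles (25th and 75th percentiles)
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences: anything beyond 1.5 x IQR from the box counts as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
is_outlier = (values < lower_fence) | (values > upper_fence)
outliers = values[is_outlier]

# The whiskers stop at the most extreme values that are not outliers
whisker_low = values[~is_outlier].min()
whisker_high = values[~is_outlier].max()

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Outliers: {outliers}")
print(f"Whiskers: {whisker_low} to {whisker_high}")
```

Here only the value 30 falls outside the fences, so it would be drawn as a separate point while the upper whisker stops at 8.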

Generate a boxplot using the data in the new_metadata dataframe. Create a Python code chunk with the following instructions:

  1. Use the sns.boxplot() function to plot the differences in sample means between the Wt and KO genotypes.

  2. Use the hue argument to look at differences in sample means between the celltypes within each genotype.

  3. Add a title to your plot.

  4. Add labels, “Genotype” for the x-axis and “Mean expression” for the y-axis.

  5. Make the following style changes:

    • Use Seaborn's white style to make the background white.
    • Change the size of your axes labels to 15.
    • Change the size of your plot title to 20.
    • Center the plot title.

By the end, you should have a plot that looks something like this:

Figure 1: Boxplot of mean expression by genotype and celltype.
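One possible way to approach these steps is sketched below. It is not the official answer key. The small dataframe is a made-up stand-in for new_metadata, and the column names `genotype`, `celltype`, and `samplemeans`, as well as the celltype labels, are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for new_metadata (column names and values are assumed)
rng = np.random.default_rng(0)
new_metadata = pd.DataFrame({
    "genotype": np.repeat(["Wt", "KO"], 6),
    "celltype": np.tile(np.repeat(["typeA", "typeB"], 3), 2),
    "samplemeans": rng.normal(loc=10, scale=2, size=12),
})

# Set the white style before creating the axes so it takes effect
sns.set_style("white")
fig, ax = plt.subplots()

# One group of boxes per genotype, split by celltype via hue
sns.boxplot(data=new_metadata, x="genotype", y="samplemeans",
            hue="celltype", ax=ax)

# Title (size 20, centered) and axis labels (size 15)
ax.set_title("Mean expression by genotype and celltype",
             fontsize=20, loc="center")
ax.set_xlabel("Genotype", fontsize=15)
ax.set_ylabel("Mean expression", fontsize=15)
```

In a notebook the figure is displayed automatically. In a script, you would typically finish with `plt.show()` or `fig.savefig(...)`.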

Changing the Order of Genotypes in the Boxplot

Let’s say you wanted to have the KO boxplots displayed first on the left side, and Wt on the right. How might you go about doing this?

Luckily, sns.boxplot() has another argument called order that lets you specify the order of the categories on the x-axis.

  1. Use the order argument to change the order of the genotypes on the x-axis such that KO is on the left and Wt is on the right.

Figure 2: Boxplot of mean expression by genotype and celltype, x-axis re-ordered.
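A minimal sketch of the order argument follows. As before, the dataframe is a made-up stand-in for new_metadata, and its column names and celltype labels are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for new_metadata (column names and values are assumed)
rng = np.random.default_rng(0)
new_metadata = pd.DataFrame({
    "genotype": np.repeat(["Wt", "KO"], 6),
    "celltype": np.tile(np.repeat(["typeA", "typeB"], 3), 2),
    "samplemeans": rng.normal(loc=10, scale=2, size=12),
})

fig, ax = plt.subplots()

# order lists the x-axis categories from left to right
sns.boxplot(data=new_metadata, x="genotype", y="samplemeans",
            hue="celltype", order=["KO", "Wt"], ax=ax)
```

Without order, Seaborn places string categories in the order they first appear in the data (Wt, then KO in this made-up dataframe).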

Changing default colors

So far, you may have noticed that Seaborn has been using its default colors for the points and boxes in our plots. You can change these default colors by using the palette argument in sns.boxplot() and sns.scatterplot().

You can change the colors of the boxplot with the following steps:

  1. Specify the argument palette=["purple","orange"] in your sns.boxplot() code chunk to change the colors of the boxes.

  2. Back in your boxplot code, change the colors in the palette argument to be your 2 favorite colors. Are there any colors that you tried that did not work?

Figure 3: Boxplot of mean expression by genotype and celltype, x-axis re-ordered, with a custom color palette.
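The palette argument can be sketched as follows, again with a made-up stand-in for new_metadata and assumed column names. The sketch also uses Matplotlib's `is_color_like()` to check whether a color name is recognized before plotting; the name "notacolor" is a deliberately invalid example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import is_color_like

# Hypothetical stand-in for new_metadata (column names and values are assumed)
rng = np.random.default_rng(0)
new_metadata = pd.DataFrame({
    "genotype": np.repeat(["Wt", "KO"], 6),
    "celltype": np.tile(np.repeat(["typeA", "typeB"], 3), 2),
    "samplemeans": rng.normal(loc=10, scale=2, size=12),
})

# One color per hue level (here, one per celltype)
palette = ["purple", "orange"]

# Check which names Matplotlib recognizes; "notacolor" is intentionally invalid
for name in palette + ["notacolor"]:
    print(name, is_color_like(name))

fig, ax = plt.subplots()
sns.boxplot(data=new_metadata, x="genotype", y="samplemeans",
            hue="celltype", order=["KO", "Wt"], palette=palette, ax=ax)
```

A color name that Matplotlib does not recognize will raise an error when passed to palette, which is one reason a color you try might not work.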

You are not restricted to specifying colors by name as strings. Python accepts many more colors through their hexadecimal codes. For example, “#FF0000” is red and “#00FF00” is green; similarly, “#FFFFFF” is white and “#000000” is black. Here is a website that you can use to find the hexadecimal code for your favorite colors.
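As a small illustration of how names and hexadecimal codes relate, Matplotlib's `to_rgb()` and `to_hex()` functions convert between the two. The choice of "purple" and "orange" here is only an example.

```python
from matplotlib.colors import to_hex, to_rgb

# A hex code is three two-digit values: red, green, blue
print(to_rgb("#FF0000"))  # (1.0, 0.0, 0.0), i.e. pure red

# Look up the hex codes behind two named colors
hex_palette = [to_hex(name) for name in ["purple", "orange"]]
print(hex_palette)  # ['#800080', '#ffa500']
```

A list of hex codes such as `hex_palette` can be passed to the palette argument of sns.boxplot() in the same way as a list of color names.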

  1. Find the hexadecimal codes for your 2 favorite colors (from the previous exercise) and replace the color names with the hexadecimal codes in your sns.boxplot() code.

Figure 4: Boxplot of mean expression by genotype and celltype, x-axis re-ordered, with colors given as hexadecimal codes.

Reuse

CC-BY-4.0