Boxplots - Answer Key

Authors

Noor Sohail

Will Gammerdinger

Published

March 15, 2026

Exercise 1

Generate a boxplot using the data in the new_metadata dataframe. Create a code chunk with the following instructions:

  1. Use the sns.boxplot() function to plot the differences in sample means between the Wt and KO genotypes.
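# Import the plotting libraries (skip if they are already loaded in your session)
import matplotlib.pyplot as plt
import seaborn as sns

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, with one box per genotype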
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression")

# Render the plot
plt.show()

  2. Use the hue argument to look at differences in sample means between the celltypes within each genotype.
# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, coloring boxes by celltype
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression",
            hue="celltype")

# Render the plot
plt.show()

  3. Add a title to your plot.
# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, coloring boxes by celltype
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression",
            hue="celltype")

# Add plot title
plt.title(label = "Genotype differences in average gene expression")

# Render the plot
plt.show()

  4. Add labels for the axes: “Genotype” for the x-axis and “Mean expression” for the y-axis.
# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, coloring boxes by celltype
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression",
            hue="celltype")

# Add plot title
plt.title(label = "Genotype differences in average gene expression")

# Add x-axis label
plt.xlabel(xlabel = "Genotype")

# Add y-axis label
plt.ylabel(ylabel = "Mean expression")

# Render the plot
plt.show()

  5. Make the following aesthetic changes:
    • Use sns.set_style() with the "white" style to make the background white
    • Change the size of your axes labels to 15
    • Change the size of your plot title to 20
# Set the theme to "white"
sns.set_style(style = "white")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, coloring boxes by celltype
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression",
            hue="celltype")

# Add plot title
plt.title(label = "Genotype differences in average gene expression",
          fontsize = 20)

# Add x-axis label
plt.xlabel(xlabel = "Genotype",
           fontsize = 15)

# Add y-axis label
plt.ylabel(ylabel = "Mean expression", 
           fontsize = 15)

# Render the plot
plt.show()
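
The chunks in this exercise assume that new_metadata is already loaded from the lesson. As a minimal sketch for trying the final plot without the lesson's data, the block below builds a hypothetical stand-in dataframe: only the column names (genotype, celltype, mean_expression) come from this answer key, while the celltype labels and expression values are invented. It also relies on sns.boxplot() returning the matplotlib Axes it drew on, so the title and axis labels can be read back to confirm they were applied.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for new_metadata: the column names match the answer
# key, but the celltype labels and expression values are invented.
new_metadata = pd.DataFrame({
    "genotype": ["Wt"] * 6 + ["KO"] * 6,
    "celltype": ["typeA", "typeB"] * 6,
    "mean_expression": [10.1, 7.9, 10.4, 8.3, 9.8, 8.0,
                        5.2, 6.1, 5.0, 6.4, 5.5, 6.0],
})

sns.set_style(style="white")
plt.figure(figsize=(8, 6))

# sns.boxplot() returns the matplotlib Axes it drew on
ax = sns.boxplot(data=new_metadata,
                 x="genotype",
                 y="mean_expression",
                 hue="celltype")

plt.title(label="Genotype differences in average gene expression", fontsize=20)
plt.xlabel(xlabel="Genotype", fontsize=15)
plt.ylabel(ylabel="Mean expression", fontsize=15)

# Read the labels back from the Axes to confirm they were applied
print(ax.get_title())
print(ax.get_xlabel(), "/", ax.get_ylabel())
```

In a notebook, drop the matplotlib.use("Agg") line and finish with plt.show() as in the chunks above.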

Exercise 2

There is another argument we can use in sns.boxplot() called order that allows you to specify the order of the categories on the x-axis.

  1. Use the order argument to change the order of the genotypes on the x-axis such that KO is on the left and Wt is on the right.
# Set the theme to "white"
sns.set_style(style = "white")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, coloring boxes by celltype
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression",
            hue="celltype",
            order=["KO", "Wt"])

# Add plot title
plt.title(label = "Genotype differences in average gene expression",
          fontsize = 20)

# Add x-axis label
plt.xlabel(xlabel = "Genotype",
           fontsize = 15)

# Add y-axis label
plt.ylabel(ylabel = "Mean expression", 
           fontsize = 15)

# Render the plot
plt.show()
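
To see what the order argument changes, the sketch below draws the same boxplot twice, once without order and once with it, and reads the x-axis tick labels back from each panel. The toy dataframe and its values are hypothetical and stand in for new_metadata; the genotype column is assumed to hold plain strings rather than a pandas categorical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data; the values are invented for illustration
toy = pd.DataFrame({
    "genotype": ["Wt", "Wt", "Wt", "KO", "KO", "KO"],
    "mean_expression": [10.1, 10.4, 9.8, 5.2, 5.0, 5.5],
})

fig, (ax_default, ax_ordered) = plt.subplots(ncols=2, figsize=(8, 4))

# Without order, plain string categories appear in the order they occur in the data
sns.boxplot(data=toy, x="genotype", y="mean_expression", ax=ax_default)

# With order, the categories follow the list that is passed in
sns.boxplot(data=toy, x="genotype", y="mean_expression",
            order=["KO", "Wt"], ax=ax_ordered)

# Draw the figure so the tick labels are populated before reading them
fig.canvas.draw()
default_labels = [t.get_text() for t in ax_default.get_xticklabels()]
ordered_labels = [t.get_text() for t in ax_ordered.get_xticklabels()]
print(default_labels, ordered_labels)
```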

Exercise 3

  1. Specify the argument palette=["purple","orange"] in your sns.boxplot() code chunk to change the colors of the boxes.
# Set the theme to "white"
sns.set_style(style = "white")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, coloring boxes by celltype
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression",
            hue="celltype",
            order=["KO", "Wt"],
            palette=["purple", "orange"])

# Add plot title
plt.title(label = "Genotype differences in average gene expression",
          fontsize = 20)

# Add x-axis label
plt.xlabel(xlabel = "Genotype",
           fontsize = 15)

# Add y-axis label
plt.ylabel(ylabel = "Mean expression", 
           fontsize = 15)

# Render the plot
plt.show()

  2. Back in your boxplot code, change the colors in the palette argument to be your two favorite colors. Are there any colors that you tried that did not work? You can try to find some named colors here if you want to explore more colors.
# Set the theme to "white"
sns.set_style(style = "white")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, coloring boxes by celltype
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression",
            hue="celltype",
            order=["KO", "Wt"],
            palette=["cornflowerblue", "orange"])

# Add plot title
plt.title(label = "Genotype differences in average gene expression",
          fontsize = 20)

# Add x-axis label
plt.xlabel(xlabel = "Genotype",
           fontsize = 15)

# Add y-axis label
plt.ylabel(ylabel = "Mean expression", 
           fontsize = 15)

# Render the plot
plt.show()

cornflowerblue is a named color in both R and Python, so the same name works in either language. It also pairs well with orange, which is likewise a named color in both.
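
One way to check whether a color name will work before plotting is matplotlib's is_color_like() helper, sketched below. The name "orange3" is an assumption chosen for illustration: it is a named color in R, and it is used here as an example of a name that matplotlib is not expected to recognize.

```python
from matplotlib.colors import is_color_like

# "orange3" is a named color in R; it is used here as an example of a
# name that matplotlib is not expected to recognize.
for name in ["cornflowerblue", "orange", "orange3"]:
    print(name, is_color_like(name))
```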

Exercise 4

  1. Find the hexadecimal code for your two favorite colors (from the previous exercise) and replace the color names with the hexadecimal codes within the sns.boxplot() code chunk.
# Set the theme to "white"
sns.set_style(style = "white")

# Initialize a plot with a specific size
plt.figure(figsize = (8, 6))

# Add a boxplot layer to the plot, coloring boxes by celltype
sns.boxplot(data = new_metadata,
            x="genotype", 
            y="mean_expression",
            hue="celltype",
            order=["KO", "Wt"],
            palette=["#6495ED", "#FFA500"])

# Add plot title
plt.title(label = "Genotype differences in average gene expression",
          fontsize = 20)

# Add x-axis label
plt.xlabel(xlabel = "Genotype",
           fontsize = 15)

# Add y-axis label
plt.ylabel(ylabel = "Mean expression", 
           fontsize = 15)

# Render the plot
plt.show()
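
If you would rather not look the codes up by hand, matplotlib can convert a named color to its hexadecimal code. The short sketch below uses matplotlib.colors.to_hex() on the two color names from the previous exercise; to_hex() returns lowercase codes, so they are upper-cased here only to match the style used in the palette above. The variable names are illustrative.

```python
from matplotlib.colors import to_hex

# Color names used in the previous exercise
palette_names = ["cornflowerblue", "orange"]

# Convert each name to its hexadecimal code (upper-cased to match the palette above)
palette_hex = [to_hex(name).upper() for name in palette_names]
print(palette_hex)
```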

Reuse

CC-BY-4.0