Boxplot visualization Answer Key

Author

Will Gammerdinger

Published

July 1, 2025

Exercise 1

Generate a boxplot using the data in the new_metadata dataframe. Create a ggplot2 code chunk with the following instructions:

  1. Use the geom_boxplot() layer to plot the differences in sample means between the Wt and KO genotypes.
# Create a boxplot with genotype on the x-axis and samplemeans on the y-axis
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans))

  2. Use the fill aesthetic to look at differences in sample means between the celltypes within each genotype.
# Use celltype to provide fill for the boxplot
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype))

  3. Add a title to your plot.
# Add a title to the plot
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression")

  4. Add axis labels: “Genotype” for the x-axis and “Mean expression” for the y-axis.
# Add axis labels
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression")

  5. Make the following theme() changes:
  • Use the theme_bw() function to make the background white.
# Utilize theme_bw() to make the background white
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw()

  • Change the size of your axes labels to 1.25x larger than the default.
# Increase the size of the axes labels
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw() +
  theme(axis.title = element_text(size = rel(1.25)))

  • Change the size of your plot title to 1.5x larger than default.
# Increase the size of the plot title
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw() +
  theme(axis.title = element_text(size = rel(1.25))) +
  theme(plot.title=element_text(size = rel(1.5)))

  • Center the plot title.
# Center the plot title
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw() +
  theme(axis.title = element_text(size = rel(1.25))) +
  theme(plot.title=element_text(size = rel(1.5))) +
  theme(plot.title=element_text(hjust = 0.5))

Exercise 2

Let’s say you wanted to have the “Wt” boxplots displayed first on the left side, and “KO” on the right. How might you go about doing this?

To do this, your first question should be - How does ggplot2 determine what to place where on the X-axis?

  • By default, the genotypes appear on the x-axis in alphabetical order.
  • To change the order, the genotype column needs to be a factor.
  • The levels of that factor must be in the order you want on the x-axis.
  1. Factor the new_metadata$genotype column without creating any extra variables/objects and change the levels to c("Wt", "KO")
# Convert the genotype column of the new_metadata data frame to a factor with the levels ordered "Wt" then "KO"
new_metadata$genotype <- factor(new_metadata$genotype, levels = c("Wt", "KO"))
  2. Re-run the boxplot code chunk you created for the exercise above.
# Re-create the plot with the newly ordered x-axis
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw() +
  theme(axis.title = element_text(size = rel(1.25))) +
  theme(plot.title=element_text(size = rel(1.5))) +
  theme(plot.title=element_text(hjust = 0.5))
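
To see why the reordering works, it can help to inspect the factor levels directly, since ggplot2 places the categories of a discrete axis in level order. The sketch below uses a small made-up vector, genotype_example, as a stand-in for the genotype column; it is an assumption for illustration and not the real new_metadata data.

```r
# Hypothetical stand-in for the genotype column (not the real new_metadata data)
genotype_example <- c("Wt", "KO", "Wt", "KO")

# Without explicit levels, factor() sorts the unique values alphabetically
default_order <- factor(genotype_example)
levels(default_order)
# [1] "KO" "Wt"

# With explicit levels, the order is exactly the one requested
custom_order <- factor(genotype_example, levels = c("Wt", "KO"))
levels(custom_order)
# [1] "Wt" "KO"
```

Running levels(new_metadata$genotype) before and after the conversion should show the same kind of change on the real data.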

Exercise 3

You can color the boxplot differently by using some specific layers:

  1. Add a new layer scale_color_manual(values=c("purple","orange")).
# Adding scale_color_manual()
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw() +
  theme(axis.title = element_text(size = rel(1.25))) +
  theme(plot.title=element_text(size = rel(1.5))) +
  theme(plot.title=element_text(hjust = 0.5)) +
  scale_color_manual(values=c("purple","orange"))

  • Do you observe a change?

There is no change, because the boxplot maps celltype to fill and no color aesthetic is mapped.

  2. Replace scale_color_manual(values=c("purple","orange")) with scale_fill_manual(values=c("purple","orange")).
# Replacing scale_color_manual() with scale_fill_manual()
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw() +
  theme(axis.title = element_text(size = rel(1.25))) +
  theme(plot.title=element_text(size = rel(1.5))) +
  theme(plot.title=element_text(hjust = 0.5)) +
  scale_fill_manual(values=c("purple","orange"))

  • Do you observe a change?

Yes, the fill color changed.

  • In the scatterplot we drew in class, add a new layer scale_color_manual(values=c("purple","orange")).
# Used scale_color_manual() on the geom_point() plot we made earlier
ggplot(new_metadata) +
  geom_point(aes(x = age_in_days, y = samplemeans, color = genotype,
                 shape = celltype), size = 2.25) +
  theme_bw() +
  theme(axis.title = element_text(size=rel(1.5))) +
  xlab("Age (days)") +
  ylab("Mean expression") +
  ggtitle("Mean Expression by Age") +
  theme(plot.title=element_text(hjust=0.5)) +
  scale_color_manual(values=c("purple","orange"))

  • Do you observe a difference?

Yes, the color of the points changes.

  • What do you think is the difference between scale_color_manual() and scale_fill_manual()?

scale_color_manual() takes effect when a variable is mapped to the “color” aesthetic, whereas scale_fill_manual() takes effect when a variable is mapped to the “fill” aesthetic.
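
One way to see the two scales side by side is to map both aesthetics in a single boxplot. This is a minimal sketch that assumes ggplot2 is installed; the toy data frame and its columns (group, value) are made up for illustration and are not part of new_metadata.

```r
library(ggplot2)

# Hypothetical toy data: two groups of five values each
toy <- data.frame(
  group = rep(c("a", "b"), each = 5),
  value = c(1:5, 3:7)
)

# color controls the box outlines, fill controls the box interiors,
# so each scale only affects the aesthetic it is named after
p <- ggplot(toy) +
  geom_boxplot(aes(x = group, y = value, color = group, fill = group)) +
  scale_color_manual(values = c("purple", "orange")) +
  scale_fill_manual(values = c("grey90", "grey60")) +
  theme_bw()

p
```

In the resulting plot the outlines should be purple and orange while the interiors are two shades of grey. Removing color = group from aes() should leave scale_color_manual() with nothing to act on, which matches the "no change" observation above.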

  3. Back in your boxplot code, change the colors in the scale_fill_manual() layer to be your 2 favorite colors.
# Using my favorite colors
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw() +
  theme(axis.title = element_text(size = rel(1.25))) +
  theme(plot.title=element_text(size = rel(1.5))) +
  theme(plot.title=element_text(hjust = 0.5)) +
  scale_fill_manual(values=c("cornflowerblue","orange"))

  • Are there any colors that you tried that did not work?

No.
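
If a color name does fail, one possible check is to look it up in the list of names that R recognizes, which colors() returns. The sketch below is illustrative only: candidate_colors is a made-up vector, and "notarealcolor" is deliberately invalid. Hexadecimal codes are also accepted by ggplot2, so this check covers named colors only.

```r
# Hypothetical candidate names; the last one is intentionally invalid
candidate_colors <- c("cornflowerblue", "orange", "notarealcolor")

# colors() returns every color name built into R
is_named_color <- candidate_colors %in% colors()
names(is_named_color) <- candidate_colors
is_named_color
```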

Exercise 4

Find the hexadecimal code for your 2 favorite colors (from the last step of Exercise 3) and replace the color names with the hexadecimal codes within the ggplot2 code chunk.

# Using the hexadecimal codes for my favorite colors
ggplot(new_metadata) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
  ggtitle("Genotype differences in average gene expression") +
  xlab("Genotype") +
  ylab("Mean expression") +
  theme_bw() +
  theme(axis.title = element_text(size = rel(1.25))) +
  theme(plot.title=element_text(size = rel(1.5))) +
  theme(plot.title=element_text(hjust = 0.5)) +
  scale_fill_manual(values=c("#6495ED","#FFA500"))
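
The hexadecimal codes can be looked up online, but they can also be derived in R itself. The following is one possible approach, using the base functions col2rgb() and rgb(); the variable names (favorite_colors, rgb_values, hex_codes) are chosen here for illustration.

```r
# The two named colors used in Exercise 3
favorite_colors <- c("cornflowerblue", "orange")

# col2rgb() returns a matrix with one column per color and rows red, green, blue
rgb_values <- col2rgb(favorite_colors)

# rgb() accepts a three-column matrix, so transpose before converting to hex
hex_codes <- rgb(t(rgb_values), maxColorValue = 255)
hex_codes
# [1] "#6495ED" "#FFA500"
```

These are the same codes passed to scale_fill_manual() above.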