# Create a boxplot where the x-axis is the genotype and the y-axis is the sample means
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans))
Boxplot visualization Answer Key
Exercise 1
Generate a boxplot using the data in the new_metadata dataframe. Create a ggplot2 code chunk with the following instructions:
- Use the geom_boxplot() layer to plot the differences in sample means between the Wt and KO genotypes.
- Use the fill aesthetic to look at differences in sample means between the celltypes within each genotype.
# Use celltype to provide fill for the boxplot
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype))
- Add a title to your plot.
# Add a title to the plot
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression")
- Add labels, “Genotype” for the x-axis and “Mean expression” for the y-axis.
# Add axes labels
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression")
- Make the following theme() changes:
- Use the theme_bw() function to make the background white.
# Utilize theme_bw() to make the background white
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw()
- Change the size of your axes labels to 1.25x larger than the default.
# Increase the size of the axes labels
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw() +
theme(axis.title = element_text(size = rel(1.25)))
- Change the size of your plot title to 1.5x larger than default.
# Increase the size of the plot title
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw() +
theme(axis.title = element_text(size = rel(1.25))) +
theme(plot.title=element_text(size = rel(1.5)))
- Center the plot title.
# Center the plot title
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw() +
theme(axis.title = element_text(size = rel(1.25))) +
theme(plot.title=element_text(size = rel(1.5))) +
theme(plot.title=element_text(hjust = 0.5))
Exercise 2
Let’s say you wanted to have the “Wt” boxplots displayed first on the left side, and “KO” on the right. How might you go about doing this?
To do this, first ask: how does ggplot2 decide what to place where on the x-axis?
- By default, the genotypes appear on the x-axis in alphabetical order.
- To change this, make sure the genotype column is a factor.
- Then make sure the factor levels for that column are in the order you want on the x-axis.
- Factor the new_metadata$genotype column without creating any extra variables/objects and change the levels to c("Wt", "KO").
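To see why the level order matters, here is a minimal base R sketch. The vector `genotype` is a hypothetical stand-in for the new_metadata$genotype column, not the real data.

```r
# Hypothetical toy vector standing in for new_metadata$genotype
genotype <- c("KO", "Wt", "KO", "Wt")

# Without explicit levels, factor() sorts the levels alphabetically
default_levels <- levels(factor(genotype))
default_levels
# "KO" "Wt"

# Supplying levels fixes the order that a discrete axis will follow
genotype_ordered <- factor(genotype, levels = c("Wt", "KO"))
levels(genotype_ordered)
# "Wt" "KO"
```

The values in the vector do not change; only the order of the levels does, and that order is what a discrete x-axis follows.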
# Convert the genotype column of the new_metadata data frame to a factor with the levels being "Wt" then "KO"
new_metadata$genotype <- factor(new_metadata$genotype, levels = c("Wt", "KO"))
- Re-run the boxplot code chunk you created for the exercise above.
# Re-create the plot with the newly ordered x-axis
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw() +
theme(axis.title = element_text(size = rel(1.25))) +
theme(plot.title=element_text(size = rel(1.5))) +
theme(plot.title=element_text(hjust = 0.5))
Exercise 3
You can color the boxplot differently by using some specific layers:
- Add a new layer scale_color_manual(values=c("purple","orange")).
# Adding scale_color_manual()
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw() +
theme(axis.title = element_text(size = rel(1.25))) +
theme(plot.title=element_text(size = rel(1.5))) +
theme(plot.title=element_text(hjust = 0.5)) +
scale_color_manual(values=c("purple","orange"))
- Do you observe a change?
No, there is no change: the boxplot maps celltype to fill, not color, so scale_color_manual() has nothing to act on.
- Replace scale_color_manual(values=c("purple","orange")) with scale_fill_manual(values=c("purple","orange")).
# Replacing scale_color_manual() with scale_fill_manual()
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw() +
theme(axis.title = element_text(size = rel(1.25))) +
theme(plot.title=element_text(size = rel(1.5))) +
theme(plot.title=element_text(hjust = 0.5)) +
scale_fill_manual(values=c("purple","orange"))
- Do you observe a change?
Yes, the fill color changed.
- In the scatterplot we drew in class, add a new layer scale_color_manual(values=c("purple","orange")).
# Used scale_color_manual() on the geom_point() plot we made earlier
ggplot(new_metadata) +
geom_point(aes(x = age_in_days, y = samplemeans, color = genotype,
shape = celltype), size = 2.25) +
theme_bw() +
theme(axis.title = element_text(size=rel(1.5))) +
xlab("Age (days)") +
ylab("Mean expression") +
ggtitle("Mean Expression by Age") +
theme(plot.title=element_text(hjust=0.5)) +
scale_color_manual(values=c("purple","orange"))
- Do you observe a difference?
Yes, the color of the points changes.
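- What do you think is the difference between scale_color_manual() and scale_fill_manual()?
scale_color_manual() sets the colors for whatever is mapped to the “color” aesthetic (for example, points and outlines), whereas scale_fill_manual() sets the colors for whatever is mapped to the “fill” aesthetic (for example, the interior of the boxes).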
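The difference can also be checked without looking at the plot. The sketch below is only an illustration: `toy` is a hypothetical data frame with made-up values standing in for new_metadata, and the celltype labels are placeholders. It uses ggplot2's layer_data() to inspect the fill values that each version of the plot ends up with.

```r
library(ggplot2)

# Hypothetical toy data standing in for new_metadata (values are made up)
toy <- data.frame(
  genotype    = rep(c("KO", "Wt"), each = 6),
  celltype    = rep(c("typeA", "typeB"), times = 6),
  samplemeans = c(10, 12, 11, 13, 9, 14, 15, 17, 16, 18, 14, 19)
)

# The boxplot maps celltype to fill; nothing is mapped to color
p <- ggplot(toy) +
  geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype))

# A color scale has no mapped aesthetic to act on here
p_color <- p + scale_color_manual(values = c("purple", "orange"))

# A fill scale acts on the mapped fill aesthetic
p_fill <- p + scale_fill_manual(values = c("purple", "orange"))

# layer_data() returns the computed aesthetics for the first layer
fill_with_color_scale <- unique(layer_data(p_color)$fill)
fill_with_fill_scale  <- unique(layer_data(p_fill)$fill)

fill_with_color_scale  # default fill colors, unchanged
fill_with_fill_scale   # the manually chosen fill colors
```

Only the plot with scale_fill_manual() picks up purple and orange, which matches what was observed in the exercises above.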
- Back in your boxplot code, change the colors in the scale_fill_manual() layer to be your 2 favorite colors.
# Using my favorite colors
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw() +
theme(axis.title = element_text(size = rel(1.25))) +
theme(plot.title=element_text(size = rel(1.5))) +
theme(plot.title=element_text(hjust = 0.5)) +
scale_fill_manual(values=c("cornflowerblue","orange"))
- Are there any colors that you tried that did not work?
No.
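If a color name does fail, one way to check it is against the names that base R recognizes, which colors() lists. The helper name `is_known_color` below is hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: TRUE if R recognizes the name as a built-in color
is_known_color <- function(name) {
  name %in% colors()
}

is_known_color("cornflowerblue")  # a built-in color name
is_known_color("cornflowerblu")   # misspelled, so not a built-in name
```

Names that are not in this list can still be supplied as hexadecimal codes.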
Exercise 4
Find the hexadecimal codes for your 2 favorite colors (from the last part of Exercise 3) and replace the color names with the hexadecimal codes within the ggplot2 code chunk.
# Using the hexadecimal codes for my favorite colors
ggplot(new_metadata) +
geom_boxplot(aes(x = genotype, y = samplemeans, fill = celltype)) +
ggtitle("Genotype differences in average gene expression") +
xlab("Genotype") +
ylab("Mean expression") +
theme_bw() +
theme(axis.title = element_text(size = rel(1.25))) +
theme(plot.title=element_text(size = rel(1.5))) +
theme(plot.title=element_text(hjust = 0.5)) +
scale_fill_manual(values=c("#6495ED","#FFA500"))
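The hexadecimal codes can be looked up online, but they can also be derived within R. A minimal sketch using the base R functions col2rgb() and rgb(), offered as one possible approach rather than the method the exercise expects:

```r
# Color names used in the boxplot above
fav_colors <- c("cornflowerblue", "orange")

# col2rgb() gives one column of red/green/blue values (0-255) per color;
# transposing gives one row per color, which rgb() converts to hex strings
hex_codes <- rgb(t(col2rgb(fav_colors)), maxColorValue = 255)
hex_codes
# "#6495ED" "#FFA500"
```

These are the same codes passed to scale_fill_manual() in the chunk above.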