RNA-seq count data distribution - Answer key
Exercise 1
Evaluate the relationship between mean and variance for the control replicates (Irrel_kd samples). Note the differences or similarities in the plot compared to the one using the overexpression replicates.

```r
library(ggplot2)

# Columns 1 to 3 of `data` correspond to the Irrel_kd (control) samples
mean_counts_ctrl <- apply(data[, 1:3], 1, mean)      # per-gene mean across replicates
variance_counts_ctrl <- apply(data[, 1:3], 1, var)   # per-gene variance across replicates
df_ctrl <- data.frame(mean_counts_ctrl, variance_counts_ctrl)

# Plot mean vs variance on log10 axes
ggplot(df_ctrl) +
  geom_point(aes(x = mean_counts_ctrl, y = variance_counts_ctrl)) +
  scale_y_log10(limits = c(1, 1e9)) +
  scale_x_log10(limits = c(1, 1e9)) +
  geom_abline(intercept = 0, slope = 1, color = "red")   # reference line for x = y (slope = 1)
```
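The plot for the control replicates is similar to the one for the overexpression replicates shown in the lesson. The variance across replicates increases with the mean and tends to exceed it, especially for genes with high mean counts, so those points fall above the red x = y line.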
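One way to see why points land above the x = y line is to simulate counts and compare the two candidate distributions. The sketch below uses only simulated, hypothetical data and none of the lesson's samples. It assumes 2,000 genes, three replicates, and true means spread from 1 to 10,000. It generates negative binomial (NB) counts with `size = 10`, which in R's `rnbinom` parameterization gives variance `mu + mu^2 / size`. It also generates Poisson counts with the same means. For each dataset it reports the fraction of moderately to highly expressed genes whose sample variance exceeds the sample mean. The helper `frac_above_line` and its `min_mean = 100` cutoff are illustrative choices.

```r
# Sketch: compare mean-variance behaviour of simulated NB vs Poisson counts.
# All data are simulated (hypothetical); nothing comes from the lesson dataset.
set.seed(42)
n_genes <- 2000
n_reps  <- 3                                      # three replicates, as for Irrel_kd
mu <- 10^runif(n_genes, min = 0, max = 4)         # assumed true means, 1 to 10,000
mu_vec <- rep(mu, times = n_reps)                 # column-major: one copy of mu per replicate

# NB counts with size = 10, i.e. variance = mu + mu^2 / 10 (assumed value)
nb_counts   <- matrix(rnbinom(n_genes * n_reps, mu = mu_vec, size = 10), nrow = n_genes)
# Poisson counts with the same true means (variance equals mean)
pois_counts <- matrix(rpois(n_genes * n_reps, lambda = mu_vec), nrow = n_genes)

# Hypothetical helper: fraction of genes with mean >= min_mean whose sample
# variance exceeds their sample mean (points above the x = y line)
frac_above_line <- function(counts, min_mean = 100) {
  m <- apply(counts, 1, mean)
  v <- apply(counts, 1, var)
  keep <- m >= min_mean
  mean(v[keep] > m[keep])
}

frac_nb   <- frac_above_line(nb_counts)
frac_pois <- frac_above_line(pois_counts)
print(c(nb = frac_nb, poisson = frac_pois))
```

With NB counts nearly all highly expressed genes sit above the line. With Poisson counts only a minority do, because with three replicates the sample variance scatters around the mean.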
Exercise 2
An RNA-seq experiment was conducted on mouse forebrain to evaluate the effect of increasing concentrations of a treatment. For each of the five concentrations there are n = 5 mice, for a total of 25 samples. If we observed little to no variability between replicates, what might this suggest about our samples?
A lack of variability between replicates suggests that these may be technical rather than biological replicates, because true biological replicates are expected to show some variability. With technical replicates, DESeq2 is not a good fit. It uses the negative binomial (NB) distribution to account for overdispersion, and overdispersion would be essentially absent in such data.
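To make "overdispersion would be essentially absent" concrete, here is a rough sketch. It assumes the NB variance is written as Var = μ + α μ², so the dispersion α can be estimated per gene by the method of moments as (Var − μ) / μ². The data are simulated and hypothetical, with 5 replicates as in this exercise. Poisson counts stand in for technical replicates, and NB counts with α = 0.1 stand in for biological ones. This crude estimator is only for illustration. DESeq2 estimates dispersions differently, sharing information across genes.

```r
# Sketch: crude per-gene dispersion estimates for technical-like vs
# biological-like replicates. Simulated, hypothetical data; 5 replicates.
set.seed(2024)
n_genes <- 2000
n_reps  <- 5
mu <- 10^runif(n_genes, min = 2, max = 4)          # assumed true means, 100 to 10,000
mu_vec <- rep(mu, times = n_reps)

# Technical-like replicates: Poisson noise only (variance == mean)
tech <- matrix(rpois(n_genes * n_reps, lambda = mu_vec), nrow = n_genes)
# Biological-like replicates: NB with alpha = 0.1, i.e. size = 1 / alpha = 10 (assumed)
bio  <- matrix(rnbinom(n_genes * n_reps, mu = mu_vec, size = 10), nrow = n_genes)

# Method of moments: Var = mu + alpha * mu^2  =>  alpha = (Var - mu) / mu^2
mom_dispersion <- function(counts) {
  m <- apply(counts, 1, mean)
  v <- apply(counts, 1, var)
  (v - m) / m^2
}

alpha_tech <- median(mom_dispersion(tech))
alpha_bio  <- median(mom_dispersion(bio))
print(signif(c(technical = alpha_tech, biological = alpha_bio), 2))
```

The technical-like estimate lands close to 0, meaning there is no extra variance for the NB to model. The biological-like estimate is clearly positive.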
What type of mean-variance relationship would you expect to see for this dataset?
The mean would be approximately equal to the variance (mean == variance), so a Poisson model would be more appropriate than the NB.
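On the log-log mean-variance plot used in Exercise 1, a mean == variance relationship means the points follow a line of slope 1, parallel to the red x = y line. The sketch below checks this on simulated, hypothetical Poisson counts. It assumes 5 replicates per gene and true means from 10 to 10,000, then regresses log10(variance) on log10(mean) with `lm`.

```r
# Sketch: log-log mean-variance relationship for Poisson counts.
# Simulated, hypothetical data; 5 replicates per gene as in Exercise 2.
set.seed(7)
n_genes <- 2000
n_reps  <- 5
mu <- 10^runif(n_genes, min = 1, max = 4)          # assumed true means, 10 to 10,000
counts <- matrix(rpois(n_genes * n_reps, lambda = rep(mu, times = n_reps)),
                 nrow = n_genes)

mean_counts     <- apply(counts, 1, mean)
variance_counts <- apply(counts, 1, var)
keep <- variance_counts > 0                        # avoid log10(0) = -Inf

# Under a Poisson model the fitted slope should be close to 1
fit <- lm(log10(variance_counts[keep]) ~ log10(mean_counts[keep]))
slope <- unname(coef(fit)[2])
print(round(slope, 3))
```

The fitted intercept typically sits slightly below 0. The logarithm of a noisy sample variance is biased downward, but that does not change the slope.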