Approximate time: 15 minutes
Learning Objectives
- Understand the commands needed to run a complete differential expression analysis
Summary of differential expression analysis workflow
We have detailed the various steps in a differential expression analysis workflow, providing theory with example code. To provide a more succinct reference for the code needed to run a DGE analysis, we have summarized the steps in an analysis below:
-
Count normalization:
# Check that the row names of the metadata equal the column names of the **raw counts** data all(colnames(raw_counts) == rownames(metadata)) # Create DESeq2Dataset object dds <- DESeqDataSetFromMatrix(countData = raw_counts, colData = metadata, design = ~ condition)
-
Exploratory data analysis (PCA & heirarchical clustering) - identifying outliers and sources of variation in the data:
# Transform counts for data visualization rld <- rlog(dds, blind=TRUE) # Plot PCA plotPCA(rld, intgroup="condition") # Extract the rlog matrix from the object rld_mat <- assay(rld) # Compute pairwise correlation values rld_cor <- cor(rld_mat) # Plot heatmap pheatmap(rld_cor)
-
Run DESeq2:
# **Optional step** - Re-create DESeq2 dataset if the design formula has changed after QC analysis in include other sources of variation dds <- DESeqDataSetFromMatrix(countData = raw_counts, colData = metadata, design = ~ condition) # Run DESeq2 differential expression analysis dds <- DESeq(dds) # **Optional step** - Output normalized counts to save as a file to access outside RStudio normalized_counts <- counts(dds, normalized=TRUE)
-
Check the fit of the dispersion estimates:
# Plot dispersion estimates plotDispEsts(dds)
-
Create contrasts to perform Wald testing on the shrunken log2 foldchanges between specific conditions:
# Output results of Wald test for contrast contrast <- c("condition", "level_to_compare", "base_level") res <- results(dds, contrast = contrast) res <- lfcShrink(dds, contrast = contrast, res=res)
-
Output significant results:
# Turn the results object into a data frame res_df <- data.frame(res) # Subset the significant results sig_res <- filter(res_df, padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
-
Visualize results: volcano plots, heatmaps, normalized counts plots of top genes, etc.
-
Make sure to output the versions of all tools used in the DE analysis:
sessionInfo()