Approximate time: 15 minutes
Learning Objectives
- Understand the commands needed to run a complete differential expression analysis using DESeq2
Summary of differential expression analysis workflow
We have detailed the various steps in a differential expression analysis workflow, providing theory with example code. To provide a more succinct reference for the code needed to run a DGE analysis, we have summarized the steps in an analysis below:
-
Obtaining gene-level counts from Salmon using tximport
# Run tximport txi <- tximport(files, type="salmon", tx2gene=t2g, countsFromAbundance = "lengthScaledTPM") # "files" is a vector wherein each element is the path to the salmon quant.sf file, and each element is named with the name of the sample. # "t2g" is a 2 column data frame which contains transcript IDs mapped to geneIDs (in that order)
-
Creating the dds object:
# Check that the row names of the metadata equal the column names of the **raw counts** data all(colnames(txi$counts) == rownames(metadata)) # Create DESeq2Dataset object dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ condition)
-
Exploratory data analysis (PCA & heirarchical clustering) - identifying outliers and sources of variation in the data:
# Transform counts for data visualization rld <- rlog(dds, blind=TRUE) # Plot PCA plotPCA(rld, intgroup="condition") # Extract the rlog matrix from the object and compute pairwise correlation values rld_mat <- assay(rld) rld_cor <- cor(rld_mat) # Plot heatmap pheatmap(rld_cor, annotation = metadata)
-
Run DESeq2:
# **Optional step** - Re-create DESeq2 dataset if the design formula has changed after QC analysis in include other sources of variation using "dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ covaraite + condition)" # Run DESeq2 differential expression analysis dds <- DESeq(dds) # **Optional step** - Output normalized counts to save as a file to access outside RStudio using "normalized_counts <- counts(dds, normalized=TRUE)"
-
Check the fit of the dispersion estimates:
# Plot dispersion estimates plotDispEsts(dds)
-
Create contrasts to perform Wald testing on the shrunken log2 foldchanges between specific conditions:
# Output results of Wald test for contrast contrast <- c("condition", "level_to_compare", "base_level") res <- results(dds, contrast = contrast, alpha = 0.05) res <- lfcShrink(dds, contrast = contrast, res=res)
-
Output significant results:
# Set thresholds padj.cutoff < - 0.05 # Turn the results object into a tibble for use with tidyverse functions res_tbl <- res %>% data.frame() %>% rownames_to_column(var="gene") %>% as_tibble() # Subset the significant results sig_res <- filter(res_tbl, padj < padj.cutoff)
-
Visualize results: volcano plots, heatmaps, normalized counts plots of top genes, etc.
-
Perform analysis to extract functional significance of results: GO or KEGG enrichment, GSEA, etc.
-
Make sure to output the versions of all tools used in the DE analysis:
sessionInfo()
For better reproducibility, it can help to create RMarkdown reports, which save all code, results, and visualizations as nicely formatted html reports. We have available an example html report for perusal. To create these reports we have additional materials available.