# Run tximport
<- tximport(files,
txi type="salmon",
tx2gene=t2g,
countsFromAbundance = "lengthScaledTPM")
# "files" is a vector wherein each element is the path to the salmon quant.sf file, and each element is named with the name of the sample.
# "t2g" is a 2 column data frame which contains transcript IDs mapped to geneIDs (in that order)
Summary of DGE workflow
Approximate time: 15 minutes
Learning Objectives
- Identify the R commands needed to run a complete differential expression analysis using DESeq2
Summary of differential expression analysis workflow
We have detailed the various steps in a differential expression analysis workflow, providing theory with example code. To provide a more succinct reference for the code needed to run a DGE analysis, we have summarized the steps in an analysis below:
- Obtaining gene-level counts from Salmon using tximport
- Creating the dds object:
# Check that the row names of the metadata equal the column names of the **raw counts** data
all(colnames(txi$counts) == rownames(metadata))
# Create DESeq2Dataset object
<- DESeqDataSetFromTximport(txi,
dds colData = metadata,
design = ~ condition)
- Exploratory data analysis (PCA & hierarchical clustering) - identifying outliers and sources of variation in the data:
# Transform counts for data visualization
<- rlog(dds,
rld blind=TRUE)
# Plot PCA
plotPCA(rld,
intgroup="condition")
# Extract the rlog matrix from the object and compute pairwise correlation values
<- assay(rld)
rld_mat <- cor(rld_mat)
rld_cor
# Plot heatmap
pheatmap(rld_cor,
annotation = metadata)
- Run DESeq2:
# **Optional step** - Re-create DESeq2 dataset if the design formula has changed after QC analysis in include other sources of variation
# dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ covariate + condition)
# Run DESeq2 differential expression analysis
<- DESeq(dds)
dds
# **Optional step** - Output normalized counts to save as a file to access outside RStudio
# normalized_counts <- counts(dds, normalized=TRUE)
- Check the fit of the dispersion estimates:
# Plot dispersion estimates
plotDispEsts(dds)
- Create contrasts to perform Wald testing on the shrunken log2 foldchanges between specific conditions:
# Specify contrast for comparison of interest
<- c("condition", "level_to_compare", "base_level")
contrast
# Output results of Wald test for contrast
<- results(dds,
res contrast = contrast,
alpha = 0.05)
# Shrink the log2 fold changes to be more accurate
<- lfcShrink(dds,
res coef = "sampletype_group1_vs_group2",
type = "apeglm")
# The coef will be dependent on what your contrast was. and should be identical to what is stored in resultsNames()
- Output significant results:
# Set thresholds
< - 0.05
padj.cutoff
# Turn the results object into a tibble for use with tidyverse functions
<- res %>%
res_tbl data.frame() %>%
rownames_to_column(var="gene") %>%
as_tibble()
# Subset the significant results
<- dplyr::filter(res_tbl,
sig_res < padj.cutoff) padj
Visualize results: volcano plots, heatmaps, normalized counts plots of top genes, etc.
Perform analysis to extract functional significance of results: GO or KEGG enrichment, GSEA, etc.
Make sure to output the versions of all tools used in the DE analysis:
sessionInfo()
For better reproducibility, it can help to create RMarkdown reports, which save all code, results, and visualizations as nicely formatted html reports. We have a very basic example of a report linked here. To create these reports we have additional materials available.