dds_lrt <- DESeq(dds, test="LRT", reduced = ~ 1)DGE analysis using LRT in DESeq2
This lesson demonstrates the use of the Likelihood Ratio Test (LRT) in DESeq2 to identify genes with expression changes across multiple sample groups in RNA-seq differential expression analysis. Participants will extract and interpret LRT results, compare them to Wald test findings, filter significant gene lists and apply hierarchical clustering methods to discover groups of genes with shared expression patterns using the DEGreport package. The lesson includes guidance on data extraction, significance thresholding and cluster-based exploration of complex differential gene expression results.
DESeq2, DEGreport, Likelihood Ratio Test, Multiple group comparison, Gene expression patterns
Approximate time: 60 minutes
Learning Objectives
In this lesson, you will:
- Apply the Likelihood Ratio Test (LRT) for hypothesis testing
- Compare results generated from the LRT to results obtained using the Wald test
- Identify shared expression profiles from the LRT significant gene list
Exploring results from the Likelihood ratio test (LRT)
DESeq2 also offers the Likelihood Ratio Test as an alternative when evaluating expression change across more than two levels. Genes that are identified as significant are those that are changing in expression in any direction across the different factor levels.
Generally, this test will result in a larger number of genes than the individual pairwise comparisons. While the LRT is a test of significance for differences of any level(s) of the factor, one should not expect it to be exactly equal to the union of sets of genes using Wald tests (although we do expect a high degree of overlap).
The results() table
To extract the results from our dds_lrt object we can use the same results() function we had used with the Wald test. There is no need for contrasts since we are not making a pairwise comparison.
In an earlier lesson on hypothesis testing, we had you create the object dds_lrt. If you are having trouble finding the object, please run the code:
# Extract results for LRT
res_LRT <- results(dds_lrt)Let’s take a look at the results table:
# View results for LRT
res_LRT log2 fold change (MLE): sampletype MOV10 overexpression vs control
LRT p-value: '~ sampletype' vs '~ 1'
DataFrame with 57761 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000003 3525.8835 -0.438245 0.0774607 40.46117 1.63670e-09
ENSG00000000005 26.2489 0.029208 0.4411295 1.61898 4.45084e-01
ENSG00000000419 1478.2512 0.383635 0.1137609 11.34102 3.44611e-03
ENSG00000000457 518.4220 0.228971 0.1023313 14.63134 6.65035e-04
ENSG00000000460 1159.7761 -0.269138 0.0814993 25.03939 3.65398e-06
... ... ... ... ... ...
ENSG00000285889 1.82171 -4.68144 3.9266061 2.35649 0.307818323
ENSG00000285950 7.58089 -1.01978 1.0715583 1.21446 0.544857226
ENSG00000285976 4676.24904 0.19364 0.0656673 14.87805 0.000587859
ENSG00000285978 2.25697 4.13612 2.0706212 4.68720 0.095981569
ENSG00000285980 0.00000 NA NA NA NA
padj
<numeric>
ENSG00000000003 3.14071e-08
ENSG00000000005 5.88670e-01
ENSG00000000419 1.22924e-02
ENSG00000000457 3.04551e-03
ENSG00000000460 3.23425e-05
... ...
ENSG00000285889 NA
ENSG00000285950 NA
ENSG00000285976 0.00273904
ENSG00000285978 NA
ENSG00000285980 NA
The results table output looks similar to the Wald test results, with identical columns to what we observed previously.
Why are fold changes reported for an LRT test?
For analyses using the likelihood ratio test, the p-values are determined solely by the difference in deviance between the full and reduced model formula. A single log2 fold change is printed in the results table for consistency with other results table outputs, but is not associated with the actual test.
Columns relevant to the LRT test:
baseMean: mean of normalized counts for all samplesstat: the difference in deviance between the reduced model and the full modelpvalue: the stat value is compared to a chi-squared distribution to generate a pvaluepadj: BH adjusted p-values
Additional columns:
log2FoldChange: log2 fold changelfcSE: standard error
Printed at the top of the the results table are the two sample groups used to generate the log2 fold change values that we observe in the results table. This can be controlled using the name argument; the value provided to name must be an element of resultsNames(dds).
Identifying significant genes
When filtering significant genes from the LRT we threshold only the padj column. How many genes are significant at padj < 0.05?
# Create a tibble for LRT results
res_LRT_tb <- res_LRT %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
as_tibble()
# Subset to return genes with padj < 0.05
sigLRT_genes <- res_LRT_tb %>%
dplyr::filter(padj < padj.cutoff)
# Get number of significant genes
nrow(sigLRT_genes)[1] 7315
# Compare to numbers we had from Wald test
nrow(sigOE) # overexpression vs control[1] 4774
nrow(sigKD) # knockdown vs control[1] 2827
The number of significant genes observed from the LRT is quite high. This list includes genes that can be changing in any direction across the three factor levels (control, KO, overexpression). To reduce the number of significant genes, we can increase the stringency of our FDR threshold (padj.cutoff).
Compare the resulting gene list from the LRT test to the gene lists from the Wald test comparisons.
1. How many of the `sigLRT_genes` overlap with the significant genes in `sigOE`?
2. How many of the `sigLRT_genes` overlap with the significant genes in `sigKD`?
