<- DESeq(dds, test="LRT", reduced = ~ 1) dds_lrt
DGE analysis using LRT in DESeq2
Approximate time: 60 minutes
Learning Objectives
- Apply the Likelihood Ratio Test (LRT) for hypothesis testing
- Compare results generated from the LRT to results obtained using the Wald test
- Identify shared expression profiles from the LRT significant gene list
Exploring results from the Likelihood ratio test (LRT)
DESeq2 also offers the Likelihood Ratio Test as an alternative when evaluating expression change across more than two levels. Genes that are identified as significant are those that are changing in expression in any direction across the different factor levels.
Generally, this test will result in a larger number of genes than the individual pairwise comparisons. While the LRT is a test of significance for differences of any level(s) of the factor, one should not expect it to be exactly equal to the union of sets of genes using Wald tests (although we do expect a high degree of overlap).
The results()
table
To extract the results from our dds_lrt
object we can use the same results()
function we had used with the Wald test. There is no need for contrasts since we are not making a pairwise comparison.
In an earlier lesson on hypothesis testing, we had you create the object dds_lrt
. If you are having trouble finding the object, please run the code:
# Extract results for LRT
<- results(dds_lrt) res_LRT
Let’s take a look at the results table:
# View results for LRT
res_LRT
log2 fold change (MLE): sampletype MOV10 overexpression vs control
LRT p-value: '~ sampletype' vs '~ 1'
DataFrame with 57761 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSG00000000003 3525.8835 -0.438245 0.0774607 40.46117 1.63670e-09
ENSG00000000005 26.2489 0.029208 0.4411295 1.61898 4.45084e-01
ENSG00000000419 1478.2512 0.383635 0.1137609 11.34102 3.44611e-03
ENSG00000000457 518.4220 0.228971 0.1023313 14.63134 6.65035e-04
ENSG00000000460 1159.7761 -0.269138 0.0814993 25.03939 3.65398e-06
... ... ... ... ... ...
ENSG00000285889 1.82171 -4.68144 3.9266061 2.35649 0.307818323
ENSG00000285950 7.58089 -1.01978 1.0715583 1.21446 0.544857226
ENSG00000285976 4676.24904 0.19364 0.0656673 14.87805 0.000587859
ENSG00000285978 2.25697 4.13612 2.0706212 4.68720 0.095981569
ENSG00000285980 0.00000 NA NA NA NA
padj
<numeric>
ENSG00000000003 3.14071e-08
ENSG00000000005 5.88670e-01
ENSG00000000419 1.22924e-02
ENSG00000000457 3.04551e-03
ENSG00000000460 3.23425e-05
... ...
ENSG00000285889 NA
ENSG00000285950 NA
ENSG00000285976 0.00273904
ENSG00000285978 NA
ENSG00000285980 NA
The results table output looks similar to the Wald test results, with identical columns to what we observed previously.
Why are fold changes reported for an LRT test?
For analyses using the likelihood ratio test, the p-values are determined solely by the difference in deviance between the full and reduced model formula. A single log2 fold change is printed in the results table for consistency with other results table outputs, but is not associated with the actual test.
Columns relevant to the LRT test:
baseMean
: mean of normalized counts for all samplesstat
: the difference in deviance between the reduced model and the full modelpvalue
: the stat value is compared to a chi-squared distribution to generate a pvaluepadj
: BH adjusted p-values
Additional columns:
log2FoldChange
: log2 fold changelfcSE
: standard error
Printed at the top of the the results table are the two sample groups used to generate the log2 fold change values that we observe in the results table. This can be controlled using the name
argument; the value provided to name
must be an element of resultsNames(dds)
.
Identifying significant genes
When filtering significant genes from the LRT we threshold only the padj
column. How many genes are significant at padj < 0.05
?
# Create a tibble for LRT results
<- res_LRT %>%
res_LRT_tb data.frame() %>%
rownames_to_column(var="gene") %>%
as_tibble()
# Subset to return genes with padj < 0.05
<- res_LRT_tb %>%
sigLRT_genes ::filter(padj < padj.cutoff)
dplyr
# Get number of significant genes
nrow(sigLRT_genes)
[1] 7315
# Compare to numbers we had from Wald test
nrow(sigOE) # overexpression vs control
[1] 4774
nrow(sigKD) # knockdown vs control
[1] 2827
The number of significant genes observed from the LRT is quite high. This list includes genes that can be changing in any direction across the three factor levels (control, KO, overexpression). To reduce the number of significant genes, we can increase the stringency of our FDR threshold (padj.cutoff
).
- Compare the resulting gene list from the LRT test to the gene lists from the Wald test comparisons.
- How many of the
sigLRT_genes
overlap with the significant genes insigOE
? - How many of the
sigLRT_genes
overlap with the significant genes insigKD
?
- How many of the