Introduction to Variant Analysis
Learning Objectives
- Evaluate QC metrics for variant calling
- Call variants using GATK
- Filter variants to retain only high-quality variant calls
- Annotate variants using SnpEff and dbSNP
- Prioritize variants by their impact
- Visualize variants in IGV
Installations
On your desktop
- FileZilla Client (make sure you download ‘FileZilla Client’, not ‘FileZilla Server’)
- Integrative Genomics Viewer (IGV)
On your HPCC (if not using Harvard’s O2 cluster)
Required
- FastQC version 0.11.9
- bwa version 0.7.17
- Picard version 2.27.5
- MultiQC version 1.12
- GATK version 4.1.9.0
- SnpEff and SnpSift suite version 4.3g
- bcftools version 1.14
Optional
NOTE: If you are not working on the O2 cluster and are using different versions of these software programs, the provided commands may still work. However, this workshop was designed on these specific versions, so you may need to adjust some commands if your versions differ.
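If you are working on a different cluster, it can help to record which tool versions you actually have and compare them against the list above before starting. The sketch below is one possible way to do that in Python. It only compares version strings that you supply yourself. The `REQUIRED` table is copied from the list above, while the function name `version_mismatches` and the example `installed` values are hypothetical and purely illustrative.

```python
# Versions this workshop was designed on (copied from the "Required" list).
REQUIRED = {
    "FastQC": "0.11.9",
    "bwa": "0.7.17",
    "Picard": "2.27.5",
    "MultiQC": "1.12",
    "GATK": "4.1.9.0",
    "SnpEff": "4.3g",
    "bcftools": "1.14",
}


def version_mismatches(installed):
    """Return {tool: (required, found)} for each tool whose version
    differs from the workshop version. `found` is None if the tool
    is absent from `installed`."""
    mismatches = {}
    for tool, required in REQUIRED.items():
        found = installed.get(tool)
        if found != required:
            mismatches[tool] = (required, found)
    return mismatches


# Hypothetical example: versions read off your own cluster by hand.
installed = dict(REQUIRED)
installed["bcftools"] = "1.16"   # assumed newer version, for illustration
del installed["SnpEff"]          # assumed not installed, for illustration

for tool, (required, found) in version_mismatches(installed).items():
    status = "not found" if found is None else f"found {found}"
    print(f"{tool}: workshop uses {required}, {status}")
```

A mismatch does not necessarily mean the commands will fail. It only flags the tools whose commands are most likely to need adjusting.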
Lessons
- Introduction to Variant Analysis
- Project Organization
- Evaluating Read Quality with FastQC
- Sequence Read Alignment
- Alignment File Processing
- Alignment File Quality Control
- Aggregating QC metrics using MultiQC
- Variant Calling
- Variant Filtering
- Variant Annotation with SnpEff
- Variant Prioritization with SnpSift
- Visualization in IGV
NOTE: If you aren’t working on Harvard’s O2 cluster, the directory structure on your HPCC is likely different, and you will need to modify the paths in the lessons to match it.
Answer key
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.