Introduction to Variant Calling
Learning Objectives
- Evaluate QC metrics for variant calling
- Call variants using GATK
- Filter variants to retain only high-quality variant calls
- Annotate variants using SnpEff and dbSNP
- Prioritize variants by their impact
- Visualize variants in IGV
Installations
On your desktop
- FileZilla Client (make sure you get ‘FileZilla Client’)
- Integrative Genomics Viewer (IGV)
On your HPCC (if not using Harvard’s O2 cluster)
Required
- FastQC version 0.11.9
- bwa version 0.7.17
- Picard version 2.27.5
- MultiQC version 1.12
- GATK version 4.1.9.0
- SnpEff and SnpSift suite version 4.3g
- bcftools version 1.14
Optional
NOTE: If you are not working on the O2 cluster and are using different versions of these programs, the provided commands may still work. However, this workshop was designed on these specific versions, so you may need to adjust some of the commands if your versions differ.
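If you are on a different cluster, it can help to note which of your installed versions differ from the ones listed above before you start. The following is only a minimal sketch: the version numbers are copied from the list as it appears on this page, while the function name `find_version_mismatches` and the `my_cluster` dictionary are hypothetical and stand in for versions you have looked up yourself.

```python
# Versions the workshop was designed on, as they appear in the list above.
WORKSHOP_VERSIONS = {
    "FastQC": "0.11.9",
    "bwa": "0.7.17",
    "Picard": "2.27.5",
    "MultiQC": "1.12",
    "GATK": "4.1.9.0",
    "SnpEff": "4.3g",
    "bcftools": "1.14",
}


def find_version_mismatches(installed):
    """Return {tool: (workshop_version, installed_version)} for each tool
    whose installed version differs from the workshop version or is missing."""
    mismatches = {}
    for tool, expected in WORKSHOP_VERSIONS.items():
        found = installed.get(tool)  # None when the tool was not reported
        if found != expected:
            mismatches[tool] = (expected, found)
    return mismatches


# Hypothetical example: versions noted down by hand on your own cluster.
my_cluster = {
    "FastQC": "0.11.9",
    "bwa": "0.7.17",
    "Picard": "2.27.5",
    "MultiQC": "1.14",
    "GATK": "4.1.9.0",
    "SnpEff": "4.3g",
}

for tool, (expected, found) in find_version_mismatches(my_cluster).items():
    print(f"{tool}: workshop used {expected}, found {found or 'not installed'}")
```

A mismatch does not mean a command will fail. It only flags the tools whose commands are most likely to need adjusting.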
Lessons
- ICGC-TCGA DREAM Mutation Calling Challenge Synthetic Dataset
- Project Organization
- File Formats
- Evaluating Read Quality with FastQC
- Sequence Read Alignment
- Alignment File Processing
- Alignment File Quality Control
- Evaluating Quality Control Metrics
- Variant Calling
- Variant Filtering
- Variant Annotation with SnpEff
- Automation of Variant Calling Pipeline
- Variant Prioritization with SnpSift
- Visualization in IGV
NOTE: If you aren’t working on Harvard’s O2 cluster, the directory structure of your HPCC is likely different, and you will need to modify the paths to work within it.
Answer key
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.