Self-learning Answer Key

File Formats

Using our knowledge of FLAGs in SAM files let’s decode a few using the tool on the Broad’s Website.

1. An alignment has a FLAG of 115. What do we know about this read?

Click here to see the answer

Read paired
Read mapped in proper pair
Read reverse strand
Mate reverse strand
First in pair

2. What would be the FLAG for a read alignment for the first read in a pair-end read, where the first read was unmapped while the second read was mapped to the reverse strand?

Click here to see the answer

101

3. Below is an alignment, what would be the CIGAR string for this alignment?

Click here to see the answer

3M1X2M1X1D4M2X2M1M1I1M

4. Using grep, extract only the meta-information lines from the VCF file.

Click here to see the answer

  grep '^##' sample.vcf

5. using grep, extract the lines containing the names of all of the software packages that were used in the creation of this VCF file?

Click here to see the answer

  grep '^##source' sample.vcf

Bonus Challenge 6. For the sample at position 806262 on chromosome 19, what is the reference allele?

Click here to see the answer

  less sample.vcf

Then search the less buffer with:

  /19Tab806262

  grep -e $'^19\t806262' sample.vcf

7. Create a BED file within ~/variant_calling/ called my_file.bed and have it contain these three ranges:

Chromosome 1 from 84573 to 94573

Chromosome 2 from 465352 to 466352

Chromosome 19 from 111237 to 111238

Click here to see the answer

Open vim with:

  vim my_file.bed

Enter insert mode and type:

  1 84573 94573
  2 465352  466352
  19  111237 111238

Exit insert mode by pressing Esc, then save and quit the file by typing :wq while in Command mode.

Evaluating Read Qualities with `FastQC`

1. If the probability of a incorrect base call is 1 in 3,981, what is the associated PHRED score?

Click here to see the answer

-10 x log10(1/3981) ≈ 36

2. Assign the path, /The/path/to/my/vcf_file.vcf, to a variable named VCF_PATH and replace the .vcf extension with .filtered.vcf.

Click here to see the answer

  VCF_PATH=/The/path/to/my/vcf_file.vcf
  echo ${VCF_PATH%.vcf}.filtered.vcf

3. Assign the new path with the .filtered.vcf extension to a variable named FILTERED_VCF_PATH then echo this variable.

Click here to see the answer

  FILTERED_VCF_PATH=`echo ${VCF_PATH%.vcf}.filtered.vcf`
  echo $FILTERED_VCF_PATH

4. Copy the BED file from /n/groups/hbctraining/variant_calling/sample_data/sample.bed to ~/variant_calling/ directory. Move to the ~/variant_calling/ directory and use sed to stripe chr from the chromosome names and have the output look like:

200	300
600	900
10	1000

Click here to see the answer

  cp /n/groups/hbctraining/variant_calling/sample_data/sample.bed ~/variant_calling/
  cd ~/variant_calling/
  sed 's/chr//g' sample.bed

5. Redirect this output to a new file called sample.without_chr.bed

Click here to see the answer

  sed 's/chr//g' sample.bed > sample.without_chr.bed

Sequence Alignment Theory

1. The read group field LB (Library) is a required field to when adding read groups using Picard’s AddOrReplaceReadGroups, but we don’t currently have this field in our read group information. How would we alter out bwa command to include LB as well?

Click here to see the answer

Change:

  -R "@RG\tID:$SAMPLE\tPL:illumina\tPU:$SAMPLE\tSM:$SAMPLE"

To:

  -R "@RG\tID:$SAMPLE\tPL:illumina\tPU:$SAMPLE\tSM:$SAMPLE\tLB:$SAMPLE"

2. If we wanted to increase the number of threads used by bwa for processing our alignment to 12, where are the two places we would need to modify our SBATCH script to accommodate this?

Click here to see the answer

Change with the SBATCH directives:

  #SBATCH -c 8

To:

  #SBATCH -c 12

AND Change the bwa command:

  -t 8 \

To:

  -t 12 \

Alignment File Processing

1. When inspecting a SAM file you see the following order:

Is this SAM file’s sort order: unsorted, query-sorted, coordinate-sorted or is it ambiguous?

Click here to see the answer

Query-sorted

2. We are comparing our SortSam command with our colleague’s command. Is there anything wrong with their syntax? Why or why not?

Our syntax

java -jar $PICARD/picard.jar SortSam \
--INPUT $SAM_FILE \
--OUTPUT $QUERY_SORTED_BAM_FILE \
--SORT_ORDER queryname

Our colleague’s syntax

java -jar $PICARD/picard.jar SortSam \
I=$SAM_FILE \
O=$QUERY_SORTED_BAM_FILE \
SO=queryname

Click here to see the answer

No, it is just using the traditional syntax with abbreviations.

Alignment Quality Control

1. Inspect your coordinate-sorted BAM file with ViewSam package within Picard. What version of bwa was used in the alignment?

Click here to see the answer

bwa version 0.7.17-r1188

2. Inspect your coordinate-sorted BAM file with ViewSam package within Picard. What is the flag for your first aligned read? Using the Broad’s decoding FLAG tool, what does this flag mean?

Click here to see the answer

129

Read paired
Second in pair

Automation of Variant Calling

1. If we submit Job_A.sh and SLURM returns:

Submitted batch job 213489

And we want to start Job_B.sh after Job_A.sh finishes without error, what command would we use to do this?

Click here to see the answer

  sbatch --dependency=afterok:213489 Job_B.sh

2. If we submit Job_X.sh and SLURM returns:

Submitted batch job 213489

Then we submit Job_Y.sh and SLURM returns:

Submitted batch job 213496

And we want to start Job_Z.sh after Job_X.sh and Job_Y.sh finishes without error, what command would we use to do this?

Click here to see the answer

  sbatch --dependency=afterok:213489:213496 Job_Z.sh

3. If you want to submit Job_M.sh, and provide reference.fa as the first positional parameter and input.bam as the second positional parameter, how could you do this?

Click here to see the answer

  sbatch Job_M.sh reference.fa input.bam

Variant Prioritization

1. Extract from mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf all of the MODERATE-impact mutations on Chromosome 12.

Click here to see the answer

  java -jar $SNPEFF/SnpSift.jar filter "( CHROM = '12' ) & ( ANN[*].IMPACT has 'MODERATE' )" mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf  | less

2. Pipe the output from Exercise 1) into a command to only display one line for each effect.

Click here to see the answer

  java -jar $SNPEFF/SnpSift.jar filter "( CHROM = '12' ) & ( ANN[*].IMPACT has 'MODERATE' )" mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf  | $SNPEFF/scripts/vcfEffOnePerLine.pl | less

3. Pipe the output from Exercise 2) into a command to extract the chromosome, position, gene and effect.

Click here to see the answer

  java -jar $SNPEFF/SnpSift.jar filter "( CHROM = '12' ) & ( ANN[*].IMPACT has 'MODERATE' )" mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf  | $SNPEFF/scripts/vcfEffOnePerLine.pl | java -jar $SNPEFF/SnpSift.jar extractFields - "CHROM" "POS" "ANN[*].GENE" "ANN[*].EFFECT" | less

4. Redirect the output from Exercise 3) into a new file called mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.chr12_moderate-impact.txt

Click here to see the answer

  java -jar $SNPEFF/SnpSift.jar filter "( CHROM = '12' ) & ( ANN[*].IMPACT has 'MODERATE' )" mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf  | $SNPEFF/scripts/vcfEffOnePerLine.pl | java -jar $SNPEFF/SnpSift.jar extractFields - "CHROM" "POS" "ANN[*].GENE" "ANN[*].EFFECT" > mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.chr12_moderate-impact.txt

Back to Schedule

Self-learning Answer Key

File Formats

Evaluating Read Qualities with FastQC

Sequence Alignment Theory

Alignment File Processing

Alignment Quality Control

Automation of Variant Calling

Variant Prioritization

Evaluating Read Qualities with `FastQC`