
Approximate time: 45 minutes

Learning Objectives

From Sequence reads to Peak calls

In this workshop we covered the steps in the first half of the ChIP-seq workflow where we go from raw sequence reads through to peak calls. We have discussed each of the steps in detail, outlining the tools involved and the file formats encountered. In this lesson, we revisit the quality checks associated with each step and summarize the main points to take away.

Quality control of sequence reads

The quality checks at this stage in the workflow include:

Examples of bad quality data

Good quality data masquerading as bad quality

Some modules of the FastQC report may flag results that would indicate bad quality in other NGS data, but that are expected for good quality ChIP-seq data. Quality checks include looking for these modules:

Alignment quality

The quality checks at this stage in the workflow include:

If my mapping rate is low, do I discard my sample? Do not discard your sample; instead, you will want to:

  1. Flag the sample as low quality. Keep an eye out for QC metrics later in the workflow for that same sample.
  2. Troubleshoot the sample. Take the unmapped reads and BLAST the sequences; if the reads are not mapping to the genome, where are they mapping? It’s possible you might identify a high level of contamination from another organism.
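As a rough illustration of the troubleshooting step, the sketch below pulls a small sample of unmapped reads out of SAM-formatted text and writes them as FASTA, which is the form BLAST accepts as a query. It relies only on the SAM convention that flag bit 0x4 marks an unmapped read. The function name `unmapped_reads_to_fasta`, the `max_reads` parameter, and the example records are made up for illustration.

```python
def unmapped_reads_to_fasta(sam_lines, max_reads=100):
    """Return up to max_reads unmapped reads (SAM flag bit 0x4) as FASTA text."""
    records = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # a SAM record has 11 mandatory fields
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4 and seq != "*":     # 0x4 = read is unmapped
            records.append(f">{name}\n{seq}")
            if len(records) >= max_reads:
                break
    return "\n".join(records) + ("\n" if records else "")


# Hypothetical records: read1 is mapped (flag 0), read2 is unmapped (flag 4).
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAATT\tIIIIIIIIII",
]
print(unmapped_reads_to_fasta(example_sam), end="")
```

A few hundred reads is usually enough to see whether one contaminating organism dominates the hits; there is no need to BLAST every unmapped read.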

Image source: Landt et al., 2012

NOTE: For paired-end reads you will also want to check the percentage of reads that are properly paired. By default, Bowtie 2 searches for both concordant and discordant alignments, though searching for discordant alignments can be disabled with the --no-discordant option.
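The mapping rate and the share of concordantly aligned pairs can both be read off the summary that Bowtie 2 prints at the end of a paired-end run. The following is a minimal sketch of parsing that summary. The function name, the 70% cutoff passed as `min_overall_pct`, and the numbers in `EXAMPLE_SUMMARY` are illustrative assumptions, and treating "concordant" as equivalent to "properly paired" is a simplification.

```python
import re

# Illustrative summary text in the layout Bowtie 2 prints for paired-end data.
EXAMPLE_SUMMARY = """10000 reads; of these:
  10000 (100.00%) were paired; of these:
    650 (6.50%) aligned concordantly 0 times
    8823 (88.23%) aligned concordantly exactly 1 time
    527 (5.27%) aligned concordantly >1 times
    ----
    650 pairs aligned concordantly 0 times; of these:
      34 (5.23%) aligned discordantly 1 time
    ----
    616 pairs aligned 0 times concordantly or discordantly; of these:
      1232 mates make up the pairs; of these:
        660 (53.57%) aligned 0 times
        571 (46.35%) aligned exactly 1 time
        1 (0.08%) aligned >1 times
96.70% overall alignment rate
"""


def summarize_bowtie2(text, min_overall_pct=70.0):
    """Extract headline QC numbers from a paired-end Bowtie 2 summary."""
    # For paired-end input, the first line counts read pairs.
    total_pairs = int(re.search(r"^(\d+) reads; of these:", text, re.M).group(1))
    overall = float(re.search(r"([\d.]+)% overall alignment rate", text).group(1))
    # Sum pairs that aligned concordantly once or more than once.
    concordant = sum(
        int(n)
        for n in re.findall(
            r"^\s*(\d+) \([\d.]+%\) aligned concordantly (?:exactly 1|>1) times?",
            text,
            re.M,
        )
    )
    return {
        "overall_pct": overall,
        "concordant_pct": round(100 * concordant / total_pairs, 2),
        # Assumed cutoff for flagging a sample; adjust to your own criteria.
        "low_mapping": overall < min_overall_pct,
    }


print(summarize_bowtie2(EXAMPLE_SUMMARY))
```

A sample flagged here is not discarded; the flag is a reminder to watch its downstream QC metrics.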

Peak quality checks

The quality checks at this stage in the workflow include:

Total number of peaks

This number will vary depending on your protein of interest and the number of expected binding sites. It can range from thousands of regions to hundreds of thousands. If only a handful of regions are identified as significantly enriched, there is a high likelihood that your experiment failed.

Image source: Hendrix, DA, “Applied Bioinformatics” - Online textbook from Oregon State University
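Counting peaks is simple enough to script. The sketch below assumes a peak file in ENCODE narrowPeak format, where the ninth column holds the -log10 q-value, and reports both the total number of peaks and how many pass a q-value cutoff. The function name, the 0.05 default, and the example records are hypothetical.

```python
import math


def count_peaks(narrowpeak_lines, max_q=0.05):
    """Return (total peaks, peaks with q-value <= max_q) from narrowPeak lines."""
    threshold = -math.log10(max_q)        # column 9 stores -log10(q-value)
    total = 0
    significant = 0
    for line in narrowpeak_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue                      # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        if float(fields[8]) >= threshold:
            significant += 1
    return total, significant


# Hypothetical narrowPeak records (10 tab-separated columns each).
example_peaks = [
    "chr1\t100\t300\tpeak_1\t520\t.\t12.4\t8.1\t5.2\t95",
    "chr1\t900\t1100\tpeak_2\t80\t.\t2.1\t1.5\t0.8\t60",
    "chr2\t50\t400\tpeak_3\t210\t.\t6.3\t3.9\t2.0\t170",
]
print(count_peaks(example_peaks))
```

Peak callers typically apply their own significance cutoff before writing the file, so in practice the two counts are often identical.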

Possible reasons you are not seeing many peaks:

Replicate concordance

Unlike RNA-seq, increasing replicates in your ChIP-seq will not increase the number of binding sites identified. Rather, it gives you confidence that the sites you identified are true signal.

Representative browser snapshot of the four EGR1 ChIP-seq experiments, showing the much stronger peaks obtained with the second set of replicates

Image source: Landt et al., 2012
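One simple way to put a number on replicate concordance is the fraction of peaks in one replicate that overlap a peak in the other. The sketch below is only an illustration of that idea, assuming peaks are given as (chromosome, start, end) tuples with BED-style half-open coordinates. The function name and example intervals are made up, and dedicated interval tools scale far better than this nested loop.

```python
def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap at least one peak in B by >= 1 bp."""
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        # Half-open intervals [start, end) overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0


# Hypothetical peak sets for two replicates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr2", 150, 300)]
print(fraction_overlapping(rep1, rep2))
print(fraction_overlapping(rep2, rep1))
```

The measure is not symmetric, so it is worth reporting it in both directions: a replicate with few peaks can overlap almost completely with a replicate that has many.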

Qualitative assessment of enriched regions

At this point, if you have a reasonable number of peaks and you observe a good amount of concordance between replicates, the next step is evaluating the enriched regions. You can do this with a simple site-based inspection (i.e., use a genome viewer to look for enrichment profiles for specific target genes), or use profile plots for a genome-wide assessment.
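A profile plot is, at its core, the average signal across many sites aligned on a common reference point. The sketch below shows that calculation on a toy per-base coverage list; it is not how any particular plotting tool implements it. The function name, the `flank` parameter, and the coverage values are assumptions chosen for illustration.

```python
def mean_profile(coverage, centers, flank):
    """Average per-base signal in [center - flank, center + flank] across sites."""
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for center in centers:
        lo, hi = center - flank, center + flank + 1
        if lo < 0 or hi > len(coverage):
            continue                      # skip windows that run off the sequence
        for i, value in enumerate(coverage[lo:hi]):
            totals[i] += value
        used += 1
    return [t / used for t in totals] if used else totals


# Toy coverage track with two enriched sites centred at positions 3 and 10.
coverage = [0, 1, 2, 5, 2, 1, 0, 0, 1, 3, 6, 3, 1, 0]
print(mean_profile(coverage, centers=[3, 10], flank=2))
```

A profile that peaks at the centre and falls away on both sides is the shape expected when the sites are genuinely enriched.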


This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.