Workshop Schedule

NOTE: The Basic Data Skills Introduction to R workshop is a prerequisite. If you would like some practice with R before taking this workshop, please work through this R refresher lesson.

Pre-reading:

Day 1

Time Topic Instructor
09:30 - 09:45 Workshop Introduction Will
09:45 - 10:15 Pre-reading discussion Meeta
10:15 - 11:00 Understanding peaks and peak file formats Will
11:00 - 11:05 Break
11:05 - 12:00 Assessing peak quality metrics Meeta

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

  1. Assessing sample similarity and identifying potential outliers
    One step in the QC of samples is to see how samples compare to one another. Generally, we expect replicates from each sample group to be more similar to each other than to replicates from a different sample group. Here, we use read density (counts across the genome) and peak signal data to check whether the samples meet this expectation.

    In this lesson, we will:
    - Create PCA plots and inter-sample correlation heatmaps
    - Evaluate plots to identify potential outliers and other effects
    - Create visualizations using signal data from peaks to identify proposed thresholds for downstream analysis

  2. Concordance across replicates using peak overlaps
    One quantitative way to evaluate how similar replicates are is to count how many of the same peaks were called in each replicate. Biological replicates will inevitably exhibit some variability, but the hope is that the majority of our peaks are identified in each sample. By looking at peak overlaps, we can identify and remove a weaker replicate and/or use the overlap to create a consensus set of peaks.

    In this lesson, we will:
    - Discuss IRanges and GRanges data structures in R
    - Compute peak overlaps and create visualizations for the results

  3. Complete the exercises:
    • Each lesson above contains exercises; please go through each of them.
    • Copy over your solutions into the Google Forms the day before the next class.

Questions?

Day 2

Time Topic Instructor
09:30 - 10:00 Self-learning review All
10:00 - 10:45 Peak annotation and visualization using ChIPseeker Will
10:45 - 10:55 Break
10:55 - 12:00 Differential enrichment analysis using DiffBind Meeta

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

  1. Peak visualization using IGV
    Now that we have identified regions that are differentially enriched, it would be good to perform a qualitative assessment. To do this, we will look at the data in IGV, a genome browser, and see what read density looks like in significant regions.

    In this lesson, we will:
    - Learn how to navigate IGV and introduce various features
    - Evaluate significant regions from DiffBind

  2. Annotation and functional analysis of DE regions
    To gain biological insight from the genomic coordinates identified as differentially bound, we need to map them back to genomic features and see if there is some over-representation of target genes in specific pathways.

    In this lesson, we will:
    - Use ChIPseeker to annotate the DE regions
    - Perform functional analysis on the DE target genes

  3. Complete the exercises:
    • The Functional Analysis lesson above contains exercises; please go through each of them.
    • Copy over your solutions into the Google Forms the day before the next class.

Questions?


Day 3

Time Topic Instructor
09:30 - 10:30 Self-learning review All
10:30 - 11:15 Motif analysis/discovery Meeta
11:15 - 11:25 Break
11:25 - 11:45 Discussion Q&A All
11:45 - 12:00 Wrap-up Will

Answer keys

Resources


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
