Workshop Schedule

NOTE: The Basic Data Skills workshop “Introduction to the command-line interface” is a prerequisite.

Pre-reading:

Please study the contents and work through all the exercises within the following lessons:

Day 1

Time	Topic	Instructor
09:30 - 09:45	Workshop Introduction	Meeta
09:45 - 11:00	Understanding chromatin biology using high-throughput sequencing	Dr. Shannan Ho Sui
11:00 - 11:05	Break
11:05 - 11:20	HPC review Q&A	Will
11:20 - 11:50	Dataset overview and project organization	Will
11:50 - 12:00	Overview of self-learning materials and homework submission	Meeta

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

Experimental design considerations for HTS of chromatin

Before you begin thinking about performing the experiment, it is important to plan for it and choose a protocol that is best suited for you. There are many things to consider depending on the cells you are working with, and your protein of interest.

In this lesson, we will:
- Highlight the experimental design considerations for ChIP-seq and compare and contrast with CUT&RUN and ATAC-seq
- Highlight the sequencing considerations for each method listed above

Quality Control of Sequence Data: Running FASTQC and evaluating results

The first step of most NGS analyses is to evaluate the quality of your sequencing reads.

In this lesson you will explore:
- The FASTQC software, and how to run it on your raw sequencing data
- The HTML report that is returned from FASTQC and how to interpret the different plots

Alignment using Bowtie2

The next step is to take our high-quality reads and figure out where in the genome they originated. In theory this seems like a simple task, but in practice it is quite challenging.

In this lesson you will cover:
- The Bowtie2 software, a popular tool for aligning DNA sequence reads
- Alignment file formats
- How to run your alignment as a job on the cluster

NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word compute in it). Need a refresher on the cluster? Check out this lesson from the pre-reading assignment.

1. Log in using ssh rc_trainingXX@o2.hms.harvard.edu and enter your password (replace the “XX” in the username with the number you were assigned in class). Your login information can be found here.

2. Once you are on the login node, use srun --pty -p interactive -t 0-2:30 --mem 1G /bin/bash to get on a compute node or as specified in the lesson.

3. Proceed only once your command prompt has the word compute in it.

If you log out between lessons (using the exit command twice), please follow points 1 and 2 above to log back in and get on a compute node when you restart the self-learning lessons.
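
The "proceed only once your prompt has the word compute in it" check can also be scripted. The following is a minimal sketch, not part of the workshop materials: the function name is_compute_node is hypothetical, and it assumes the word compute in your prompt comes from the host name of the node you are on.

```shell
# Hypothetical helper: succeeds only if the given host name contains the
# word "compute". With no argument it checks the current host name.
# Assumption: the word "compute" in the prompt comes from the host name.
is_compute_node() {
    host="${1:-$(hostname)}"
    case "$host" in
        *compute*) return 0 ;;
        *)         return 1 ;;
    esac
}

if is_compute_node; then
    echo "On a compute node: ready to run the lesson code."
else
    echo "Not on a compute node: run the srun command above first."
fi
```

Running this right after logging in should report that you are not on a compute node yet; running it again after the srun command should report that you are.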

Complete the exercises:
- Each lesson above contains exercises; please go through each of them.
- Copy over your solutions into the Google Forms the day before the next class.

Questions?

If you get stuck due to an error while running code in the lesson, email us.

Day 2

Time	Topic	Instructor
09:30 - 10:15	Self-learning lessons review	All
10:15 - 11:00	Filtering BAM files	Will
11:00 - 11:05	Break
11:05 - 12:00	Peak calling	Meeta

Before the next class:

I. Please study the contents and work through all the code within the following lessons:

Handling peak files using bedtools

Now that we have called peaks for each of our samples, it's time to look at the output. The output of MACS2 includes various files, with the narrowPeak file being the most important for interpretation.
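
To give a feel for what a narrowPeak file looks like before starting the lesson, here is a small sketch. The three peaks are made-up values for illustration only, the function name filter_peaks is hypothetical, and plain awk is used as a stand-in for the bedtools commands the lesson itself covers. It relies on the standard narrowPeak layout: the first three columns are the usual BED chrom, start and end, and column 9 holds the q-value as -log10(q).

```shell
# Made-up example peaks in narrowPeak layout (tab-separated columns):
# chrom, start, end, name, score, strand, signalValue, pValue, qValue, summit offset
peaks=$(mktemp)
printf 'chr1\t100\t300\tpeak_1\t250\t.\t8.5\t12.1\t9.8\t90\n'     > "$peaks"
printf 'chr1\t900\t1100\tpeak_2\t40\t.\t2.1\t2.5\t1.2\t75\n'     >> "$peaks"
printf 'chr2\t500\t800\tpeak_3\t310\t.\t10.2\t15.0\t12.4\t140\n' >> "$peaks"

# Hypothetical helper: keep peaks whose column 9 (-log10 q-value) is at
# least the threshold given as the second argument.
filter_peaks() {
    awk -F'\t' -v min_q="$2" '($9 + 0) >= (min_q + 0)' "$1"
}

# A threshold of 2 corresponds to q <= 0.01.
filtered=$(filter_peaks "$peaks" 2)

# Show only the first four columns of the peaks that pass the filter.
printf '%s\n' "$filtered" | cut -f1-4
rm -f "$peaks"
```

With these example values, peak_1 and peak_3 pass the filter and peak_2 is dropped.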

In this lesson you will cover:
- The basics of the BED file format (and how it extends to narrowPeak files)
- The bedtools suite of tools
- Filtering and intersecting BED files

File formats for peak visualization

ChIP-seq data is best evaluated by visualizing peaks. However, in order to do so we require the appropriate file formats.

In this lesson you will:
- Learn about different file formats for peak visualization
- Create bigWig files
- Discuss normalization metrics and considerations when choosing a method

Qualitative assessment of peak enrichment using deepTools

An exciting component of ChIP-seq analysis is being able to visualize your results and gain some biologically meaningful insight. This may in turn generate hypotheses for you to explore further with your data!

In this lesson you will learn:
- How to use deepTools to create heatmaps and profile plots
- To ask questions about your data and find answers through visualization

Complete the exercises:
- Each lesson above contains exercises; please go through each of them.
- Copy over your solutions into the Google Forms the day before the next class.

Questions?

If you get stuck due to an error while running code in the lesson, email us.

Day 3

Time	Topic	Instructor
09:30 - 10:00	Self-learning lessons review	All
10:00 - 10:30	Troubleshooting your ChIP-seq analysis	Meeta
10:30 - 10:35	Break
10:35 - 11:45	Automating the ChIP-seq workflow	Will
11:45 - 12:00	Wrap-up	Meeta

Answer keys

Day 1 exercises
- Data Management and project organization
- QC and Alignment questions
Day 2 exercises
- Handling peak calls
Day 3 In-class
- Automation shell script
- Parallelization script

Resources

ENCODE Data Standards and Processing Pipeline Information for Histone and Transcription Factors
ENCODE guidelines and practices for ChIP-seq. An older paper, but a good outline of general best practices.
Experimental design considerations:
- Thermofisher Step-by-step guide to a successful ChIP experiment
- “Chromatin Immunoprecipitation (ChIP) Principles and How to Obtain Quality Results”, BenchSci Blog
- O’Geen et al (2011), Methods Mol Biol - A focus on performing ChIP assays to characterize histone modifications
Jung et al (2014). NAR. - Impact of sequencing depth in ChIP-seq experiments

Building on this workshop

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.