Workshop Schedule
Prerequisite for this workshop: the Basic Data Skills "Introduction to the command-line interface" workshop, or a working knowledge of the command line and cluster computing.
Pre-reading
Day 1
Time | Topic | Instructor |
---|---|---|
09:30 - 09:45 | Workshop Introduction | Meeta |
09:45 - 10:25 | Working in an HPC environment - Review | Emma |
10:25 - 11:05 | Project Organization (using Data Management best practices) | Meeta |
11:05 - 11:45 | Quality Control of Sequence Data: Running FASTQC | Emma |
11:45 - 12:00 | Overview of self-learning materials and homework submission | Meeta |
Before the next class:
- Please study the contents and work through all the code within the following lessons:
  - Experimental design considerations
  - Quality Control of Sequence Data: Running FASTQC on multiple samples
  - Quality Control of Sequence Data: Evaluating FASTQC reports
  NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word `compute` in it).
  1. Log in using `ssh rc_trainingXX@o2.hms.harvard.edu` and enter your password (replace the “XX” in the username with the number you were assigned in class).
  2. Once you are on the login node, use `srun --pty -p interactive -t 0-2:30 --mem 1G /bin/bash` to get on a compute node, or use the command specified in the lesson.
  3. Proceed only once your command prompt has the word `compute` in it.
  4. If you log out between lessons (using the `exit` command twice), please follow points 1 and 2 above to log back in and get on a compute node when you restart the self-learning.
- Complete the exercises:
  - Each lesson above contains exercises; please go through each of them.
  - Submit your answers to the questions in the Google forms by the day before the next class.
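The prompt check described in the note above can also be scripted. The following is a minimal sketch, assuming that compute-node hostnames on O2 contain the word "compute" (as the prompt check implies); the helper name `is_compute_node` is hypothetical and not part of the lesson materials.

```shell
#!/bin/bash
# Sketch: confirm the shell is on a compute node before running lesson code.
# Assumption: compute-node hostnames contain the word "compute".

# Hypothetical helper: succeeds if the given hostname (default: this host)
# contains the word "compute".
is_compute_node() {
    local host="${1:-$(uname -n)}"
    case "$host" in
        *compute*) return 0 ;;
        *)         return 1 ;;
    esac
}

if is_compute_node; then
    echo "On a compute node: OK to proceed."
else
    echo "Not on a compute node: request one with srun first."
fi
```

Running this on the login node should print the second message; after a successful `srun`, it should print the first.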
Questions?
- If you get stuck due to an error while running code in the lesson, email us.
Day 2
Time | Topic | Instructor |
---|---|---|
09:30 - 10:30 | Self-learning lessons review | All |
10:30 - 11:10 | Expression quantification: Theory and Tools | Meeta |
11:10 - 11:50 | Quantifying expression using alignment-free methods (Salmon) | Emma |
11:50 - 12:00 | Review of workflow | Emma |
Before the next class:
- Please study the contents and work through all the code within the following lessons:
  - Quantifying expression using alignment-free methods (Salmon on multiple samples)

    Now that we know how to run the quantification of one sample with Salmon, this lesson will guide you through running multiple samples by creating a job submission script.
  - QC with Alignment Data

    Besides transcript-level quantification, we also want to understand the quality of the mapping, which is not provided in the Salmon output.
    This lesson will cover:
    - Aligning the reads with an aligner, STAR
    - Assessing QC metrics among samples
  - Documenting Steps in the Workflow with MultiQC

    It would be great to have a summary document of all QC results from the previous analysis.
    This lesson will cover:
    - Generating such a summary report with MultiQC
    - Generating alignment metrics with Qualimap
  NOTE: To run through the code above, you will need to be logged into O2 and working on a compute node (i.e. your command prompt should have the word `compute` in it).
  1. Log in using `ssh rc_trainingXX@o2.hms.harvard.edu` and enter your password (replace the “XX” in the username with the number you were assigned in class).
  2. Once you are on the login node, use `srun --pty -p interactive -t 0-2:30 --mem 8G /bin/bash` to get on a compute node, or use the command specified in the lesson.
  3. Proceed only once your command prompt has the word `compute` in it.
  4. If you log out between lessons (using the `exit` command twice), please follow points 1 and 2 above to log back in and get on a compute node when you restart the self-learning.
- Complete the exercises:
  - Each lesson above contains exercises; please go through each of them.
  - Submit your answers to the questions in the Google forms by the day before the next class.
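As a preview of the idea behind the multi-sample Salmon lesson listed above, the sketch below loops over FASTQ files and prints one job submission command per sample. It is a dry run only: nothing is submitted. The file paths, index name, partition (`short`), and resource values are placeholders assumed for illustration, and the helpers `sample_name` and `build_job` are hypothetical; the lesson itself defines the actual script.

```shell
#!/bin/bash
# Sketch: build one Salmon job submission per FASTQ file (dry run only).
# All paths, the index name, and the resource values are placeholders.

# Hypothetical helper: strip the directory and everything after the first
# dot, so raw_data/sampleA.subset.fq becomes sampleA.
sample_name() {
    local base
    base=$(basename "$1")
    echo "${base%%.*}"
}

# Hypothetical helper: print (not run) the sbatch command for one sample.
build_job() {
    local fq="$1" index="$2" outdir="$3"
    local name
    name=$(sample_name "$fq")
    echo "sbatch -p short -t 0-1:00 --mem 8G --job-name salmon_${name} --wrap=\"salmon quant -i ${index} -l A -r ${fq} -o ${outdir}/${name}.salmon\""
}

# Print one command per (placeholder) input file.
for fq in raw_data/sampleA.fq raw_data/sampleB.fq; do
    build_job "$fq" salmon_index results/salmon
done
```

Printing the commands first makes it easy to inspect sample names and output paths before anything is sent to the scheduler.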
Questions?
- If you get stuck due to an error while running code in the lesson, email us.
Day 3
Time | Topic | Instructor |
---|---|---|
09:30 - 10:10 | Self-learning lessons review | All |
10:10 - 11:10 | Automating the RNA-seq workflow | Will |
11:10 - 11:45 | Troubleshooting RNA-seq Data Analysis | Emma |
11:45 - 12:00 | Wrap up | Will |
- Downloadable Answer Keys (Day 2 exercises):
- Downloadable Answer Keys (Day 3 exercises):
- Automation Script
Resources
- Getting an O2 account
- Video about the statistics behind Salmon quantification
- Advanced bash for working on O2:
- Obtaining reference genomes or transcriptomes
- YouTube videos
Building on this workshop
- Introduction to R workshop materials
- Introduction to Differential Gene Expression analysis (bulk RNA-seq) workshop materials
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.