Bulk RNA-seq Data Analysis using High-Performance Computing (bulk RNA-seq Part I – FASTQ to counts)

Learning Objectives

Installations

All:

Mac users:

Windows users:

Notes

Cluster access

To run through the code in the lessons below, you will need a FAS-RC cluster account. Once you have an account, please do the following.

  1. Log in using ssh username@login.rc.fas.harvard.edu and enter your password (replace username with your username).
  2. Once you are on the login node, run salloc -p test -t 0-2:30 --mem 8G to get on a compute node (this requests 2 hours 30 minutes and 8 GB of memory on the test partition), or use the resources specified in the lesson.
  3. Proceed only once your command prompt no longer contains the word login.
  4. If you log out between lessons (using the exit command twice, once to leave the compute node and once to leave the login node), repeat steps 1 and 2 to log back in and get on a compute node before you resume the self-learning lessons.
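
The check in step 3 can also be scripted. The following is a minimal sketch, assuming that the node's host name (as reported by uname -n) contains the word login whenever you are on a login node, which is what the prompt check relies on. The function name on_login_node is hypothetical and not part of the workshop materials.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given host name
# contains the word "login", i.e. it looks like a login node.
on_login_node() {
  case "$1" in
    *login*) return 0 ;;
    *)       return 1 ;;
  esac
}

# uname -n prints the network node name of the current machine.
host="$(uname -n)"

if on_login_node "$host"; then
  echo "Still on a login node ($host). Request a compute node first, e.g.:"
  echo "  salloc -p test -t 0-2:30 --mem 8G"
else
  echo "Host $host does not look like a login node; OK to proceed."
fi
```

Running this right after logging in with ssh should report a login node. Running it again after salloc has granted the allocation should report that it is OK to proceed.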

Lessons

Part I

  1. Introduction to RNA-seq
  2. Shell basics review
  3. Working in an HPC environment - Review
  4. Best Practices in Research Data Management (RDM)
  5. Project Organization (using Data Management best practices)

Part II

  1. Quality Control of Sequence Data: Running FASTQC
  2. Experimental design considerations
  3. Quality Control of Sequence Data: Running FASTQC on multiple samples
  4. Quality Control of Sequence Data: Evaluating FASTQC reports

Part III

  1. Sequence Alignment Theory
  2. Quantifying expression using alignment-free methods (Salmon on multiple samples)

Part IV

  1. QC with Alignment Data
  2. Documenting Steps in the Workflow with MultiQC
  3. Troubleshooting RNA-seq Data Analysis

Part V

  1. Automating the RNA-seq workflow

Answer Keys


Building on this workshop


Resources


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.