THIS REPO IS ARCHIVED, PLEASE GO TO https://hbctraining.github.io/main FOR CURRENT LESSONS.

Introduction to RNA-seq using high-performance computing (HPC)

Audience | Computational skills | Prerequisites | Duration
Biologists | Beginner/Intermediate | None | 2-day workshop (~13 hours of trainer-led time)

Description

This repository contains teaching materials for a 2-day Introduction to RNA-sequencing data analysis workshop. The workshop teaches the basic computational skills needed to use a high-performance computing environment effectively to implement an RNA-seq data analysis workflow. It includes an introduction to the shell (bash) and shell scripting. In addition to running the RNA-seq workflow from FASTQ files to count data, the workshop covers best-practice guidelines for RNA-seq experimental design and for data organization and management.

These materials were developed for a trainer-led workshop, but are also amenable to self-guided learning.

Learning Objectives

  1. Understand the necessity for, and use of, the command line interface (bash) and HPC for analyzing high-throughput sequencing data.
  2. Understand best practices for designing an RNA-seq experiment and analyzing the resulting data.

Contents

Lesson | Estimated duration
Introduction to the shell | 70 min
Searching and redirection in shell | 45 min
Shell scripts and for loops | 75 min
Permissions and environment variables | 50 min
Project and data organization | 40 min
RNA-seq experimental design best practices | 50 min
Introduction to High-Performance Computing for HMS-RC’s Orchestra | 45 min
RNA-seq data QC with FastQC | 75 min
RNA-seq workflow: Alignment and Counting | 90 min
Automating the RNA-seq workflow | 60 min
Alternative workflows for analyzing RNA-seq data | 15 min
Quantifying expression using alignment-free methods (Salmon) | 75 min

Dataset

The dataset used in this workshop can be downloaded here.


These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.