Needle in a Haystack: Finding and summarizing data from colossal files

| Audience | Computational skills required | Duration |
| --- | --- | --- |
| Biologists | Beginner bash | 2-3 hour workshop (~2-3 hours of trainer-led time) |

Description

This repository has teaching materials for a 3-hour, hands-on Intermediate bash workshop led at a relaxed pace. Many tools for the analysis of big data require knowledge of the command line. This workshop builds on the basic skills taught in The Foundation - Basic Shell workshop and teaches users command-line tools such as grep, sed, and awk to find and summarize information from large files.
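As a rough illustration of the kind of task these tools handle, the sketch below builds a tiny tab-separated file and queries it with grep, sed, and awk. The file name, its contents, and the variable names are invented for this example and are not part of the workshop dataset; a real file would be far larger.

```shell
# Hypothetical toy data standing in for a large annotation file
# (columns: chromosome, feature type, start, end).
tmpdir=$(mktemp -d)
printf 'chr1\tgene\t100\t500\nchr1\texon\t100\t200\nchr2\tgene\t300\t900\n' > "$tmpdir/toy.tsv"

# grep: count the lines that contain the word "gene"
gene_count=$(grep -c 'gene' "$tmpdir/toy.tsv")

# sed: rewrite the leading "chr" as "chromosome_" and keep the first line
first_line=$(sed 's/^chr/chromosome_/' "$tmpdir/toy.tsv" | head -n 1)

# awk: sum (end - start) over the rows whose second column is exactly "gene"
total_gene_length=$(awk -F'\t' '$2 == "gene" { sum += $4 - $3 } END { print sum }' "$tmpdir/toy.tsv")

echo "genes: $gene_count"
echo "first line after sed: $first_line"
echo "total gene length: $total_gene_length"

# Clean up the temporary directory
rm -r "$tmpdir"
```

Each command reads the file line by line rather than loading it whole, which is why the same pattern remains practical for files too large to open in a spreadsheet or text editor.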

Learning Objectives

These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.

Contents

| Lessons | Estimated Duration |
| --- | --- |
| Setting up | 15 min |
| Regular Expressions | 45 min |
| Sed | 45 min |
| Awk | 75 min |

Dataset

Installation Requirements

Windows users: GitBash R