Needle in a Haystack: Finding and summarizing data from colossal files

Audience	Computational skills required	Duration
Biologists	Beginner bash	2-3 hour workshop (~2-3 hours of trainer-led time)

Description

This repository has teaching materials for a 3-hour, hands-on Intermediate bash workshop led at a relaxed pace. Many tools for the analysis of big data require knowledge of the command line. This workshop builds on the basic skills taught in The Foundation - Basic Shell workshop and teaches users command-line utilities such as grep, sed, and awk to find and summarize information from large files.

Learning Objectives

Recognize basic regex
Utilize regex to cast a wider net with grep, sed, and awk
Differentiate between best use cases for grep, sed, and awk
Implement proper syntax for grep, sed, and awk commands
Observe the wide range of options for sed to perform various tasks
Identify bioinformatic applications for grep, sed, and awk
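
As a rough illustration of how the three tools in these objectives divide the work, here is a minimal sketch. It assumes a small, made-up tab-separated table (gene, chromosome, count) written to a temporary file; the file contents and variable names are hypothetical and are not the workshop dataset.

```shell
# Hypothetical three-column table (gene, chromosome, count); not the workshop dataset.
tmpfile=$(mktemp)
printf 'geneA\tchr1\t10\ngeneB\tchr2\t25\ngeneC\tchr1\t5\n' > "$tmpfile"

# grep: find lines matching a pattern. The regex ^gene[AC] matches lines
# that start with geneA or geneC; -c counts the matching lines.
gene_ac_lines=$(grep -c -E '^gene[AC]' "$tmpfile")

# sed: edit text as it streams past. Rewrite chr<number> as chromosome_<number>
# using a capture group, then keep only the first output line.
first_renamed=$(sed -E 's/chr([0-9]+)/chromosome_\1/' "$tmpfile" | head -n 1)

# awk: work with columns. Sum the third column for rows whose second column is chr1.
chr1_total=$(awk -F '\t' '$2 == "chr1" { sum += $3 } END { print sum }' "$tmpfile")

echo "Lines starting with geneA or geneC: $gene_ac_lines"   # 2
echo "First line after sed: $first_renamed"
echo "Total count on chr1: $chr1_total"                      # 15

rm -f "$tmpfile"
```

In this sketch grep answers "which lines match?", sed answers "how should matching text be rewritten?", and awk answers "what do the columns add up to?". The same regular expression ideas carry across all three.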

These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.

Lessons	Estimated Duration
Setting up	15 min
Regular Expressions	45 min
Sed	45 min
Awk	75 min

Dataset

Installation Requirements

Windows users: GitBash R
