Needle in a Haystack: Finding and summarizing data from colossal files
Audience | Computational skills required | Duration |
---|---|---|
Biologists | Beginner bash | 2-3 hour workshop (~2-3 hours of trainer-led time) |
Description
This repository has teaching materials for a 3-hour, hands-on Intermediate bash workshop led at a relaxed pace. Many tools for the analysis of big data require knowledge of the command line. This workshop builds on the skills taught in The Foundation - Basic Shell workshop and introduces command line tools such as `grep`, `sed` and `awk` for finding and summarizing information in large files.
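As a rough taste of how the three tools divide the work, here is a minimal sketch on a made-up, tab-separated BED-style file. The file name `toy.bed` and its contents are hypothetical, not workshop data: `grep` finds matching lines, `sed` edits text as it streams by, and `awk` summarizes values column by column.

```shell
#!/usr/bin/env bash
# Hypothetical toy file: chromosome, start, end, gene name (tab-separated)
printf 'chr1\t100\t200\tgeneA\nchr1\t300\t450\tgeneB\nchr2\t50\t125\tgeneC\n' > toy.bed

# grep: count lines whose first field is exactly chr1
grep -c '^chr1[[:space:]]' toy.bed    # prints 2

# sed: strip the "chr" prefix from the start of every line
sed 's/^chr//' toy.bed

# awk: add up feature lengths (end - start) and print the total at the end
awk -F'\t' '{ total += $3 - $2 } END { print total }' toy.bed    # prints 325
```

On a real multi-gigabyte file the commands stay the same; only the input changes.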
Learning Objectives
- Recognize basic regular expressions (regex)
- Use regex to cast a wider net with `grep`, `sed`, and `awk`
- Differentiate between the best use cases for `grep`, `sed`, and `awk`
- Implement proper syntax for `grep`, `sed`, and `awk` commands
- Observe the wide range of options `sed` offers for performing various tasks
- Identify bioinformatic applications for `grep`, `sed`, and `awk`
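To illustrate what "casting a wider net" with regex means, here is a small sketch comparing a literal `grep` search with regular-expression searches. The file `names.txt` and its gene names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical list of gene names, one per line
printf 'geneA\ngeneB\ngene10\nGENEA\npseudogene\n' > names.txt

# Literal search: only the exact string "geneA" matches
grep -c 'geneA' names.txt                    # prints 1

# Extended regex: lines that start with "gene" followed by capital letters or digits
grep -c -E '^gene[A-Z0-9]+$' names.txt       # prints 3

# Adding -i ignores case, so "GENEA" is caught as well; "pseudogene" never matches ^gene
grep -c -i -E '^gene[a-z0-9]+$' names.txt    # prints 4
```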
These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.
Contents
Lessons | Estimated Duration |
---|---|
Setting up | 15 min |
Regular Expressions | 45 min |
Sed | 45 min |
Awk | 75 min |