Needle in a Haystack: Finding and summarizing data from colossal files
| Audience | Computational skills required | Duration |
|---|---|---|
| Biologists | Beginner bash | 2-3 hour workshop (~2-3 hours of trainer-led time) |
Description
This repository has teaching materials for a 3-hour, hands-on Intermediate bash workshop led at a relaxed pace. Many tools for the analysis of big data require knowledge of the command line. This workshop builds on the basic skills taught in The Foundation - Basic Shell workshop and teaches users to apply the command-line tools grep, sed, and awk to find and summarize information in large files.
Learning Objectives
- Recognize basic regex
- Utilize regex to cast a wider net with `grep`, `sed`, and `awk`
- Differentiate between best use cases for `grep`, `sed`, and `awk`
- Implement proper syntax for `grep`, `sed`, and `awk` commands
- Observe the wide range of options for `sed` to perform various tasks
- Identify bioinformatic applications for `grep`, `sed`, and `awk`
These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.
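As a small taste of the objectives above, the sketch below runs each of the three tools once on a tiny tab-separated annotation file. It is not taken from the lessons: the file contents, the column layout (chromosome, feature type, start, end), and the variable names are all made up for illustration.

```bash
# Build a tiny, hypothetical tab-separated annotation file:
# columns are chromosome, feature type, start, end (invented data).
tmpfile=$(mktemp)
printf 'chr1\tgene\t100\t500\n'  > "$tmpfile"
printf 'chr1\texon\t100\t200\n' >> "$tmpfile"
printf 'chr2\tgene\t300\t900\n' >> "$tmpfile"
printf 'chrX\texon\t50\t80\n'   >> "$tmpfile"

# grep: use a regex to count lines starting with "chr" followed by digits
# (numbered chromosomes only; chrX does not match).
autosome_lines=$(grep -c -E '^chr[0-9]+' "$tmpfile")

# sed: strip the leading "chr" from each line and keep the first result.
# The file itself is not modified because -i is not used.
first_renamed=$(sed 's/^chr//' "$tmpfile" | head -n 1)

# awk: sum (end - start) over rows whose second column is "gene".
gene_length=$(awk -F'\t' '$2 == "gene" { total += $4 - $3 } END { print total }' "$tmpfile")

echo "numbered-chromosome lines: $autosome_lines"
echo "first line without prefix: $first_renamed"
echo "total gene length: $gene_length"

rm -f "$tmpfile"
```

On this sample data the `grep` count is 3 and the `awk` total is 1000. The division of labor shown here (`grep` to find lines, `sed` to edit text, `awk` to compute over columns) is one common way to choose between the tools, not the only one.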
Contents
| Lessons | Estimated Duration |
|---|---|
| Setting up | 15 min |
| Regular Expressions | 45 min |
| Sed | 45 min |
| Awk | 75 min |