Needle in a Haystack: Finding and summarizing data from colossal files
| Audience | Computational skills required | Duration | 
|---|---|---|
| Biologists | Beginner bash | 2-3 hour workshop (~2-3 hours of trainer-led time) | 
Description
This repository contains teaching materials for a 3-hour, hands-on intermediate bash workshop led at a relaxed pace. Many tools for analyzing big data require knowledge of the command line. This workshop builds on the basic skills taught in The Foundation - Basic Shell workshop and teaches command-line tools such as grep, sed, and awk for finding and summarizing information in large files.
Learning Objectives
- Recognize basic regex
- Utilize regex to cast a wider net with grep, sed, and awk
- Differentiate between best use cases for grep, sed, and awk
- Implement proper syntax for grep, sed, and awk commands
- Observe the wide range of options for sed to perform various tasks
- Identify bioinformatic applications for grep, sed, and awk
These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.
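As a rough preview of how the three tools divide the work, here is a minimal sketch. The file `genes.tsv`, its three columns (gene, chromosome, read count), and its values are made up for illustration and are not part of the workshop data.

```bash
# Hypothetical sample data: a tiny tab-separated table of gene, chromosome, read count.
tmpdir=$(mktemp -d)
sample="$tmpdir/genes.tsv"
printf 'geneA\tchr1\t10\ngeneB\tchr2\t25\ngeneC\tchr1\t5\n' > "$sample"

# grep: find lines that match a pattern (here, count the rows containing "chr1").
chr1_rows=$(grep -c 'chr1' "$sample")

# sed: edit text as it streams past (here, rewrite "chr" as "chromosome_"
# and keep only the first line of the result).
renamed=$(sed 's/chr/chromosome_/' "$sample" | head -n 1)

# awk: work column by column (here, sum column 3 for rows where column 2 is chr1).
chr1_total=$(awk -F'\t' '$2 == "chr1" { sum += $3 } END { print sum }' "$sample")

echo "rows on chr1: $chr1_rows"
echo "first renamed line: $renamed"
echo "total count on chr1: $chr1_total"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Note that the plain pattern `chr1` would also match `chr10` or `chr11` in a real file; tightening patterns like this is the kind of problem the regular expressions lesson addresses.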
Contents
| Lessons | Estimated Duration | 
|---|---|
| Setting up | 15 min | 
| Regular Expressions | 45 min | 
| Sed | 45 min | 
| Awk | 75 min |