Workshop Schedule
Day 1
Lesson | Overview | Instructor | Time |
---|---|---|---|
Workshop Introduction | Welcome and housekeeping | Elizabeth | 10:00-10:30 |
Intro to R and RStudio | Introduction to R and RStudio | Noor | 10:30-11:45 |
Self learning materials | Overview of self-learning materials | Elizabeth | 11:45-12:00 |
Before the next class
A. Please study the contents and work through all the code within the following lessons.
B. Complete the exercises:
- Each of these lessons contains exercises; please work through all of them.
- Copy your solutions into the Google Form, using the submit link below, the day before the next class.
Questions?
If you get stuck due to an error while running code in the lesson, email us.
1. R Syntax and Data Structure
About data types and data structure
In order to utilize R effectively, you will need to understand what types of data you can use in R and also how you can store data in "objects" or "variables".
This lesson will cover:
- Assigning a value to an object
- The types of information you can store in R
- The different objects you can use to store data in R
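As a rough illustration of the topics this lesson covers, the sketch below assigns values to objects and builds a few common data structures. All object names and values are made up for illustration and are not taken from the lesson itself.

```r
# Assign a value to an object with the assignment operator <-
x <- 3
y <- x + 2                      # y now holds 5

# Vectors hold values of a single data type
glengths <- c(4.6, 3000, 50000)           # numeric
species  <- c("ecoli", "human", "corn")   # character
class(glengths)                 # "numeric"
class(species)                  # "character"

# A factor stores categorical data as a set of levels
expression <- factor(c("low", "high", "medium", "high", "low"))
levels(expression)              # levels are sorted alphabetically by default

# A data frame combines equal-length vectors as columns
df <- data.frame(species, glengths)

# A list can hold objects of different types and sizes
list1 <- list(species, df, x)
```

Calling `str()` on any of these objects is a quick way to check what type of data it holds and how it is structured.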
2. Functions and Arguments in R
Functions are the basic "commands" used in R to get something done. To use a function (denoted by function_name followed by "()"), you enter some input within the parentheses and, optionally, some arguments that change the function's default behavior.
You can also create your own functions! When you want to perform a task or a series of tasks more than once, creating a custom function is the best way to go.
In this lesson you will explore:
- Using built-in functions
- Creating your own custom functions
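A minimal sketch of both ideas follows: calling a built-in function with an optional argument, then defining a custom one. The vector `values` and the function name `square_it` are hypothetical examples chosen for illustration.

```r
# Built-in functions: input goes inside the parentheses
values <- c(2.345, 7.891, 10.5)
mean(values)

# An optional argument (digits) changes the default behavior of round()
rounded <- round(mean(values), digits = 2)   # 6.91

# A custom function: name, arguments, body, and a returned value
square_it <- function(x) {
  square <- x * x
  return(square)
}

square_it(5)   # 25
```

Typing `?round` in the console shows the help page for a function, including the arguments it accepts and their defaults.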
3. Reading in and inspecting data
Read and inspect data structures in R
When using R, it is almost a certainty that you will have to bring data into the R environment.
In this lesson you will learn:
- Reading different types (formats) of data
- Inspecting the contents and structure of the dataset once you have read it in
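The sketch below shows one possible read-and-inspect workflow. To keep it self-contained it first writes a small, made-up table to a temporary file; in practice you would point `read.csv()` at your own file. The object names and data are assumptions for illustration only.

```r
# Write a tiny example file so the sketch runs on its own (hypothetical data)
example <- data.frame(
  genotype = c("Wt", "Wt", "KO"),
  replicate = c(1, 2, 1),
  row.names = c("sample1", "sample2", "sample3")
)
csv_path <- tempfile(fileext = ".csv")
write.csv(example, file = csv_path)

# Read the file in, using the first column as row names
metadata <- read.csv(file = csv_path, row.names = 1)

# Inspect the contents and structure of what was read
head(metadata)   # first rows
str(metadata)    # column types and a preview of the values
dim(metadata)    # number of rows and columns
```

Other formats follow the same pattern with a different reader, for example `read.table()` or `read.delim()` for tab-delimited text.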
Submit a day before the next class.
Day 2
Lesson | Overview | Instructor | Time |
---|---|---|---|
Review self-learning | Questions about self-learning | All | 10:00-10:50 |
In-class exercises | Use and customize function and arguments | Noor | 10:50-11:15 |
Data Wrangling | Subsetting Vectors and Factors | Will | 11:15-12:00 |
Before the next class
A. Please study the contents and work through all the code within the following lessons.
B. Complete the exercises:
- Each of these lessons contains exercises; please work through all of them.
- Copy your solutions into the Google Form, using the submit link below, the day before the next class.
Questions?
If you get stuck due to an error while running code in the lesson, email us.
Installing and loading packages in R
Base R is incredibly powerful, but it cannot do everything. R has been built to encourage community involvement in expanding functionality. Thousands of supplemental add-ons, also called "packages", have been contributed by the community. Each package comprises several functions that enable users to perform their desired analysis.
This lesson will cover:
- Descriptions of package repositories
- Installing a package
- Loading a package
- Accessing the documentation for your installed packages and getting help
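A minimal sketch of the install, load, and look-up cycle is shown below. It uses `splines`, a package that ships with R, purely so the example runs without a network connection; that choice is an assumption for illustration, and you would substitute the name of the package you actually need.

```r
# Hypothetical choice: "splines" ships with R, so nothing is downloaded here
pkg <- "splines"

# Install from CRAN only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package so its functions can be called directly
library(pkg, character.only = TRUE)

# List what is currently attached to the session
search()

# Look up documentation (uncomment to open the help pages)
# help(package = "splines")   # index of the package's help pages
# ?bs                         # help page for a single function
```

`sessionInfo()` additionally reports the versions of R and of every loaded package, which is useful to include when asking for help.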
2. Data wrangling: data frames, matrices and lists
Subset, merge, and create new datasets
In class we covered data wrangling (extracting/subsetting information) for single-dimensional objects (vectors and factors). The next step is to learn how to wrangle data in two-dimensional objects.
This lesson will cover:
- Examining and extracting values from two-dimensional data structures using indices, row names, or column names
- Retrieving information from lists
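The following sketch illustrates these extraction styles on a small, made-up data frame and list. The object names and values are hypothetical and chosen only for illustration.

```r
metadata <- data.frame(
  genotype = c("Wt", "Wt", "KO", "KO"),
  celltype = c("typeA", "typeB", "typeA", "typeB"),
  row.names = c("sample1", "sample2", "sample3", "sample4"),
  stringsAsFactors = FALSE
)

# Data frames are indexed as [row, column]
metadata[1, 2]             # by index: row 1, column 2
metadata["sample3", ]      # by row name: the whole row
metadata[, "genotype"]     # by column name: the whole column
metadata$celltype          # the $ shortcut for a single column

# A logical vector keeps only the rows where the condition is TRUE
ko_samples <- metadata[metadata$genotype == "KO", ]

# Lists use double brackets (or $ for named components)
list1 <- list(groups = c("Wt", "KO"), meta = metadata, n = 4)
list1[[2]]                 # second component, by position
list1$groups               # by name
list1[["n"]]               # by name, with double brackets
```

Leaving the row or column position empty, as in `metadata[, "genotype"]`, selects everything along that dimension.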
%in% operator, any and all functions
Very often you will have to compare two vectors to figure out if, and which, values are common between them. The %in% operator can be used for this purpose.
This lesson will cover:
- Implementing the %in% operator to evaluate two vectors
- Distinguishing %in% from == and other logical operators
- Using any() and all() functions
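A short sketch of the difference between `%in%` and `==`, using two made-up vectors `A` and `B` (hypothetical values for illustration):

```r
A <- c(1, 3, 5, 7, 9, 11)
B <- c(2, 4, 6, 8, 1, 5)

# %in% asks, for each element of A, whether it appears anywhere in B
in_B <- A %in% B       # TRUE FALSE TRUE FALSE FALSE FALSE
A[in_B]                # 1 5

# == compares the vectors position by position instead
same_position <- A == B    # all FALSE: no position holds the same value

# any() and all() summarize a logical vector into a single value
any(in_B)   # TRUE: at least one element of A is in B
all(in_B)   # FALSE: not every element of A is in B
```

Because `==` works position by position, it only answers the "are these vectors the same" question when both vectors are in the same order.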
Ordering of vectors and data frames
Sometimes you will want to rearrange values within a vector (row names or column names). The match() function can be very powerful for this task.
This lesson will cover:
- Manually rearranging values within a vector
- Implementing the match() function to automatically rearrange the values within a vector
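The sketch below contrasts manual reordering with `match()`. The vectors `first` and `second` are hypothetical examples; `first` holds the desired order and `second` is the vector to rearrange.

```r
first  <- c("A", "B", "C", "D", "E")   # the order we want
second <- c("B", "D", "E", "A", "C")   # the vector to reorder

# Manual approach: work out the positions by eye and type them in
manual <- second[c(4, 1, 5, 2, 3)]

# match() returns, for each value in first, its position in second
reorder_idx <- match(first, second)    # 4 1 5 2 3

# Using those positions as an index rearranges second to follow first
second_reordered <- second[reorder_idx]
```

When a value in the first vector is absent from the second, `match()` returns `NA` at that position, so it is worth checking the index with `anyNA()` before using it.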
Learn about map() function for iterative tasks
We will be starting with visualization in the next class. To set up for this, you need to create a new metadata data frame with information from the counts data frame. You will need to use a function over every column within the counts data frame iteratively. You could do that manually, but it is error-prone; the map() family of functions makes this more efficient.
This lesson will cover:
- Utilizing map_dbl() to take the average of every column in a data frame
- A brief discussion of other functions within the map() family of functions
- Creating a new data frame for plotting
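A minimal sketch of this idea, assuming the purrr package is installed. The `counts` data frame below is a small made-up stand-in for the real counts data, and the names `samplemeans` and `new_metadata` are hypothetical.

```r
library(purrr)

# Hypothetical counts: one column per sample, one row per gene
counts <- data.frame(
  sample1 = c(10, 20, 30),
  sample2 = c(5, 15, 25),
  sample3 = c(0, 2, 4)
)

# map_dbl() applies mean() to every column and returns a named numeric vector
samplemeans <- map_dbl(counts, mean)

# Combine the sample names and their averages into a data frame for plotting
new_metadata <- data.frame(
  sample = names(samplemeans),
  samplemeans = samplemeans
)
```

The suffix names the output type: `map()` returns a list, `map_chr()` a character vector, and `map_lgl()` a logical vector. In base R, `sapply(counts, mean)` gives a comparable result here.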
Submit a day before the next class.
Prepare for in-class exercise:
- Download the data and place the file into the data directory.
Data | Download link |
---|---|
Animal data | Right click & Save link as... |
- Read the .csv file into your environment and assign it to a variable called animals. Be sure to check that your row names are the different animals.
- Save the R project when you close RStudio.
Day 3
Lesson | Overview | Instructor | Time |
---|---|---|---|
Review self-learning | Questions about self-learning | All | 10:00-10:35 |
In-class exercises | Customizing functions and arguments | Will | 10:50-11:15 |
Plotting with ggplot2 | ggplot2 for data visualization | Noor | 11:15-12:00 |
Before the next class
- Please study the contents and work through all the code within the following lessons.
- Complete the exercises:
  - Each of these lessons contains exercises; please work through all of them.
  - Copy your solutions into the Google Form, using the submit link below, the day before the next class.
Questions?
If you get stuck due to an error while running code in the lesson, email us.
Consistent formats for plotting
When creating your plots in ggplot2 you may want to have consistent formatting (using theme() functions) across your plots, e.g. if you are generating plots for a manuscript.
This lesson will cover:
- Developing a custom function for creating consistently formatted plots
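One way such a function could look is sketched below, assuming ggplot2 is installed. The function name `personal_theme` and the data are hypothetical; the specific theme settings are arbitrary choices for illustration.

```r
library(ggplot2)

# A custom function that bundles the formatting to reuse across plots
personal_theme <- function() {
  theme_bw() +
    theme(
      axis.title = element_text(size = rel(1.5)),
      plot.title = element_text(size = rel(1.5), hjust = 0.5)
    )
}

# Hypothetical data for a scatterplot
df <- data.frame(
  age_in_days = c(5, 10, 15, 20),
  samplemeans = c(2.1, 3.8, 6.2, 7.9)
)

# Adding the function to any plot applies the same formatting
p <- ggplot(df, aes(x = age_in_days, y = samplemeans)) +
  geom_point() +
  ggtitle("Age of mice affects average expression") +
  personal_theme()
```

Keeping the formatting in one function means a later change, such as a different font size, only has to be made in one place.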
Customizing barplots with ggplot2
Previously, you created a scatterplot using ggplot2. However, ggplot2 can be used to create a very wide variety of plots. One of the other frequently used plots you can create with ggplot2 is a barplot.
This lesson will cover:
- Creating and customizing a barplot using ggplot2
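A minimal barplot sketch, assuming ggplot2 is installed. The data frame and its column names are made up for illustration.

```r
library(ggplot2)

# Hypothetical summary data: one bar per genotype
df <- data.frame(
  genotype = c("Wt", "KO"),
  samples = c(6, 4)
)

# geom_col() draws bars whose heights come directly from the y values
bar <- ggplot(df) +
  geom_col(aes(x = genotype, y = samples, fill = genotype)) +
  xlab("Genotype") +
  ylab("Number of samples") +
  ggtitle("Samples per genotype") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
```

If the data are not yet summarized, `geom_bar()` with only an `x` aesthetic counts the rows in each category instead.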
Writing files and plots in different formats
Now that you have completed some analysis in R, you will eventually need to export that work out of R/RStudio. R provides lots of flexibility in what and how you export your data and plots.
This lesson will cover:
- Exporting your figures from R using a variety of file formats
- Writing your data from R to a file
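The sketch below writes a data frame, a vector, and a figure to disk using base R only. It writes into a temporary directory so that it can be run safely; the file names and data are hypothetical, and in a project you would typically write to your own results or figures folder.

```r
out_dir <- tempdir()   # stand-in for your own output directory

# Hypothetical results to export
results <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  mean_count = c(12.5, 3.2, 8.8)
)

# Write a data frame to a comma-separated file
csv_file <- file.path(out_dir, "results.csv")
write.csv(results, file = csv_file, row.names = FALSE)

# Write a vector to a text file, one value per line
txt_file <- file.path(out_dir, "genes.txt")
write(results$gene, file = txt_file, ncolumns = 1)

# Export a figure: open a device, draw the plot, then close the device
pdf_file <- file.path(out_dir, "mean_counts.pdf")
pdf(pdf_file)
plot(results$mean_count, xlab = "Gene index", ylab = "Mean count")
dev.off()
```

Swapping `pdf()` for `png()`, `jpeg()`, or `tiff()` changes the output format while the open, draw, close pattern stays the same.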
How to best look for help
Hopefully, this course has given you the basic tools you need to be successful when using R. However, it would be impossible to cover every aspect of R and you will need to be able to troubleshoot future issues as they arise.
This lesson will cover:
- Suggestions for how to best ask for help
- Where to look for help
Data wrangling within Tidyverse
The Tidyverse suite of integrated packages is designed to work together to make common data science operations more user-friendly. Tidyverse is becoming increasingly prevalent, and R users should be conversant in its basics. We have already used two Tidyverse packages in this workshop (ggplot2 and purrr), and in this lesson we will learn some key features from a few additional packages that make up Tidyverse.
This lesson will cover:
- Usage of pipes for connecting together multiple commands
- Tibbles for two-dimensional data storage
- Data wrangling within Tidyverse
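A short sketch of these three ideas, assuming the dplyr and tibble packages are installed. The tibble `results`, its columns, and the 0.05 cutoff are made-up examples for illustration.

```r
library(dplyr)
library(tibble)

# The pipe passes the left-hand result as the first argument of the next call
piped <- sqrt(83) %>% round(digits = 2)   # same as round(sqrt(83), digits = 2)

# A tibble is the Tidyverse take on the data frame
results <- tibble(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  padj = c(0.2, 0.001, 0.03, 0.5)
)

# Chain wrangling steps: keep rows, sort them, then add a new column
sig <- results %>%
  filter(padj < 0.05) %>%
  arrange(padj) %>%
  mutate(neg_log10_padj = -log10(padj))
```

Each step in the chain reads top to bottom in the order it is applied, which tends to be easier to follow than the equivalent nested function calls.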
Submit a day before the next class.
Day 4
Lesson | Overview | Instructor | Time |
---|---|---|---|
Review self-learning | Questions about self-learning | All | 10:00-10:35 |
In-class exercises | In-class exercises | Will | 10:50-11:15 |
Discussion | Q&A | Noor | 11:15-11:45 |
Wrap Up | Wrap up and checking out | Noor | 11:45-12:00 |
Additional exercises and answer keys
Additional resources
- Building on the basic R knowledge
- Resources
Cheatsheets
Attribution & Citation
- These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- Some materials used in these lessons were derived from work that is Copyright © Data Carpentry. All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).
- To cite material from this course in your publications, please use:
  Meeta Mistry, Mary Piper, Jihe Liu, & Radhika Khetani. (2021, May 5). hbctraining/Intro-to-R-flipped: R workshop first release. Zenodo. https://doi.org/10.5281/zenodo.4739342
- A lot of time and effort went into the preparation of these materials. Citations help us understand the needs of the community, gain recognition for our work, and attract further funding to support our teaching activities. Thank you for citing this material if it helped you in your data analysis.