Day 3: In-class activities
Reading in and inspecting data
- Download the data and place the file into the data directory.
Data | Download link |
---|---|
Animal data | Right click & Save link as... |
- Read the .csv file into your environment and assign it to a variable called animals. Be sure to check that your row names are the different animals.
- Check to make sure that animals is a dataframe.
- How many rows are in the animals dataframe? How many columns?
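The reading and inspection steps above can be sketched on a small stand-in file. The file contents, the temporary path, and the names toy_path and toy are made up for illustration and are not the lesson's animal data; row.names = 1 assumes the first column of the file holds the row labels.

```r
# Write a tiny, made-up CSV to a temporary file (not the lesson's data)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("name,height,weight",
             "a,1.5,10",
             "b,2.0,20",
             "c,2.5,30"),
           toy_path)

# Read it in, using the first column as row names
toy <- read.csv(toy_path, row.names = 1)

rownames(toy)        # check that the row names are the labels from column 1
is.data.frame(toy)   # TRUE if the object is a data frame
class(toy)           # the class of the object
nrow(toy)            # number of rows
ncol(toy)            # number of columns
dim(toy)             # rows and columns together
```

The same calls should apply to the downloaded file once its path inside the data directory is substituted for the temporary one.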
Data wrangling
- Extract the speed value of 40 km/h from the animals dataframe.
- Return the rows with animals that are the color Tan.
- Return the rows with animals that have speed greater than 50 km/h and output only the color column. Keep the output as a data frame.
- Change the color of "Grey" to "Gray".
- Create a list called animals_list in which the first element contains the speed column of the animals dataframe and the second element contains the color column of the animals dataframe.
- Give each element of your list the appropriate name (i.e., speed and color).
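The indexing patterns these exercises call for can be sketched on a made-up data frame. The names toy, size, shade, big, and toy_list and all values below are hypothetical, chosen so the example does not depend on the lesson's data; stringsAsFactors = FALSE is set explicitly so the text column is character on any R version.

```r
# A made-up data frame (hypothetical values, not the lesson's animals data)
toy <- data.frame(size = c(10, 25, 40, 55),
                  shade = c("Red", "Grey", "Red", "Blue"),
                  row.names = c("w", "x", "y", "z"),
                  stringsAsFactors = FALSE)

# Extract a single value by row name and column name
toy["y", "size"]

# Rows where a column equals a given value
toy[toy$shade == "Red", ]

# Rows meeting a numeric condition, keeping one column as a data frame
big <- toy[toy$size > 20, "shade", drop = FALSE]

# Replace one value with another in a character column
toy$shade[toy$shade == "Grey"] <- "Gray"

# Build a list from two columns, then name its elements
toy_list <- list(toy$size, toy$shade)
names(toy_list) <- c("size", "shade")
```

Without drop = FALSE, selecting a single column returns a plain vector rather than a data frame. If the column were stored as a factor instead of character, the replacement step would need to change the factor levels with levels() rather than assign a new string directly.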
The %in% operator, reordering and matching
In your environment you should have a dataframe called proj_summary, which contains quality metric information for an RNA-seq dataset. We have obtained batch information for the control samples in this dataset.
- Copy and paste the code below to create a dataframe of control samples with the associated batch information:
ctrl_samples <- data.frame(row.names = c("sample3", "sample10", "sample8", "sample4", "sample15"),
                           date = c("01/13/2018", "03/15/2018", "01/13/2018", "09/20/2018", "03/15/2018"))
- How many of the ctrl_samples are also in the proj_summary dataframe? Use the %in% operator to compare sample names.
- Keep only the rows in proj_summary which correspond to those in ctrl_samples. Do this with the %in% operator. Save it to a variable called proj_summary_ctrl.
- We would like to add in the batch information for the samples in proj_summary_ctrl. Find the rows that match in ctrl_samples.
- Use cbind() to add a column called batch to the proj_summary_ctrl dataframe. Assign this new dataframe back to proj_summary_ctrl.
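The combination of %in%, match(), and cbind() can be sketched on two made-up data frames. The names metrics, info, present, metrics_sub, and idx and all values are hypothetical stand-ins; the sketch assumes, as in the exercise, that sample identifiers are stored as row names.

```r
# Made-up data frames with sample IDs as row names (hypothetical values)
metrics <- data.frame(reads = c(100, 200, 300, 400),
                      row.names = c("s1", "s2", "s3", "s4"))
info <- data.frame(date = c("d3", "d1", "d9"),
                   row.names = c("s3", "s1", "s9"),
                   stringsAsFactors = FALSE)

# Which IDs in info are also present in metrics?
present <- rownames(info) %in% rownames(metrics)
sum(present)   # count of shared IDs

# Keep only the rows of metrics whose IDs appear in info
metrics_sub <- metrics[rownames(metrics) %in% rownames(info), , drop = FALSE]

# For each row of metrics_sub, find the position of the same ID in info
idx <- match(rownames(metrics_sub), rownames(info))

# Add the values, reordered to line up with metrics_sub, as a new column
metrics_sub <- cbind(metrics_sub, batch = info[idx, "date"])
```

%in% only reports membership, so it cannot line up rows that are stored in a different order. match() returns positions, which is what allows the values from the second data frame to be reordered before cbind() attaches them.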
BONUS: Using map_lgl()
- Subset proj_summary to keep only the “high” and “low” samples based on the treatment column. Save the new dataframe to a variable called proj_summary_noctl.
- Further, subset the dataframe to remove the non-numeric columns “Quality_format” and “treatment”. Try to do this using the map_lgl() function in addition to is.numeric(). Save the new dataframe back to proj_summary_noctl.
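A minimal sketch of the map_lgl() pattern, assuming the purrr package is installed. The data frame toy, its columns, and the names toy_sub, keep, and toy_num are made up for illustration and do not mirror the contents of proj_summary.

```r
library(purrr)  # provides map_lgl()

# Made-up data frame with mixed column types (hypothetical values)
toy <- data.frame(group = c("ctrl", "high", "low", "high"),
                  score = c(1.2, 3.4, 5.6, 7.8),
                  count = c(10L, 20L, 30L, 40L),
                  stringsAsFactors = FALSE)

# Keep only the rows whose group is one of the wanted values
toy_sub <- toy[toy$group %in% c("high", "low"), ]

# map_lgl() applies is.numeric() to each column and returns a logical vector
keep <- map_lgl(toy_sub, is.numeric)

# Use that logical vector to keep only the numeric columns
toy_num <- toy_sub[, keep, drop = FALSE]
```

Because a data frame is a list of columns, map_lgl() visits one column at a time, so the resulting logical vector has one entry per column and can be used directly for column subsetting.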
Attribution notice
- This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- The materials used in this lesson are adapted from work that is Copyright © Data Carpentry (http://datacarpentry.org/).
- All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).