Wrangling Practical
Reading in and inspecting data
- Read the .csv file into your environment and assign it to a variable called animals. Be sure to check that your row names are the different animals.
- Check to make sure that animals is a dataframe.
- How many rows are in the animals dataframe? How many columns?
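
The steps above can be sketched on a small stand-in file. The file contents, the toy_animals name, and the temporary path are assumptions made so the example runs on its own; they are not the practical's actual data.

```r
# Hypothetical stand-in for the practical's CSV file: these values are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "animal,speed,color",
  "Elephant,40,Gray",
  "Cheetah,120,Tan",
  "Tortoise,0.1,Green"
), csv_path)

# row.names = 1 tells read.csv() to use the first column as row names
toy_animals <- read.csv(csv_path, row.names = 1)

rownames(toy_animals)        # confirm the row names are the animals
is.data.frame(toy_animals)   # confirm the object is a data frame
nrow(toy_animals)            # number of rows
ncol(toy_animals)            # number of columns
```

dim(toy_animals) returns both counts at once, and str(toy_animals) gives a quick overview of the column types.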
Data wrangling
- Extract the speed value of 40 km/h from the animals dataframe.
- Return the rows with animals that are the color Tan.
- Return the rows with animals that have speed greater than 50 km/h and output only the color column. Keep the output as a data frame.
- Change the color of “Grey” to “Gray”.
- Create a list called animals_list in which the first element contains the speed column of the animals dataframe and the second element contains the color column of the animals dataframe.
- Give each element of your list the appropriate name (i.e., speed and color).
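
As a rough sketch of the indexing patterns these tasks call for, the example below uses a made-up data frame. The toy object, its values, and the names fast_colors and toy_list are assumptions for illustration only.

```r
# Made-up data standing in for the animals data frame
toy <- data.frame(
  speed = c(40, 120, 0.1, 60),
  color = c("Grey", "Tan", "Green", "Tan"),
  row.names = c("Elephant", "Cheetah", "Tortoise", "Lion"),
  stringsAsFactors = FALSE
)

# Extract a single value by row name and column name
toy["Elephant", "speed"]

# Rows where the color is Tan (logical indexing on rows)
toy[toy$color == "Tan", ]

# Rows with speed > 50, color column only; drop = FALSE keeps a data frame
fast_colors <- toy[toy$speed > 50, "color", drop = FALSE]

# Replace a value in place
toy$color[toy$color == "Grey"] <- "Gray"

# Build a list from two columns, then name its elements
toy_list <- list(toy$speed, toy$color)
names(toy_list) <- c("speed", "color")
```

Without drop = FALSE, selecting a single column returns a plain vector rather than a data frame.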
The %in% operator, reordering and matching
- Read in the project summary file (“project-summary.txt”) to a variable called proj_summary; this file contains quality metric information for an RNA-seq dataset. Be sure to specify that the row names are in column 1 and the separator is a tab.
- We have obtained batch information for the control samples in this dataset. Copy and paste the code below to create a dataframe of control samples with the associated batch information:
ctrl_samples <- data.frame(row.names = c("sample3", "sample10", "sample8", "sample4", "sample15"), date = c("01/13/2018", "03/15/2018", "01/13/2018", "09/20/2018","03/15/2018"))
- How many of the ctrl_samples are also in the proj_summary dataframe? Use the %in% operator to compare sample names.
- Keep only the rows in proj_summary which correspond to those in ctrl_samples. Do this with the %in% operator. Save it to a variable called proj_summary_ctrl.
- We would like to add in the batch information for the samples in proj_summary_ctrl. Find the rows that match in ctrl_samples.
- Use cbind() to add a column called batch to the proj_summary_ctrl dataframe. Assign this new dataframe back to proj_summary_ctrl.
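
The interplay of %in%, match(), and cbind() can be sketched on toy data. The toy_summary and toy_ctrl objects, the reads column, and their values are assumptions, not the contents of the project summary file.

```r
# Made-up stand-ins for proj_summary and ctrl_samples
toy_summary <- data.frame(
  reads = c(10, 20, 30, 40),
  row.names = c("sample1", "sample3", "sample4", "sample8")
)
toy_ctrl <- data.frame(
  date = c("01/13/2018", "01/13/2018", "09/20/2018"),
  row.names = c("sample3", "sample8", "sample4"),
  stringsAsFactors = FALSE
)

# %in% returns one TRUE/FALSE per element on its left; sum() counts the TRUEs
n_shared <- sum(rownames(toy_ctrl) %in% rownames(toy_summary))

# Keep only the summary rows whose names appear among the control samples
in_ctrl <- rownames(toy_summary) %in% rownames(toy_ctrl)
toy_summary_ctrl <- toy_summary[in_ctrl, , drop = FALSE]

# match() gives, for each summary row, its position in toy_ctrl
idx <- match(rownames(toy_summary_ctrl), rownames(toy_ctrl))

# Reorder the dates with idx so they line up row by row, then bind them on
toy_summary_ctrl <- cbind(toy_summary_ctrl, batch = toy_ctrl[idx, "date"])
```

The reordering step matters because cbind() pairs values by position, not by row name, so binding the dates in their original order would attach some to the wrong samples.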
BONUS: Using map_lgl()
- Subset proj_summary to keep only the “high” and “low” samples based on the treatment column. Save the new dataframe to a variable called proj_summary_noctl.
- Further, subset the dataframe to remove the non-numeric columns “Quality_format” and “treatment”. Try to do this using the map_lgl() function in addition to is.numeric(). Save the new dataframe back to proj_summary_noctl.
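
One possible shape for the bonus task is sketched below on made-up data. The toy object and its values are assumptions; the example calls purrr::map_lgl() when the purrr package is installed and otherwise falls back to vapply(), a base R function that gives the same named logical vector here.

```r
# Made-up stand-in for proj_summary
toy <- data.frame(
  treatment = c("high", "low", "control", "high"),
  Quality_format = c("a", "a", "a", "a"),
  reads = c(10, 20, 30, 40),
  row.names = paste0("sample", 1:4),
  stringsAsFactors = FALSE
)

# Keep only the "high" and "low" rows
toy_noctl <- toy[toy$treatment %in% c("high", "low"), ]

# Apply is.numeric() to every column, giving one TRUE/FALSE per column
keep <- if (requireNamespace("purrr", quietly = TRUE)) {
  purrr::map_lgl(toy_noctl, is.numeric)
} else {
  vapply(toy_noctl, is.numeric, logical(1))  # base R fallback
}

# Use the logical vector to select the numeric columns
toy_noctl <- toy_noctl[, keep, drop = FALSE]
```

A data frame is a list of columns, so map_lgl() visits each column in turn; this avoids hard-coding the names of the columns to drop.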