---
title: "Day 3 Activities Answer Key"
author: "Mary Piper, Meeta Mistry, Radhika Khetani"
date: "Wednesday, December 4, 2019"
---
# Exercises
## Reading in and inspecting data
1. Download [animals.csv](https://raw.githubusercontent.com/hbctraining/Intro-to-R-flipped/master/data/animals.csv) by right-clicking the link and choosing "Save Link As...", then place the file in the `data` directory.
2. Read the `.csv` file into your environment and assign it to a variable called `animals`. **Be sure to check that your row names are the different animals.**
```{r}
animals <- read.csv("../../data/animals.csv")
animals
```
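As a minimal sketch of where the row names come from: `read.csv()` uses the first column as row names when the header line has one fewer field than the data rows. The inline text below is a made-up stand-in for the file (its layout is an assumption), and `demo`, `demo2`, `csv_text` and `csv_text_full` are hypothetical names.

```r
# Hypothetical stand-in for animals.csv: the header has 2 fields, the data rows have 3
csv_text <- "speed,color
Elephant,40,Gray
Cheetah,120,Tan"

demo <- read.csv(text = csv_text)
rownames(demo)   # the animal names
colnames(demo)   # "speed" "color"

# If the header named every column instead, row.names = 1 would request
# the same behaviour explicitly
csv_text_full <- "animal,speed,color
Elephant,40,Gray
Cheetah,120,Tan"

demo2 <- read.csv(text = csv_text_full, row.names = 1)
identical(rownames(demo), rownames(demo2))
```

If `rownames(animals)` shows `"1"`, `"2"`, ... rather than animal names, adding `row.names = 1` to the call is one way to fix it.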
3. Check to make sure that `animals` is a dataframe.
```{r}
class(animals)
```
4. How many rows are in the `animals` dataframe? How many columns?
```{r}
nrow(animals)
ncol(animals)
```
## Data wrangling
1. Extract the `speed` value of 40 km/h from the `animals` dataframe.
```{r}
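animals[1,1]                                   # by row and column position
animals[which(animals$speed == 40), 1]         # by condition, column position
animals[which(animals$speed == 40), "speed"]   # by condition, column name
animals$speed[which(animals$speed == 40)]      # subset the speed vector directly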
```
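The answers above all rely on the same idea: a comparison produces a logical vector, and `which()` turns it into row positions. A small sketch of that step, using a hypothetical miniature data frame called `demo` rather than the real `animals`:

```r
# Hypothetical miniature of the animals data frame
demo <- data.frame(speed = c(40, 120, 0.1),
                   color = c("Gray", "Tan", "Green"),
                   row.names = c("Elephant", "Cheetah", "Tortoise"),
                   stringsAsFactors = FALSE)

is_40 <- demo$speed == 40   # logical vector, one value per row
idx <- which(is_40)         # positions of the TRUE values

demo[idx, "speed"]          # index by position
demo[is_40, "speed"]        # index by the logical vector directly
```

Both forms return the same value here. One difference worth knowing is that `which()` drops `NA`s, whereas a logical index containing `NA` returns an `NA` entry.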
2. Return the rows with animals that are the `color` Tan.
```{r}
animals[c(2,5),]                          # by row position
animals[which(animals$color == "Tan"),]   # by condition
```
3. Return the rows with animals that have `speed` greater than 50 km/h and output only the `color` column. Keep the output as a data frame.
```{r}
animals[which(animals$speed > 50), "color", drop = FALSE]
```
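To illustrate what `drop = FALSE` changes, here is a short sketch on a hypothetical data frame `demo` (not the real `animals` object). Selecting a single column normally simplifies the result to a vector, and `drop = FALSE` suppresses that simplification.

```r
# Hypothetical miniature of the animals data frame
demo <- data.frame(speed = c(40, 120, 80),
                   color = c("Gray", "Tan", "Tan"),
                   row.names = c("Elephant", "Cheetah", "Lion"),
                   stringsAsFactors = FALSE)

fast <- which(demo$speed > 50)

as_vector <- demo[fast, "color"]                # simplified to a character vector
as_frame  <- demo[fast, "color", drop = FALSE]  # stays a one-column data frame

class(as_vector)
class(as_frame)
rownames(as_frame)   # row names survive only in the data frame version
```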
4. Change the color of "Grey" to "Gray".
```{r}
# Either line alone makes the change; the second is an equivalent alternative
animals$color[which(animals$color == "Grey")] <- "Gray"
animals[which(animals$color == "Grey"), "color"] <- "Gray"
```
5. Create a list called `animals_list` in which the first element contains the speed column of the `animals` dataframe and the second element contains the color column of the `animals` dataframe.
```{r}
animals_list <- list(animals$speed, animals$color)
```
6. Give each element of your list the appropriate name (i.e., speed and color).
```{r}
names(animals_list) <- colnames(animals)
```
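As an alternative sketch, the names can also be supplied when the list is created instead of being assigned afterwards. The vectors and the name `demo_list` below are hypothetical stand-ins for the real columns.

```r
# Hypothetical stand-ins for the two columns
speed <- c(40, 120)
color <- c("Gray", "Tan")

# Name the elements at creation time
demo_list <- list(speed = speed, color = color)

names(demo_list)
demo_list$speed        # access by name with $
demo_list[["color"]]   # access by name with [[ ]]
```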
## The `%in%` operator, reordering and matching
1. In your environment you should have a dataframe called `proj_summary` which contains quality metric information for an RNA-seq dataset. We have obtained batch information for the control samples in this dataset. **Copy and paste the code below to create a dataframe of control samples with the associated batch information**:
```{r}
proj_summary <- read.table(file = "../../data/project-summary.txt", header = TRUE, row.names = 1)
ctrl_samples <- data.frame(row.names = c("sample3", "sample10", "sample8", "sample4", "sample15"),
                           date = c("01/13/2018", "03/15/2018", "01/13/2018", "09/20/2018", "03/15/2018"))
```
2. How many of the `ctrl_samples` are also in the `proj_summary` dataframe? Use the `%in%` operator to compare sample names.
```{r}
length(which(rownames(ctrl_samples) %in% rownames(proj_summary)))
```
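A brief sketch of how `%in%` behaves, using two hypothetical character vectors `a` and `b` in place of the real row names. The result has one logical value per element of the left-hand side, so counting the `TRUE`s answers "how many of `a` are in `b`".

```r
# Hypothetical sample names
a <- c("sample3", "sample10", "sample8")
b <- c("sample1", "sample3", "sample8", "sample9")

a %in% b                  # one TRUE/FALSE per element of a
length(which(a %in% b))   # count of matches, as in the answer above
sum(a %in% b)             # same count, since TRUE is treated as 1

b %in% a                  # swapping the sides gives one value per element of b
```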
3. Keep only the rows in `proj_summary` which correspond to those in `ctrl_samples`. Do this with the `%in%` operator. Save it to a variable called `proj_summary_ctrl`.
```{r}
proj_summary_ctrl <- proj_summary[which(rownames(proj_summary) %in% rownames(ctrl_samples)),]
proj_summary_ctrl
```
4. We would like to add in the batch information for the samples in `proj_summary_ctrl`. Find the rows that match in `ctrl_samples`.
```{r}
m <- match(rownames(proj_summary_ctrl), rownames(ctrl_samples))
m
```
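To make the result of `match()` concrete, here is a sketch using plain character vectors. `wanted`, `available` and `pos` are hypothetical names. For each element of the first argument, `match()` returns its position in the second argument.

```r
# Hypothetical vectors standing in for the two sets of row names
wanted    <- c("sample3", "sample4", "sample8")
available <- c("sample3", "sample10", "sample8", "sample4", "sample15")

pos <- match(wanted, available)
pos              # position of each wanted name within available

available[pos]   # indexing with pos puts available in the order of wanted
```

A name with no counterpart in the second vector would produce `NA` at that position.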
5. Use `cbind()` to add a column called `batch` to the `proj_summary_ctrl` dataframe. Assign this new dataframe back to `proj_summary_ctrl`.
```{r}
proj_summary_ctrl <- cbind(proj_summary_ctrl, batch = ctrl_samples[m, "date"])
proj_summary_ctrl
```
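Because `cbind()` pairs rows purely by position, it can be worth confirming that the reordered rows line up before combining. The following is a minimal sketch on hypothetical data frames `metrics` and `batches`, not the exercise objects themselves.

```r
# Hypothetical stand-ins for proj_summary_ctrl and ctrl_samples
metrics <- data.frame(percent_GC = c(50, 50, 49),
                      row.names = c("sample3", "sample4", "sample8"))
batches <- data.frame(date = c("01/13/2018", "03/15/2018", "01/13/2018",
                               "09/20/2018", "03/15/2018"),
                      row.names = c("sample3", "sample10", "sample8",
                                    "sample4", "sample15"),
                      stringsAsFactors = FALSE)

pos <- match(rownames(metrics), rownames(batches))
reordered <- batches[pos, , drop = FALSE]   # keep it a data frame so row names remain

# Stop early if the sample names do not line up row for row
stopifnot(all(rownames(reordered) == rownames(metrics)))

combined <- cbind(metrics, batch = reordered$date)
combined
```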
## BONUS: Using `map_lgl()`
1. Subset `proj_summary` to keep only the "high" and "low" samples based on the treatment column. Save the new dataframe to a variable called `proj_summary_noctl`.
```{r}
library(purrr)
proj_summary_noctl <- proj_summary[which(proj_summary$treatment != "control"),]
proj_summary_noctl
```
2. Further subset the dataframe to remove the non-numeric columns "Quality_format" and "treatment". Try to do this using the `map_lgl()` function in addition to `is.numeric()`. Save the new dataframe back to `proj_summary_noctl`.
```{r}
keep <- map_lgl(proj_summary_noctl, is.numeric)
proj_summary_noctl <- proj_summary_noctl[, keep]
proj_summary_noctl
```
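For comparison, the same column test can be sketched in base R with `vapply()`, which, like `map_lgl()`, applies a function to each column and insists on a logical result. This is a swapped-in base-R analogue rather than the purrr approach the exercise asks for, and `demo` is a hypothetical data frame.

```r
# Hypothetical miniature of proj_summary_noctl
demo <- data.frame(percent_GC = c(49, 49),
                   treatment = c("high", "low"),
                   rRNA_rate = c(0.0073, 0.0055),
                   stringsAsFactors = FALSE)

# A data frame is a list of columns, so the function is applied column by column
keep <- vapply(demo, is.numeric, logical(1))
keep

# drop = FALSE keeps a data frame even if only one column is numeric
numeric_only <- demo[, keep, drop = FALSE]
colnames(numeric_only)
```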