Data Wrangling: Dataframes, Matrices, and Lists Answer Key
Author
Will Gammerdinger
Published
July 1, 2025
Exercise 1
Return a data frame with only the genotype and replicate column values for sample2 and sample8.
# Subset the metadata data frame to return the genotype and replicate columns for the sample2 and sample8 rows
metadata[c("sample2", "sample8"), c("genotype", "replicate")]
genotype replicate
sample2 Wt 2
sample8 Wt 2
Return the fourth and ninth values of the replicate column.
# Subset the metadata data frame to return the fourth and ninth rows of the replicate column
metadata[c(4, 9), "replicate"]
[1] 1 3
Extract the replicate column as a data frame.
# Extract the replicate column from the metadata data frame, but retain the data frame structure
metadata[, "replicate", drop = FALSE]
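To see what `drop = FALSE` changes, the following minimal sketch compares the two forms of the same subset. The `toy` data frame is a hypothetical stand-in for a few rows of `metadata`, built inline so the example runs on its own.

```r
# Hypothetical stand-in for a few rows of the metadata data frame
toy <- data.frame(
  genotype  = c("Wt", "Wt", "KO"),
  replicate = c(1, 2, 1),
  row.names = c("sample1", "sample2", "sample4")
)

# Default behaviour: selecting a single column simplifies the result to a vector
as_vector <- toy[, "replicate"]
class(as_vector)

# drop = FALSE keeps the data frame structure, including the row names
as_df <- toy[, "replicate", drop = FALSE]
class(as_df)
dim(as_df)
```

The vector form loses the row names, so `drop = FALSE` is the safer choice when later steps expect a data frame.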
Exercise 2
Subset the metadata data frame to return only the rows of data with a genotype of KO.
# Create an index vector holding the positions of the elements in the genotype column of the metadata data frame that are equal to "KO"
idx <- which(metadata$genotype == "KO")
# Subset the rows of the metadata data frame by the index vector
metadata[idx, ]
genotype celltype replicate
sample4 KO typeA 1
sample5 KO typeA 2
sample6 KO typeA 3
sample10 KO typeB 1
sample11 KO typeB 2
sample12 KO typeB 3
Alternatively, you can use a nested approach:
# Subset the rows of the metadata data frame to those whose genotype column is equal to "KO"
metadata[which(metadata$genotype == "KO"), ]
genotype celltype replicate
sample4 KO typeA 1
sample5 KO typeA 2
sample6 KO typeA 3
sample10 KO typeB 1
sample11 KO typeB 2
sample12 KO typeB 3
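Both answers wrap the comparison in `which()`. Indexing directly with the logical vector gives the same rows for `metadata`, which has no missing genotypes, but the two approaches differ once an `NA` is present. The sketch below illustrates this on a hypothetical `toy` data frame. The `sampleX` row and its missing genotype are invented for the illustration and are not part of the lesson data.

```r
# Hypothetical data frame with one missing genotype (not part of the lesson data)
toy <- data.frame(
  genotype  = c("Wt", "KO", NA, "KO"),
  replicate = c(1, 1, 2, 2),
  row.names = c("sample1", "sample4", "sampleX", "sample5")
)

# The comparison returns a logical vector; the NA genotype compares to NA
is_ko <- toy$genotype == "KO"

# which() keeps only the positions that are TRUE and drops the NA
idx <- which(is_ko)

# Indexing with the raw logical vector carries the NA through as an all-NA row
by_logical <- toy[is_ko, ]

# Indexing with the positions from which() returns only the matching rows
by_which <- toy[idx, ]

nrow(by_logical)
nrow(by_which)
```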
Exercise 3
Create a list named random with the following components: metadata, age, list1, samplegroup and number.
# Create a list called random composed of metadata, age, list1, samplegroup and number
random <- list(metadata, age, list1, samplegroup, number)
# Return the random list to the console for inspection
random
[[1]]
genotype celltype replicate
sample1 Wt typeA 1
sample2 Wt typeA 2
sample3 Wt typeA 3
sample4 KO typeA 1
sample5 KO typeA 2
sample6 KO typeA 3
sample7 Wt typeB 1
sample8 Wt typeB 2
sample9 Wt typeB 3
sample10 KO typeB 1
sample11 KO typeB 2
sample12 KO typeB 3
[[2]]
[1] 15 22 45 52 73 81
[[3]]
[[3]][[1]]
[1] "ecoli" "human" "corn"
[[3]][[2]]
species glengths
1 ecoli 4.6
2 human 3000.0
3 corn 50000.0
[[3]][[3]]
[1] 9
[[4]]
[1] CTL CTL CTL KO KO KO OE OE OE
Levels: CTL KO OE
[[5]]
[1] 15
Extract the samplegroup component.
# Extract the samplegroup object from the random list, which is the fourth object in the list
random[[4]]
[1] CTL CTL CTL KO KO KO OE OE OE
Levels: CTL KO OE
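The answer uses double brackets because they return the component itself, whereas single brackets return a shorter list that still wraps it. A minimal sketch of the difference, using a hypothetical `toy_list` that reuses the `samplegroup` values shown above:

```r
# Rebuild the samplegroup factor and place it in a small hypothetical list
samplegroup <- factor(c("CTL", "CTL", "CTL", "KO", "KO", "KO", "OE", "OE", "OE"))
toy_list <- list(c(15, 22, 45), samplegroup, 15)

# Double brackets extract the component itself (here, the factor)
component <- toy_list[[2]]
class(component)

# Single brackets return a list of length one that contains the factor
sub_list <- toy_list[2]
class(sub_list)
length(sub_list)
```

Functions that expect a factor, such as `levels()`, work on `component` but not directly on `sub_list`.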
Exercise 4
Let’s practice combining ways to extract data from the data structures we have covered so far:
Set names for the random list you created in the last exercise.
# Set the names for the items in the random list
names(random) <- c("metadata", "age", "list1", "samplegroup", "number")
Extract the age component using the $ notation.
# Extract the age object from the random list
random$age
[1] 15 22 45 52 73 81
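Once a list has names, `$` and `[[ ]]` with a quoted name reach the same component, and either can be chained with further indexing. The sketch below assumes a hypothetical two-item `toy_list` built from the `age` and `number` values used in this lesson.

```r
# Build a small named list from the age and number values
age <- c(15, 22, 45, 52, 73, 81)
toy_list <- list(age, 15)
names(toy_list) <- c("age", "number")

# $ and [[ ]] with a quoted name return the same component
by_dollar <- toy_list$age
by_name <- toy_list[["age"]]
identical(by_dollar, by_name)

# Chain a second index to pull one value out of the extracted vector
third_age <- toy_list$age[3]
third_age

# When the name is stored in a variable, use [[ ]] rather than $
wanted <- "number"
toy_list[[wanted]]
```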