Data Wrangling: Dataframes, Matrices, and Lists Answer Key

Author

Will Gammerdinger

Published

July 1, 2025

Exercise 1

  1. Return a data frame with only the genotype and replicate column values for sample2 and sample8.
# Subset the metadata data frame to return the genotype and replicate columns and the sample2 and sample8 rows
metadata[c("sample2", "sample8"), c("genotype", "replicate")]
        genotype replicate
sample2       Wt         2
sample8       Wt         2
  2. Return the fourth and ninth values of the replicate column.
# Subset the metadata data frame to return the replicate column and the fourth and ninth rows
metadata[c(4, 9), "replicate"]
[1] 1 3
  3. Extract the replicate column as a data frame.
# Extract the replicate column from the metadata data frame, but retain the data frame structure
metadata[, "replicate", drop = FALSE]
         replicate
sample1          1
sample2          2
sample3          3
sample4          1
sample5          2
sample6          3
sample7          1
sample8          2
sample9          3
sample10         1
sample11         2
sample12         3
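
The answers above assume a metadata data frame is already loaded. As a minimal sketch, it can be rebuilt from the values printed in this answer key. Building it with data.frame() and rep() is an assumption made here for illustration, and the lesson may load it from a file instead. The sketch also shows what drop = FALSE changes when a single column is selected.

```r
# Rebuild metadata from the values printed in this answer key
# (assumption: the lesson itself may load it from a file instead)
metadata <- data.frame(
  genotype  = rep(rep(c("Wt", "KO"), each = 3), times = 2),
  celltype  = rep(c("typeA", "typeB"), each = 6),
  replicate = rep(1:3, times = 4),
  row.names = paste0("sample", 1:12)
)

# Default behavior: selecting one column simplifies the result to a vector
as_vector <- metadata[, "replicate"]
is.data.frame(as_vector)   # FALSE

# drop = FALSE keeps the data frame structure and the row names
as_df <- metadata[, "replicate", drop = FALSE]
is.data.frame(as_df)       # TRUE
```

Keeping the data frame structure is useful when a later step expects row names or a column name to be present.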

Exercise 2

Subset the metadata data frame to return only the rows with a genotype of KO.

# Create an index vector holding the positions of the elements in the genotype column of the metadata data frame that are "KO"
idx <- which(metadata$genotype=="KO")

# Subset the rows of the metadata data frame by the index vector
metadata[idx, ]
         genotype celltype replicate
sample4        KO    typeA         1
sample5        KO    typeA         2
sample6        KO    typeA         3
sample10       KO    typeB         1
sample11       KO    typeB         2
sample12       KO    typeB         3

Alternatively, you can use a nested approach:

# Subset the rows of the metadata data frame by the positions of the elements in the genotype column that are "KO"
metadata[which(metadata$genotype=="KO"), ]
         genotype celltype replicate
sample4        KO    typeA         1
sample5        KO    typeA         2
sample6        KO    typeA         3
sample10       KO    typeB         1
sample11       KO    typeB         2
sample12       KO    typeB         3
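
The comparison metadata$genotype == "KO" produces a logical vector, and which() turns it into integer positions. The following sketch shows the difference on a small stand-in vector. The genotype values here are hypothetical and include an NA only to make the difference visible. The metadata data frame in this exercise has no missing genotypes, so both forms of indexing return the same rows there.

```r
# Hypothetical stand-in for a genotype column, with one missing value
genotype <- c("Wt", "KO", NA, "KO")

# Logical vector: one TRUE/FALSE/NA per element
is_ko <- genotype == "KO"
is_ko            # FALSE  TRUE    NA  TRUE

# which() keeps only the positions of the TRUE values and drops the NA
idx <- which(is_ko)
idx              # 2 4

# Indexing with the logical vector carries the NA through
genotype[is_ko]  # "KO" NA   "KO"

# Indexing with the positions returns only the matches
genotype[idx]    # "KO" "KO"
```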

Exercise 3

  1. Create a list named random with the following components: metadata, age, list1, samplegroup and number.
# Create a list called random composed of metadata, age, list1, samplegroup and number
random <- list(metadata, age, list1, samplegroup, number)

# Return the random list to the console for inspection
random
[[1]]
         genotype celltype replicate
sample1        Wt    typeA         1
sample2        Wt    typeA         2
sample3        Wt    typeA         3
sample4        KO    typeA         1
sample5        KO    typeA         2
sample6        KO    typeA         3
sample7        Wt    typeB         1
sample8        Wt    typeB         2
sample9        Wt    typeB         3
sample10       KO    typeB         1
sample11       KO    typeB         2
sample12       KO    typeB         3

[[2]]
[1] 15 22 45 52 73 81

[[3]]
[[3]][[1]]
[1] "ecoli" "human" "corn" 

[[3]][[2]]
  species glengths
1   ecoli      4.6
2   human   3000.0
3    corn  50000.0

[[3]][[3]]
[1] 9


[[4]]
[1] CTL CTL CTL KO  KO  KO  OE  OE  OE 
Levels: CTL KO OE

[[5]]
[1] 15
  2. Extract the samplegroup component.
# Extract the samplegroup object, which is the fourth component of the random list
random[[4]]
[1] CTL CTL CTL KO  KO  KO  OE  OE  OE 
Levels: CTL KO OE
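
Double brackets return the component itself, while single brackets return a shorter list that still wraps the component. A small sketch of the difference, using a hypothetical two-component list (example_list) rather than the lesson's objects:

```r
# Hypothetical small list standing in for random
example_list <- list(c(15, 22, 45), factor(c("CTL", "KO", "OE")))

# Double brackets return the component itself
component <- example_list[[2]]
class(component)   # "factor"

# Single brackets return a list of length one that contains the component
sublist <- example_list[2]
class(sublist)     # "list"
```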

Exercise 4

Let’s practice combining ways to extract data from the data structures we have covered so far:

  1. Set names for the random list you created in the last exercise.
# Set the names for the items in the random list
names(random) <- c("metadata", "age", "list1", "samplegroup", "number")
  2. Extract the age component using the $ notation.
# Extract the age object from the random list
random$age
[1] 15 22 45 52 73 81
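
Names can also be supplied when a list is created, and a named component can be extracted with either $ or double brackets. The sketch below uses the age and number values printed in this answer key. The random_named and component_name objects are hypothetical names introduced for illustration.

```r
# Values copied from the output printed in this answer key
age <- c(15, 22, 45, 52, 73, 81)
number <- 15

# Names can be supplied when the list is created
random_named <- list(age = age, number = number)

# $ and [[ ]] with a name return the same component
identical(random_named$age, random_named[["age"]])   # TRUE

# [[ ]] also accepts a name stored in a variable, which $ does not
component_name <- "number"
random_named[[component_name]]                       # 15
```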