The %in% operator Answer Key

Author

Will Gammerdinger

Published

July 1, 2025

Exercise 1

Using the A and B vectors created above, evaluate each element in B to see if there is a match in A.

# Return a logical vector indicating which elements of B are in A
B %in% A

[1] FALSE FALSE FALSE FALSE  TRUE  TRUE

Subset the B vector to only return those values that are also in A.

# Create a logical vector indicating which elements of B are in A and assign it to the object intersectionBA
intersectionBA <- B %in% A

# Subset the B vector by the elements returning TRUE in intersectionBA
B[intersectionBA]

[1] 1 5

Alternatively, you can use a nested approach:

# Identify the elements in B that are in A and subset the B vector by those elements
B[B %in% A]

[1] 1 5
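For readers without the lesson open, the whole of Exercise 1 can be reproduced with a minimal sketch like the one below. The values assigned to A and B here are an assumption, chosen only so that the results agree with the output shown above; the lesson's own vectors may differ.

```r
# Assumed values, chosen only to be consistent with the output shown above
A <- c(1, 3, 5, 7, 9, 11)
B <- c(2, 4, 6, 8, 1, 5)

# Logical vector with one TRUE/FALSE per element of B
in_A <- B %in% A
in_A          # FALSE FALSE FALSE FALSE TRUE TRUE

# Positions in B where a match was found
which(in_A)   # 5 6

# Values of B that are also in A, in the order they occur in B
B[in_A]       # 1 5

# intersect() also returns the shared values, but removes duplicates
intersect(B, A)   # 1 5
```

Note that %in% always returns a vector the same length as its left-hand side, which is what makes it safe to use for subsetting that same vector.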

Exercise 2

We have a list of 6 marker genes that we are very interested in. Our goal is to extract count data for these genes from the rpkm_data data frame using the %in% operator, instead of scrolling through rpkm_data and finding them manually.

First, let’s create a vector called important_genes with the Ensembl IDs of the 6 genes we are interested in:

# Create important genes vector
important_genes <- c("ENSMUSG00000083700", "ENSMUSG00000080990", "ENSMUSG00000065619", "ENSMUSG00000047945", "ENSMUSG00000081010", "ENSMUSG00000030970")

Use the %in% operator to determine if all of these genes are present in the row names of the rpkm_data data frame.

# Check whether every gene in important_genes is a row name of rpkm_data
all(important_genes %in% rownames(rpkm_data))

[1] TRUE
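When all() returns FALSE, the same logical vector can be used to find out which genes are missing. The sketch below illustrates this on a small made-up data frame; toy_rpkm, toy_genes, and the gene names are hypothetical stand-ins, not part of the lesson data.

```r
# Hypothetical stand-in for rpkm_data: 4 genes x 2 samples
toy_rpkm <- data.frame(
  sample1 = c(1.2, 0.0, 3.4, 0.5),
  sample2 = c(0.8, 0.1, 2.9, 0.0),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Genes of interest; "geneX" is deliberately absent from toy_rpkm
toy_genes <- c("geneC", "geneX", "geneA")

# One TRUE/FALSE per requested gene
found <- toy_genes %in% rownames(toy_rpkm)

all(found)          # FALSE: at least one requested gene is missing
any(found)          # TRUE: at least one requested gene is present
toy_genes[!found]   # "geneX": the gene(s) that are missing
```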

Extract the rows from rpkm_data that correspond to these 6 genes using [] and the %in% operator. Double-check the row names to ensure that you are extracting the correct rows.

# Extract important genes using the %in% operator
rpkm_data[which(rownames(rpkm_data) %in% important_genes), ]

                    sample2  sample5  sample7  sample8  sample9   sample4
ENSMUSG00000030970 2.221180 0.537852 2.243810 2.599400 3.593970 0.1753800
ENSMUSG00000047945 4.745070 0.323620 1.297810 3.896810 3.285470 0.2213430
ENSMUSG00000065619 0.000000 0.000000 0.000000 0.000000 0.000000 0.0000000
ENSMUSG00000080990 0.000000 0.000000 0.000000 0.000000 0.000000 0.0000000
ENSMUSG00000081010 0.222275 0.349415 0.190397 0.167166 0.221353 0.4196660
ENSMUSG00000083700 0.425214 0.337651 0.145973 0.142010 0.508757 0.0660419
                    sample6 sample12  sample3 sample11 sample10  sample1
ENSMUSG00000030970 0.435484 0.964169 2.151490 0.963523 1.014520 2.971420
ENSMUSG00000047945 0.478836 3.581640 4.501390 1.442970 0.982691 5.199470
ENSMUSG00000065619 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000080990 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000081010 0.248244 0.594672 0.214347 0.415823 0.452537 0.235848
ENSMUSG00000083700 0.308669 0.488064 0.136998 0.222865 0.205934 0.124225
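The which() call in the answer above converts the logical vector into row positions. Because %in% never returns NA, indexing with the logical vector directly should select the same rows. A minimal sketch on a hypothetical toy data frame (toy_rpkm and toy_genes are made-up names, not lesson objects):

```r
# Hypothetical stand-in for rpkm_data: 4 genes x 2 samples
toy_rpkm <- data.frame(
  sample1 = c(1.2, 0.0, 3.4, 0.5),
  sample2 = c(0.8, 0.1, 2.9, 0.0),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)
toy_genes <- c("geneC", "geneA")

# Logical vector: one TRUE/FALSE per row of toy_rpkm
keep <- rownames(toy_rpkm) %in% toy_genes

# Index with row positions (as in the answer above)
by_position <- toy_rpkm[which(keep), ]

# Index with the logical vector directly
by_logical <- toy_rpkm[keep, ]

# Both approaches select the same rows
rownames(by_position)   # "geneA" "geneC"
rownames(by_logical)    # "geneA" "geneC"
```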

Bonus question: Extract the rows from rpkm_data that correspond to these 6 genes using [], but without using the %in% operator.

# Extract important genes without using the %in% operator
rpkm_data[important_genes, ]

                    sample2  sample5  sample7  sample8  sample9   sample4
ENSMUSG00000083700 0.425214 0.337651 0.145973 0.142010 0.508757 0.0660419
ENSMUSG00000080990 0.000000 0.000000 0.000000 0.000000 0.000000 0.0000000
ENSMUSG00000065619 0.000000 0.000000 0.000000 0.000000 0.000000 0.0000000
ENSMUSG00000047945 4.745070 0.323620 1.297810 3.896810 3.285470 0.2213430
ENSMUSG00000081010 0.222275 0.349415 0.190397 0.167166 0.221353 0.4196660
ENSMUSG00000030970 2.221180 0.537852 2.243810 2.599400 3.593970 0.1753800
                    sample6 sample12  sample3 sample11 sample10  sample1
ENSMUSG00000083700 0.308669 0.488064 0.136998 0.222865 0.205934 0.124225
ENSMUSG00000080990 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000065619 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
ENSMUSG00000047945 0.478836 3.581640 4.501390 1.442970 0.982691 5.199470
ENSMUSG00000081010 0.248244 0.594672 0.214347 0.415823 0.452537 0.235848
ENSMUSG00000030970 0.435484 0.964169 2.151490 0.963523 1.014520 2.971420
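The two outputs above contain the same rows in a different order: %in% keeps the row order of the data frame, while indexing by name follows the order of the vector. The two approaches also differ when a requested name is absent. The sketch below shows both points on a hypothetical toy data frame (toy_rpkm, toy_genes, and the gene names are made up for illustration).

```r
# Hypothetical stand-in for rpkm_data: 4 genes x 2 samples
toy_rpkm <- data.frame(
  sample1 = c(1.2, 0.0, 3.4, 0.5),
  sample2 = c(0.8, 0.1, 2.9, 0.0),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)
toy_genes <- c("geneC", "geneA")

# %in% keeps rows in the order they appear in the data frame
by_in <- toy_rpkm[rownames(toy_rpkm) %in% toy_genes, ]
rownames(by_in)     # "geneA" "geneC"

# Indexing by name returns rows in the order of toy_genes
by_name <- toy_rpkm[toy_genes, ]
rownames(by_name)   # "geneC" "geneA"

# After reordering, the two results hold the same data
all.equal(by_in[toy_genes, ], by_name)   # TRUE

# A name that is absent ("geneX") is silently skipped by %in% ...
with_missing <- c("geneA", "geneX")
nrow(toy_rpkm[rownames(toy_rpkm) %in% with_missing, ])   # 1

# ... but produces a row of NA values when indexing a data frame by name
anyNA(toy_rpkm[with_missing, ])                          # TRUE
```

This is one reason to run the all() check from earlier in the exercise before extracting rows by name.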