---
title: "The %in% operator Answer Key"
author:
- Will Gammerdinger
date: "2025-07-01"
---
```{r}
#| label: load_data
#| echo: false
# Set A and B vectors
A <- c(1,3,5,7,9,11) # odd numbers
B <- c(2,4,6,8,1,5) # even numbers, plus two odd numbers (1 and 5)
# Read in the expression data
rpkm_data <- read.csv("data/counts.rpkm.csv")
```
# Exercise 1
1. Using the `A` and `B` vectors created above, evaluate each element in `B` to see if there is a match in `A`.
```{r}
#| label: find_b_in_a
# Return a logical vector indicating which elements of B are also in A
B %in% A
```
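Note that `%in%` is not symmetric: the result always has one value per element of the vector on the left-hand side. The sketch below illustrates this with two hypothetical vectors, `x` and `y`, which are not part of the exercise data and were chosen to have different lengths.

```{r}
#| label: in_operator_direction
# Hypothetical toy vectors of different lengths (not the exercise data)
x <- c(1, 3, 5, 7, 9, 11)
y <- c(5, 2, 1)
# One logical value per element of y: is each element of y found in x?
y_in_x <- y %in% x
y_in_x
# One logical value per element of x: is each element of x found in y?
x_in_y <- x %in% y
x_in_y
```

Swapping the operands therefore changes both the length and the meaning of the result, which is why the exercise asks specifically for `B %in% A`.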
2. Subset the `B` vector to only return those values that are also in `A`.
```{r}
#| label: subset_b_in_a
# Test which elements of B are in A and assign the resulting logical vector to the object intersectionBA
intersectionBA <- B %in% A
# Subset the B vector by the elements returning TRUE in intersectionBA
B[intersectionBA]
```
Alternatively, you can use a nested approach:
```{r}
#| label: nested_subset_b_in_a
# Identify the elements in B that are in A and subset the B vector by those elements
B[B %in% A]
```
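For comparison, base R's `intersect()` function can give a similar answer without `%in%`, although it is not what the exercise asks for. One difference worth knowing is that `intersect()` drops duplicated values, while subsetting with `%in%` keeps them. The sketch below uses hypothetical vectors `toy_a` and `toy_b` (with a deliberately repeated value) to show this.

```{r}
#| label: intersect_comparison
# Hypothetical toy vectors; toy_b contains the value 1 twice
toy_a <- c(1, 3, 5)
toy_b <- c(2, 1, 5, 1)
# Subsetting with %in% keeps every matching element, including the repeat
kept_with_in <- toy_b[toy_b %in% toy_a]
kept_with_in
# intersect() returns each shared value only once
kept_with_intersect <- intersect(toy_b, toy_a)
kept_with_intersect
```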
# Exercise 2
We have a list of 6 marker genes that we are very interested in. Our goal is to extract the count data for these genes from the `rpkm_data` data frame using the `%in%` operator, instead of scrolling through `rpkm_data` and finding them manually.
First, let's create a vector called `important_genes` with the Ensembl IDs of the 6 genes we are interested in:
```{r}
#| label: create_important_genes
# Create important genes vector
important_genes <- c("ENSMUSG00000083700", "ENSMUSG00000080990", "ENSMUSG00000065619", "ENSMUSG00000047945", "ENSMUSG00000081010", "ENSMUSG00000030970")
```
1. Use the `%in%` operator to determine if all of these genes are present in the row names of the `rpkm_data` data frame.
```{r}
#| label: important_genes_in_operator
# Check whether every gene in important_genes appears in the row names of rpkm_data
all(important_genes %in% rownames(rpkm_data))
```
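If `all()` had returned `FALSE`, a natural next step would be to find out which IDs are missing. The sketch below shows one way to do this by negating the `%in%` result. It uses a hypothetical data frame `toy_counts` and a hypothetical vector `toy_genes` rather than the real `rpkm_data`, and `"geneX"` is deliberately absent from the row names.

```{r}
#| label: find_missing_ids
# Hypothetical toy data frame with gene IDs as row names (not the real rpkm_data)
toy_counts <- data.frame(sample1 = c(1.2, 0, 3.4),
                         row.names = c("geneA", "geneB", "geneC"))
# Hypothetical IDs of interest; "geneX" is not in toy_counts
toy_genes <- c("geneC", "geneX", "geneA")
# Are all of the IDs present in the row names?
all_present <- all(toy_genes %in% rownames(toy_counts))
all_present
# Negate the logical vector with ! to keep only the IDs that are missing
missing_genes <- toy_genes[!(toy_genes %in% rownames(toy_counts))]
missing_genes
```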
2. Extract the rows from `rpkm_data` that correspond to these 6 genes using `[]` and the `%in%` operator. Double check the row names to ensure that you are extracting the correct rows.
```{r}
#| label: extract_important_genes_in_operator
# Extract important genes using the %in% operator
rpkm_data[which(rownames(rpkm_data) %in% important_genes),]
```
3. **Bonus question:** Extract the rows from `rpkm_data` that correspond to these 6 genes using `[]`, but without using the `%in%` operator.
```{r}
#| label: extract_important_genes_without_in_operator
# Extract important genes without using the %in% operator
rpkm_data[important_genes, ]
```
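The two approaches return the same rows, but not necessarily in the same order. Subsetting with `%in%` keeps the rows in the order in which they appear in the data frame, whereas indexing by name returns them in the order of the vector of names. A minimal sketch with a hypothetical data frame `toy_rpkm` and a hypothetical vector `toy_ids` (neither is part of the exercise data):

```{r}
#| label: row_order_comparison
# Hypothetical toy data frame with gene IDs as row names
toy_rpkm <- data.frame(sample1 = c(10, 20, 30, 40),
                       sample2 = c(1, 2, 3, 4),
                       row.names = c("geneA", "geneB", "geneC", "geneD"))
# Hypothetical IDs of interest, listed in a different order than the data frame
toy_ids <- c("geneD", "geneA")
# Using %in%: rows come back in data frame order (geneA, then geneD)
by_in <- toy_rpkm[rownames(toy_rpkm) %in% toy_ids, ]
rownames(by_in)
# Indexing by name: rows come back in the order of toy_ids (geneD, then geneA)
by_name <- toy_rpkm[toy_ids, ]
rownames(by_name)
```

This is worth keeping in mind when double checking the row names, since the order of the extracted rows depends on which approach was used.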