Tidyverse data wrangling Answer Key
Exercise 1
Create a vector of random numbers using the code below:
# Create a vector of random numbers
random_numbers <- c(81, 90, 65, 43, 71, 29)
Use the pipe (%>%) to perform two steps:
- Take the mean of random_numbers using the mean() function.
# Return the mean of the random_numbers vector
random_numbers %>%
  mean()
[1] 63.16667
- Round the output to three digits using the round() function.
# Return the mean of the random_numbers vector and round to three digits
random_numbers %>%
  mean() %>%
  round(digits = 3)
[1] 63.167
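One way to check the piped answer is to compare it with the equivalent nested call, since the pipe only passes the left-hand value in as the first argument of the next function. The sketch below assumes the magrittr package (which supplies %>% and is loaded along with the tidyverse) is installed; the names piped and nested are illustrative only.

```r
library(magrittr)  # provides the %>% pipe

random_numbers <- c(81, 90, 65, 43, 71, 29)

# Piped version: steps read left to right
piped <- random_numbers %>%
  mean() %>%
  round(digits = 3)

# Nested version: the same steps read inside out
nested <- round(mean(random_numbers), digits = 3)

# Both forms should give the same value
identical(piped, nested)
```

If the two forms agree, identical() returns TRUE.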
Exercise 2
We would like to perform an additional round of filtering to only keep the most specific GO terms.
- For bp_oe, use the filter() function to only keep those rows where the relative.depth is greater than 4.
# Filter bp_oe to keep those rows where the relative.depth is greater than 4
bp_oe %>%
  filter(relative.depth > 4)
# A tibble: 668 × 14
   query.number significant  p.value term.size query.size overlap.size recall
          <dbl> <lgl>          <dbl>     <dbl>      <dbl>        <dbl>  <dbl>
 1            1 TRUE        2.41e- 2        16       5850           11  0.002
 2            1 TRUE        2.41e- 2        16       5850           11  0.002
 3            1 TRUE        2.41e- 2        16       5850           11  0.002
 4            1 TRUE        7.90e-11      2629       5850          973  0.166
 5            1 TRUE        4.43e- 5       200       5850           93  0.016
 6            1 TRUE        3.67e- 6       166       5850           83  0.014
 7            1 TRUE        3.67e- 6       166       5850           83  0.014
 8            1 TRUE        4.88e- 2        33       5850           18  0.003
 9            1 TRUE        2.48e- 5       137       5850           69  0.012
10            1 TRUE        1.39e- 4      1492       5850          540  0.092
# ℹ 658 more rows
# ℹ 7 more variables: precision <dbl>, term.id <chr>, domain <chr>,
#   subgraph.number <dbl>, term.name <chr>, relative.depth <dbl>,
#   intersection <chr>
- Save the output to overwrite our bp_oe object.
# Filter bp_oe to keep those rows where the relative.depth is greater than 4 and overwrite the bp_oe object
bp_oe <- bp_oe %>%
  filter(relative.depth > 4)
# Print object after filtering on the relative.depth column
bp_oe
# A tibble: 668 × 14
   query.number significant  p.value term.size query.size overlap.size recall
          <dbl> <lgl>          <dbl>     <dbl>      <dbl>        <dbl>  <dbl>
 1            1 TRUE        2.41e- 2        16       5850           11  0.002
 2            1 TRUE        2.41e- 2        16       5850           11  0.002
 3            1 TRUE        2.41e- 2        16       5850           11  0.002
 4            1 TRUE        7.90e-11      2629       5850          973  0.166
 5            1 TRUE        4.43e- 5       200       5850           93  0.016
 6            1 TRUE        3.67e- 6       166       5850           83  0.014
 7            1 TRUE        3.67e- 6       166       5850           83  0.014
 8            1 TRUE        4.88e- 2        33       5850           18  0.003
 9            1 TRUE        2.48e- 5       137       5850           69  0.012
10            1 TRUE        1.39e- 4      1492       5850          540  0.092
# ℹ 658 more rows
# ℹ 7 more variables: precision <dbl>, term.id <chr>, domain <chr>,
#   subgraph.number <dbl>, term.name <chr>, relative.depth <dbl>,
#   intersection <chr>
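Because bp_oe comes from earlier in the lesson, the filtering step can also be tried on a small stand-in table. The sketch below assumes the dplyr and tibble packages are installed; toy_bp, toy_deep, and all of the values are made up for illustration, and only the relative.depth column name is taken from the exercise.

```r
library(dplyr)

# Hypothetical toy data standing in for bp_oe (values are invented)
toy_bp <- tibble::tibble(
  term.name      = c("term A", "term B", "term C", "term D"),
  relative.depth = c(2, 4, 5, 7)
)

# Keep only rows whose relative.depth is strictly greater than 4;
# the row with relative.depth equal to 4 is dropped
toy_deep <- toy_bp %>%
  filter(relative.depth > 4)

toy_deep$term.name
```

Note that > excludes the boundary value; >= would be needed to keep terms at a depth of exactly 4.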
Exercise 3
Rename the intersection column to genes to reflect the fact that these are the DE genes associated with the GO process.
# Rename the intersection column of bp_oe to genes
bp_oe <- bp_oe %>%
  dplyr::rename(genes = intersection)
# Print object after renaming the column
bp_oe
# A tibble: 668 × 7
   GO_id      GO_term            p.value query.size term.size overlap.size genes
   <chr>      <chr>                <dbl>      <dbl>     <dbl>        <dbl> <chr>
 1 GO:0010467 gene expression   6.71e-66       5850      5257         2142 gclc…
 2 GO:0090304 nucleic acid met… 1.18e-61       5850      5103         2073 gclc…
 3 GO:0006139 nucleobase-conta… 2.49e-58       5850      5731         2271 dpm1…
 4 GO:0016070 RNA metabolic pr… 7.28e-57       5850      4597         1881 gclc…
 5 GO:0009059 macromolecule bi… 3.12e-54       5850      5066         2030 dpm1…
 6 GO:0034645 cellular macromo… 5.6 e-54       5850      4907         1975 dpm1…
 7 GO:0044271 cellular nitroge… 2.10e-47       5850      4882         1938 gclc…
 8 GO:0010468 regulation of ge… 4.25e-46       5850      4297         1733 gclc…
 9 GO:2000112 regulation of ce… 1.22e-40       5850      3960         1593 gclc…
10 GO:0010556 regulation of ma… 2.22e-39       5850      4073         1626 gclc…
# ℹ 658 more rows
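The argument order in rename() is easy to reverse by accident: the new name goes on the left and the existing name on the right (new = old). A minimal sketch on a made-up table, assuming dplyr and tibble are installed; toy_bp, toy_renamed, the id column, and the gene labels are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (values are invented)
toy_bp <- tibble::tibble(
  id           = c("term_1", "term_2"),
  intersection = c("geneA,geneB", "geneC")
)

# rename() takes new_name = old_name
toy_renamed <- toy_bp %>%
  dplyr::rename(genes = intersection)

colnames(toy_renamed)
```

The dplyr:: prefix, as used in the answer above, makes sure dplyr's rename() is called even if another loaded package defines a function with the same name.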
Exercise 4
Create a column in bp_oe called term_percent to determine the percent of DE genes associated with the GO term relative to the total number of genes associated with the GO term (overlap.size / term.size).
# Create term_percent column based on other columns in dataset
bp_oe <- bp_oe %>%
  mutate(term_percent = overlap.size / term.size)
# Print object after creating the new column
bp_oe
# A tibble: 668 × 9
   GO_id     GO_term  p.value query.size term.size overlap.size genes gene_ratio
   <chr>     <chr>      <dbl>      <dbl>     <dbl>        <dbl> <chr>      <dbl>
 1 GO:00104… gene e… 6.71e-66       5850      5257         2142 gclc…      0.366
 2 GO:00903… nuclei… 1.18e-61       5850      5103         2073 gclc…      0.354
 3 GO:00061… nucleo… 2.49e-58       5850      5731         2271 dpm1…      0.388
 4 GO:00160… RNA me… 7.28e-57       5850      4597         1881 gclc…      0.322
 5 GO:00090… macrom… 3.12e-54       5850      5066         2030 dpm1…      0.347
 6 GO:00346… cellul… 5.6 e-54       5850      4907         1975 dpm1…      0.338
 7 GO:00442… cellul… 2.10e-47       5850      4882         1938 gclc…      0.331
 8 GO:00104… regula… 4.25e-46       5850      4297         1733 gclc…      0.296
 9 GO:20001… regula… 1.22e-40       5850      3960         1593 gclc…      0.272
10 GO:00105… regula… 2.22e-39       5850      4073         1626 gclc…      0.278
# ℹ 658 more rows
# ℹ 1 more variable: term_percent <dbl>
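The mutate() step can be checked by hand on a small made-up table. As written, overlap.size / term.size gives a proportion between 0 and 1; multiplying by 100 would express it as a percentage. The sketch below assumes dplyr and tibble are installed; toy_bp, its values, and the term_percent_x100 column are hypothetical and not part of the exercise.

```r
library(dplyr)

# Hypothetical toy data (values are invented)
toy_bp <- tibble::tibble(
  overlap.size = c(10, 50),
  term.size    = c(20, 200)
)

toy_bp <- toy_bp %>%
  mutate(
    # Same calculation as in the exercise: a proportion between 0 and 1
    term_percent = overlap.size / term.size,
    # Optional variant (an assumption, not in the exercise): scale to 0-100
    term_percent_x100 = 100 * overlap.size / term.size
  )

toy_bp
```

With these inputs the two proportions are 10/20 and 50/200, which makes the new column easy to verify by eye.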