Day 1 Homework Exercises
- The exercises below should be uploaded to the R_nanocourse_assignment#1 Dropbox folder by 5pm on Wednesday, April 29th.
- Add your solutions to the exercises in the downloaded .R file, then rename the saved file with your initials/name and upload it to Dropbox.
- Specific questions regarding the homework that you would like to have reviewed in class can be asked here.
- Full attendance and submission of all assignments are required for course completion.
R syntax and data structures
- Try changing the value of the variable x to 5. What happens to number?
- Now try changing the value of the variable y to 10. What do you need to do to update the variable number?
- Try to create a vector of numeric and character values by combining the two vectors that we just created (glengths and species). Assign this combined vector to a new variable called combined. Hint: you will need to use the combine function, c(), to do this. Print the combined vector in the console. What looks different compared to the original vectors?
- Let’s say that in our experimental analyses, we are working with three different sets of cells: normal cells, cells knocked out for geneA (a very exciting gene), and cells overexpressing geneA. We have three replicates for each cell type.
  - Create a vector named samplegroup with nine elements: 3 control (“CTL”) values, 3 knock-out (“KO”) values, and 3 over-expressing (“OE”) values.
  - Turn samplegroup into a factor data structure.
- Create a data frame called favorite_books with the following vectors as columns:
  titles <- c("Catch-22", "Pride and Prejudice", "Nineteen Eighty Four")
  pages <- c(453, 432, 328)
- Create a list called list2 containing species, glengths, and number.
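As a refresher on the syntax these exercises rely on, the sketch below uses made-up objects (a, b, total, lengths_kb, organisms, treatment, pets, and my_list are all hypothetical, not the course objects), so it illustrates the behavior without solving the exercises. It shows that a stored result is not recomputed when its inputs change, that c() coerces mixed types to a single type, and how factor(), data.frame(), and list() are called. Using rep() to build repeated labels is one possible approach, not the only one.

```r
# Hypothetical variables: a stored result is not recomputed when its inputs change
a <- 2
b <- 3
total <- a + b    # total is 5
a <- 10           # total is still 5 at this point
total <- a + b    # re-run the assignment to update it; total is now 13

# Hypothetical vectors of different types
lengths_kb <- c(1.2, 3.4, 5.6)
organisms <- c("yeast", "fly", "worm")

# c() coerces everything to one type; here the numbers become character strings
mixed <- c(lengths_kb, organisms)
print(mixed)
class(mixed)      # "character"

# rep() repeats labels; factor() stores them as categories with levels
treatment <- rep(c("A", "B"), each = 2)   # "A" "A" "B" "B"
treatment_f <- factor(treatment)
levels(treatment_f)                       # "A" "B"

# A data frame is built from equal-length vectors, one per column
pets <- data.frame(name = c("Rex", "Tom"), age = c(3, 5))
str(pets)

# A list can hold objects of different types and lengths
my_list <- list(organisms, lengths_kb, treatment_f)
length(my_list)   # 3
```

Comparing the printed output of mixed with that of lengths_kb is a quick way to see what coercion changes.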
Functions and arguments
- Let’s use a base R function to calculate the mean value of the glengths vector. You might need to search online to find which function can perform this task.
- Create a new vector test <- c(1, NA, 2, 3, NA, 4). Use the same base R function from exercise 1 (with the addition of the proper argument) to calculate the mean value of the test vector. The output should be 2.5.
  NOTE: In R, missing values are represented by the symbol NA (not available). This ensures that users know they have missing data and make a conscious decision about how to deal with it. There are ways to ignore NA during statistical calculations, or to remove NA from the vector. More information related to missing data can be found here.
- Another commonly used base function is sort(). Use this function to sort the glengths vector in descending order.
- Write a function called multiply_it, which takes two inputs: a numeric value x and a numeric value y. The function will return the product of these two numeric values, x * y. For example, multiply_it(x=4, y=6) will return the output 24.
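The sketch below illustrates the argument patterns used in this section on a hypothetical vector (scores) and a hypothetical function (add_it), rather than on glengths, test, or multiply_it. It shows how the na.rm argument of mean() changes the handling of NA, how the decreasing argument of sort() reverses the order, and the general shape of a user-defined function.

```r
# Hypothetical measurements containing a missing value
scores <- c(10, NA, 20, 30)

mean(scores)                  # NA: a missing value makes the result missing
mean(scores, na.rm = TRUE)    # 20: NA is dropped before averaging

# sort() orders values ascending by default; decreasing = TRUE reverses this
sort(c(3, 1, 2))                     # 1 2 3
sort(c(3, 1, 2), decreasing = TRUE)  # 3 2 1

# A hypothetical user-defined function with two named arguments
add_it <- function(x, y) {
  result <- x + y
  return(result)
}
add_it(x = 4, y = 6)          # 10
```

Arguments can be passed by name, as above, or by position (add_it(4, 6)); naming them makes the call easier to read and less sensitive to argument order.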
Reading in and inspecting data
- Download this tab-delimited .txt file and save it in your project’s “data” folder.
  - Read it into R using read.table() and store it as the variable proj_summary. As you use read.table(), keep in mind that:
    - all the columns in the input text file have column names
    - you want the first column of the text file to be used as row names (hint: look up the row.names = argument)
  - Display the contents of proj_summary in your console.
- Use the class() function on the glengths and metadata objects. How does the output differ between the two?
- Use the summary() function on the proj_summary data frame.
  - What is the median “rRNA_rate”?
  - How many samples got the “low” level of treatment?
- How long is the samplegroup factor?
- What are the dimensions of the proj_summary data frame?
- When you use the rownames() function on metadata, what is the data structure of the output?
- [Optional] How many elements are in (i.e., how long is) the output of colnames(proj_summary)? Don’t count them by hand; use another function to determine this.
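Because the course data file is not reproduced here, the sketch below writes a small made-up tab-delimited file to a temporary path (the column names sample, group, and score, and the object name toy, are hypothetical) and then applies the same inspection functions the exercises ask about. header = TRUE reads the first line as column names, and row.names = 1 uses the first column as row names. Since R 4.0.0, read.table() no longer converts text columns to factors by default, so stringsAsFactors = TRUE is set here to make summary() report counts per level for the group column.

```r
# Write a small hypothetical tab-delimited file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "sample\tgroup\tscore",
  "s1\tlow\t0.5",
  "s2\thigh\t0.8",
  "s3\tlow\t0.2"
), tmp)

# First line = column names; first column = row names; text columns = factors
toy <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1,
                  stringsAsFactors = TRUE)
toy

class(toy)              # "data.frame"
class(toy$score)        # "numeric"
summary(toy)            # level counts for group; min, median, mean, etc. for score
dim(toy)                # 3 2 (rows, then columns)
rownames(toy)           # a character vector: "s1" "s2" "s3"
length(colnames(toy))   # 2, counted without listing the names by hand
```

Note that the first column (sample) becomes the row names rather than a data column, which is why dim() reports two columns, not three.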