Day 1 Homework Exercises
- The exercises below should be uploaded to the R_nanocourse_assignment#1 Dropbox folder by 5pm on Wednesday, April 29th.
- Add your solutions to the exercises in the downloaded .R file, save it, rename it with your initials/name, and upload it to Dropbox.
- Specific questions regarding the homework that you would like to have reviewed in class can be asked here.
- Full attendance and submission of all assignments are required for course completion.
R syntax and data structures
- Try changing the value of the variable `x` to 5. What happens to `number`?
- Now try changing the value of the variable `y` to contain the value 10. What do you need to do to update the variable `number`?
- Try to create a vector of numeric and character values by combining the two vectors that we just created (`glengths` and `species`). Assign this combined vector to a new variable called `combined`. Hint: you will need to use the combine `c()` function to do this. Print the `combined` vector in the console. What looks different compared to the original vectors?
- Let’s say that in our experimental analyses, we are working with three different sets of cells: normal, cells knocked out for geneA (a very exciting gene), and cells overexpressing geneA. We have three replicates for each cell type.
  - Create a vector named `samplegroup` with nine elements: 3 control (“CTL”) values, 3 knock-out (“KO”) values, and 3 over-expressing (“OE”) values.
  - Turn `samplegroup` into a factor data structure.
- Create a data frame called `favorite_books` with the following vectors as columns:

      titles <- c("Catch-22", "Pride and Prejudice", "Nineteen Eighty Four")
      pages <- c(453, 432, 328)

- Create a list called `list2` containing `species`, `glengths`, and `number`.
Functions and arguments
- Let’s use a base R function to calculate the mean value of the `glengths` vector. You might need to search online to find which function can perform this task.
- Create a new vector `test <- c(1, NA, 2, 3, NA, 4)`. Use the same base R function from the previous exercise (with the addition of the proper argument) to calculate the mean value of the `test` vector. The output should be `2.5`.

  NOTE: In R, missing values are represented by the symbol `NA` (not available). It’s a way to make sure that users know they have missing data and make a conscious decision about how to deal with it. There are ways to ignore `NA` during statistical calculations, or to remove `NA` from the vector. More information related to missing data can be found here.
- Another commonly used base function is `sort()`. Use this function to sort the `glengths` vector in descending order.
- Write a function called `multiply_it`, which takes two inputs: a numeric value `x` and a numeric value `y`. The function will return the product of these two numeric values, which is `x * y`. For example, `multiply_it(x=4, y=6)` will return output `24`.
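The note on missing values above can be made concrete with a small sketch. It uses a made-up vector, `readings`, which is not part of the exercise data, and deliberately a different summary function from the one the exercise asks for. It only shows how `NA` is detected, how it propagates by default, and one way to remove it by subsetting.

```r
# Hypothetical example vector (an assumption for illustration only)
readings <- c(10, NA, 30, NA, 50)

# is.na() returns TRUE for each missing element
missing_flags <- is.na(readings)

# Summing the logical flags counts the missing values
n_missing <- sum(missing_flags)

# By default, a single NA makes the whole summary NA
total_with_na <- sum(readings)

# One way to remove NA: keep only the non-missing elements
complete_readings <- readings[!missing_flags]
total_complete <- sum(complete_readings)

print(missing_flags)
print(n_missing)
print(total_with_na)
print(total_complete)
```

Subsetting is only one option. Many base R summary functions also accept an argument that skips missing values, which is what the exercise asks you to look up.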
Reading in and inspecting data
- Download this tab-delimited .txt file and save it in your project’s “data” folder.
  - Read it in to R using `read.table()` and store it as the variable `proj_summary`. As you use `read.table()`, keep in mind that:
    - all the columns in the input text file have column names
    - you want the first column of the text file to be used as row names (hint: look up the `row.names =` argument)
  - Display the contents of `proj_summary` in your console
- Use the `class()` function on the `glengths` and `metadata` objects. How does the output differ between the two?
- Use the `summary()` function on the `proj_summary` dataframe
  - What is the median “rRNA_rate”?
  - How many samples got the “low” level of treatment?
- How long is the `samplegroup` factor?
- What are the dimensions of the `proj_summary` dataframe?
- When you use the `rownames()` function on `metadata`, what is the data structure of the output?
- [Optional] How many elements are in (how long is) the output of `colnames(proj_summary)`? Don’t count, but use another function to determine this.