Reading in and inspecting data Answer Key
Exercise 1
- Inside your project’s data folder you should see a file called project-summary.txt. Read it in to R using read.table() with the appropriate arguments and store it as the variable proj_summary. To figure out the appropriate arguments to use with read.table(), keep the following in mind:
  - all the columns in the input text file have column names/headers
  - you want the first column of the text file to be used as row names (hint: look up the input for the row.names = argument in read.table())
# Read in project-summary.txt from the data directory, telling R that the file has a header and that the first column should be used as the row names, then assign the result to an object called proj_summary
proj_summary <- read.table("data/project-summary.txt", header = TRUE, row.names = 1)
- Display the contents of proj_summary in your console
# Display the contents of proj_summary
proj_summary
        percent_GC Exonic_Rate Intronic_Rate Intergenic_Rate Mapping_Rate
sample1         49      0.8913        0.0709          0.0378    0.9787998
sample2         49      0.9055        0.0625          0.0321    0.9825069
sample3         50      0.8834        0.0663          0.0503    0.9877286
sample4         50      0.9027        0.0649          0.0325    0.9870764
sample5         49      0.8923        0.0714          0.0362    0.9781835
sample6         49      0.8999        0.0667          0.0334    0.9772096
sample7         49      0.8983        0.0665          0.0352    0.9757997
sample8         49      0.9022        0.0656          0.0322    0.9877458
sample9         49      0.9111        0.0566          0.0323    0.9814494
        Quality_format   rRNA_rate treatment
sample1       standard 0.007264734      high
sample2       standard 0.005518317       low
sample3       standard 0.026944958   control
sample4       standard 0.005081974   control
sample5       standard 0.005023175      high
sample6       standard 0.005345113       low
sample7       standard 0.005240401      high
sample8       standard 0.004549047   control
sample9       standard 0.005817519       low
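To see what header = TRUE and row.names = 1 each contribute, the following minimal sketch writes a small stand-in file and reads it back. It assumes a tab-delimited layout and uses a hypothetical first-column header, sample_id; the real project-summary.txt may be laid out differently. The three rows are copied from the output above.

```r
# Hypothetical stand-in for data/project-summary.txt (three rows and three
# columns copied from the output above). The "sample_id" header and the
# tab delimiter are assumptions made for illustration only.
demo_lines <- c(
  "sample_id\tpercent_GC\tExonic_Rate\ttreatment",
  "sample1\t49\t0.8913\thigh",
  "sample2\t49\t0.9055\tlow",
  "sample3\t50\t0.8834\tcontrol"
)
demo_file <- tempfile(fileext = ".txt")
writeLines(demo_lines, demo_file)

# header = TRUE: the first line of the file supplies the column names
# row.names = 1: the first column supplies the row names and is not kept as data
demo_summary <- read.table(demo_file, header = TRUE, row.names = 1)

rownames(demo_summary)  # "sample1" "sample2" "sample3"
colnames(demo_summary)  # "percent_GC" "Exonic_Rate" "treatment"
dim(demo_summary)       # 3 3
```

Because the sample names become row names, they are no longer counted as a column, which is why the stand-in has three columns rather than four.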
Exercise 2
- Use the class() function on glengths and metadata. How does the output differ between the two?
# Return the class of glengths
class(glengths)
[1] "numeric"
# Return the class of metadata
class(metadata)
[1] "data.frame"
glengths is a numeric vector and metadata is a data frame.
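glengths and metadata are created earlier in the lesson and are not defined in this answer key, so the sketch below uses hypothetical stand-ins (demo_lengths and demo_meta, with made-up values) to reproduce the same contrast.

```r
# Hypothetical stand-ins; the values are illustrative, not the lesson's data
demo_lengths <- c(4.6, 3000, 50000)
demo_meta <- data.frame(
  treatment = c("high", "low", "control"),
  row.names = c("sample1", "sample2", "sample3")
)

class(demo_lengths)  # "numeric"
class(demo_meta)     # "data.frame"
```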
- Use the summary() function on the proj_summary dataframe. What is the median “rRNA_rate”?
# Provide a summary of the proj_summary object
summary(proj_summary)
   percent_GC     Exonic_Rate     Intronic_Rate     Intergenic_Rate
 Min.   :49.00   Min.   :0.8834   Min.   :0.05660   Min.   :0.03210
 1st Qu.:49.00   1st Qu.:0.8923   1st Qu.:0.06490   1st Qu.:0.03230
 Median :49.00   Median :0.8999   Median :0.06630   Median :0.03340
 Mean   :49.22   Mean   :0.8985   Mean   :0.06571   Mean   :0.03578
 3rd Qu.:49.00   3rd Qu.:0.9027   3rd Qu.:0.06670   3rd Qu.:0.03620
 Max.   :50.00   Max.   :0.9111   Max.   :0.07140   Max.   :0.05030
  Mapping_Rate    Quality_format       rRNA_rate         treatment
 Min.   :0.9758   Length:9           Min.   :0.004549   Length:9
 1st Qu.:0.9782   Class :character   1st Qu.:0.005082   Class :character
 Median :0.9814   Mode  :character   Median :0.005345   Mode  :character
 Mean   :0.9818                      Mean   :0.007865
 3rd Qu.:0.9871                      3rd Qu.:0.005818
 Max.   :0.9877                      Max.   :0.026945
The median “rRNA_rate” is 0.005345.
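As a cross-check, the median can also be computed directly from the column rather than read off the summary table. The sketch below retypes the nine rRNA_rate values shown in the Exercise 1 output into a standalone vector; with proj_summary loaded, median(proj_summary$rRNA_rate) would be the equivalent call.

```r
# rRNA_rate values copied from the proj_summary output in Exercise 1
rRNA_rate <- c(0.007264734, 0.005518317, 0.026944958,
               0.005081974, 0.005023175, 0.005345113,
               0.005240401, 0.004549047, 0.005817519)

# With nine values, the median is the fifth value after sorting
sort(rRNA_rate)[5]  # 0.005345113
median(rRNA_rate)   # 0.005345113
```

summary() displays this value rounded to 0.005345.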
- How long is the samplegroup factor?
# Return the length of the samplegroup factor vector
length(samplegroup)
[1] 9
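Note that the length of a factor is its number of elements, not its number of categories. samplegroup is not defined in this answer key, so the sketch below builds a hypothetical nine-element stand-in (demo_group) from the treatment values shown in Exercise 1; the real samplegroup may hold different labels.

```r
# Hypothetical stand-in for samplegroup, built from the treatment column above
demo_group <- factor(c("high", "low", "control",
                       "control", "high", "low",
                       "high", "control", "low"))

length(demo_group)   # 9: one element per sample
nlevels(demo_group)  # 3: number of distinct categories
levels(demo_group)   # "control" "high" "low" (sorted alphabetically by default)
```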
- What are the dimensions of the proj_summary dataframe?
# Return the dimensions of the proj_summary dataframe
dim(proj_summary)
[1] 9 8
- When you use the rownames() function on metadata, what is the data structure of the output?
# Return the data structure for the output of rownames() on the metadata data frame object
str(rownames(metadata))
 chr [1:12] "sample1" "sample2" "sample3" "sample4" "sample5" "sample6" ...
It is a character vector.
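The same conclusion can be reached with class() and is.vector() instead of str(). This sketch uses a small hypothetical data frame (demo_meta) in place of metadata, which is not defined in this answer key.

```r
# Hypothetical stand-in for metadata; the values are illustrative only
demo_meta <- data.frame(
  treatment = c("high", "low", "control"),
  row.names = c("sample1", "sample2", "sample3")
)

demo_names <- rownames(demo_meta)
class(demo_names)      # "character"
is.vector(demo_names)  # TRUE
length(demo_names)     # 3
```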
[Optional] How many elements in (how long is) the output of colnames(proj_summary)? Don’t count, but use another function to determine this.
# Return the number of elements in colnames(proj_summary)
length(colnames(proj_summary))
[1] 8
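Since colnames() returns one name per column, its length should agree with the column count reported by ncol() and by the second element of dim(). A minimal sketch on a hypothetical two-column data frame (demo_df, with values copied from the first three samples above):

```r
# Hypothetical two-column subset of the data shown in Exercise 1
demo_df <- data.frame(
  percent_GC  = c(49, 49, 50),
  Exonic_Rate = c(0.8913, 0.9055, 0.8834),
  row.names   = c("sample1", "sample2", "sample3")
)

length(colnames(demo_df))  # 2
ncol(demo_df)              # 2
dim(demo_df)[2]            # 2
```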