Reading in and inspecting data Answer Key

Author

Will Gammerdinger

Published

July 1, 2025

Exercise 1

Inside your project’s data folder you should see a file called project-summary.txt. Read it into R using read.table() with the appropriate arguments and store it as the variable proj_summary. To figure out the appropriate arguments to use with read.table(), keep the following in mind:
- all the columns in the input text file have column names (headers)
- you want the first column of the text file to be used as row names (hint: look up the input for the row.names = argument in read.table())

# Read project-summary.txt from the data directory, telling R that the file has a header and that the first column holds the row names, then assign the result to an object called proj_summary
proj_summary <- read.table("data/project-summary.txt", header = TRUE, row.names = 1)

Display the contents of proj_summary in your console.

# Display the contents of proj_summary
proj_summary

        percent_GC Exonic_Rate Intronic_Rate Intergenic_Rate Mapping_Rate
sample1         49      0.8913        0.0709          0.0378    0.9787998
sample2         49      0.9055        0.0625          0.0321    0.9825069
sample3         50      0.8834        0.0663          0.0503    0.9877286
sample4         50      0.9027        0.0649          0.0325    0.9870764
sample5         49      0.8923        0.0714          0.0362    0.9781835
sample6         49      0.8999        0.0667          0.0334    0.9772096
sample7         49      0.8983        0.0665          0.0352    0.9757997
sample8         49      0.9022        0.0656          0.0322    0.9877458
sample9         49      0.9111        0.0566          0.0323    0.9814494
        Quality_format   rRNA_rate treatment
sample1       standard 0.007264734      high
sample2       standard 0.005518317       low
sample3       standard 0.026944958   control
sample4       standard 0.005081974   control
sample5       standard 0.005023175      high
sample6       standard 0.005345113       low
sample7       standard 0.005240401      high
sample8       standard 0.004549047   control
sample9       standard 0.005817519       low
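
To see the effect of the two arguments in isolation, here is a minimal sketch that writes a small, hypothetical three-sample file to a temporary location and reads it back. The file contents and the object name `demo` are assumptions made for illustration; they are not part of the workshop data.

```r
# Hypothetical three-row file written to a temporary location (illustration only)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "sample percent_GC treatment",
  "sample1 49 high",
  "sample2 49 low",
  "sample3 50 control"
), tmp)

# header = TRUE treats the first line as column names;
# row.names = 1 moves the first column into the row names
demo <- read.table(tmp, header = TRUE, row.names = 1)

rownames(demo)  # "sample1" "sample2" "sample3"
colnames(demo)  # "percent_GC" "treatment"
```

Because the first column becomes the row names, it no longer counts as a data column, which is why `demo` ends up with two columns rather than three.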

Exercise 2

Use the class() function on glengths and metadata. How does the output differ between the two?

# Return the class of glengths
class(glengths)

[1] "numeric"

# Return the class of metadata
class(metadata)

[1] "data.frame"

glengths is a numeric vector and metadata is a data frame.
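
The same contrast can be reproduced without the workshop objects. The sketch below uses hypothetical stand-ins (`lengths_demo` and `meta_demo`, with made-up values) to show that class() reports the data type for a vector but the structure for a data frame, and that a single column pulled out of a data frame is itself a vector.

```r
# Hypothetical stand-ins; the workshop's glengths and metadata hold different values
lengths_demo <- c(4.6, 3000, 50000)
meta_demo <- data.frame(
  genotype  = c("Wt", "KO"),
  replicate = c(1, 1),
  row.names = c("sample1", "sample2")
)

class(lengths_demo)         # "numeric"
class(meta_demo)            # "data.frame"
class(meta_demo$replicate)  # "numeric": one column is a plain vector
```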

Use the summary() function on the proj_summary data frame. What is the median “rRNA_rate”?

# Provide a summary of the proj_summary object
summary(proj_summary)

   percent_GC     Exonic_Rate     Intronic_Rate     Intergenic_Rate  
 Min.   :49.00   Min.   :0.8834   Min.   :0.05660   Min.   :0.03210  
 1st Qu.:49.00   1st Qu.:0.8923   1st Qu.:0.06490   1st Qu.:0.03230  
 Median :49.00   Median :0.8999   Median :0.06630   Median :0.03340  
 Mean   :49.22   Mean   :0.8985   Mean   :0.06571   Mean   :0.03578  
 3rd Qu.:49.00   3rd Qu.:0.9027   3rd Qu.:0.06670   3rd Qu.:0.03620  
 Max.   :50.00   Max.   :0.9111   Max.   :0.07140   Max.   :0.05030  
  Mapping_Rate    Quality_format       rRNA_rate         treatment        
 Min.   :0.9758   Length:9           Min.   :0.004549   Length:9          
 1st Qu.:0.9782   Class :character   1st Qu.:0.005082   Class :character  
 Median :0.9814   Mode  :character   Median :0.005345   Mode  :character  
 Mean   :0.9818                      Mean   :0.007865                     
 3rd Qu.:0.9871                      3rd Qu.:0.005818                     
 Max.   :0.9877                      Max.   :0.026945

The median “rRNA_rate” is 0.005345.
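
As a cross-check, the median can also be computed directly with median() rather than read off the summary() table. The sketch below retypes the nine rRNA_rate values from the proj_summary printout into a standalone vector so that it runs on its own; the vector name `rRNA_rate` is just a convenience for this example.

```r
# rRNA_rate values copied from the proj_summary printout
rRNA_rate <- c(0.007264734, 0.005518317, 0.026944958, 0.005081974, 0.005023175,
               0.005345113, 0.005240401, 0.004549047, 0.005817519)

# The middle (5th of 9) value once the vector is sorted
median(rRNA_rate)  # 0.005345113
```

With the data frame loaded, the equivalent call would be median(proj_summary$rRNA_rate). The mean (0.007865) sits above the median because sample3 has a much higher rRNA rate than the other samples.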

How long is the samplegroup factor?

# Return the length of the samplegroup factor vector
length(samplegroup)

[1] 9
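
Note that length() counts the elements of a factor, not its categories. A minimal sketch with a hypothetical factor (`group_demo`, not the workshop's samplegroup) shows the difference between the two counts:

```r
# Hypothetical factor with six elements spread over three categories
group_demo <- factor(c("CTL", "CTL", "KO", "KO", "OE", "OE"))

length(group_demo)   # 6: one entry per sample
nlevels(group_demo)  # 3: number of distinct categories
levels(group_demo)   # "CTL" "KO" "OE"
```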

What are the dimensions of the proj_summary data frame?

# Return the dimensions of the proj_summary data frame
dim(proj_summary)

[1] 9 8
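
dim() reports the number of rows first and the number of columns second, so proj_summary has 9 rows and 8 columns. The sketch below uses a small hypothetical data frame (`dim_demo`) to show how dim() relates to nrow() and ncol():

```r
# Hypothetical 3 x 2 data frame for illustration
dim_demo <- data.frame(a = 1:3, b = c("x", "y", "z"))

dim(dim_demo)     # 3 2 (rows, then columns)
nrow(dim_demo)    # 3
ncol(dim_demo)    # 2
length(dim_demo)  # 2: length() of a data frame counts its columns
```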

When you use the rownames() function on metadata, what is the data structure of the output?

# Return the data structure for the output of rownames() on the metadata data frame object
str(rownames(metadata))

 chr [1:12] "sample1" "sample2" "sample3" "sample4" "sample5" "sample6" ...

It is a character vector.
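
str() is one way to find this out; class() and the is.*() predicates give the same answer. A minimal sketch, assuming a hypothetical two-row data frame (`rn_demo`) in place of metadata:

```r
# Hypothetical data frame with explicit row names
rn_demo <- data.frame(value = c(10, 20), row.names = c("sample1", "sample2"))
rn <- rownames(rn_demo)

class(rn)         # "character"
is.character(rn)  # TRUE
is.vector(rn)     # TRUE
length(rn)        # 2: one name per row
```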

[Optional] How many elements are in (how long is) the output of colnames(proj_summary)? Don’t count by hand; use another function to determine this.

# Return the number of elements in colnames(proj_summary)
length(colnames(proj_summary))

[1] 8