Learning Objectives

Reproducible reports in R

So far, any code that we have written in R has been in the form of an R script (.R). Any figures that we generated were either plotted to the RStudio device or exported to file. What if we wanted to communicate this with our collaborators? Wouldn’t it be nice to be able to share the code along with tables, figures, and text describing the interpretation? Thankfully, in RStudio there is a way to compile all of that information into a report by using the knitr package and a simple text-markup language called RMarkdown. The combination of these two things allows users to combine code and stylized text to output information in various formats including HTML, PDF, MS Word, ODT, RTF, Markdown, and GitHub-flavored Markdown documents.

Markdown

Before we get started with RMarkdown, we have a short digression on Markdown, a text-to-HTML conversion tool for web writers. Simply put, Markdown is a way to style text on the web. It is mostly just regular text with a few non-alphabetic characters thrown in, like # or * to help with stylistic details.

NOTE: You can use Markdown most places around GitHub. This lesson is all written in Markdown!

Some commonly used formatting options are listed below:

# This is a Heading1 tag
## This is a Heading2 tag

Rendered:

This is a Heading1 tag

This is a Heading2 tag

*This text will be italicized.*
_This will also be italicized._

**This text will be bold.**
__This will also be bold.__

_You **can** combine them!_

Rendered:

This text will be italicized. This will also be italicized.

This text will be bold. This will also be bold.

You can combine them!

* Item 1
* Item 2
  * Item 2a
  * Item 2b

1. Item 1
1. Item 2
   1. Item 2a
   1. Item 2b

Rendered:

  1. Item 1
  2. Item 2
    1. Item 2a
    2. Item 2b

This is really just scratching the surface of what you can do in Markdown. There are also ways in which you can include images, links, block quotes and code (inline and code chunks). In the interest of time, we won’t go into detail here but we will point you to some very useful resources.

Resources for Markdown

NOTE: If you are working with Markdown, most text editors will automatically syntax highlight, but there are also various Markdown-specific editors which allow you to see the rendered version of your text as you type (e.g. MacDown for Macs, MarkdownPad for Windows).

RMarkdown

Markdown has proved so useful that many different coding groups have adopted it, often adding their own ‘flavours’. RStudio implements something called “R-flavoured markdown” (or RMarkdown), which has various features that we describe below. Rather than creating a .R script, you create an .Rmd file, which contains code and stylized plain text using some of the options described in the Markdown section.

Introduction to knitr

knitr, developed by Yihui Xie, is an R package designed for report generation within RStudio. It takes an RMarkdown file (.Rmd) and enables dynamic generation of multiple file formats, including HTML and PDF documents. As RMarkdown gains acceptance as a reproducible manuscript format, using knitr to generate a report summary is becoming common practice. Knit report generation is now integrated into RStudio, and can be accessed using the GUI or console.

Code chunks

The basic idea of knitr (along with RMarkdown) is that you can write your analysis workflow in plain text and intersperse chunks of code delimited with a special marker (```). Backticks (`) commonly indicate code and are also used on GitHub. Each chunk should be given a unique name. knitr isn’t very picky about how you name the code chunks, but we recommend using snake_case for the names whenever possible.
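As a minimal sketch of what a named chunk looks like in practice, the R code below writes a tiny .Rmd file to a temporary directory and knits it to Markdown with knitr::knit(). The file names, the chunk name summarize_cars, and the use of the built-in cars dataset are illustrative assumptions, not part of the lesson materials.

```r
library(knitr)

# Three backticks open and close a chunk; the marker is built with strrep()
# so that it can be pasted into the lines of the demo document.
fence <- strrep("`", 3)

rmd_lines <- c(
    "# A tiny report",
    "",
    "Plain text goes between the chunks.",
    "",
    paste0(fence, "{r summarize_cars}"),  # chunk header: engine, then chunk name
    "summary(cars$speed)",
    fence
)

# Work in a temporary directory so nothing is written into your project.
demo_dir <- tempfile("chunk_demo_")
dir.create(demo_dir)
rmd_file <- file.path(demo_dir, "chunk_demo.Rmd")
md_file <- file.path(demo_dir, "chunk_demo.md")
writeLines(rmd_lines, rmd_file)

# knit() runs the chunk and writes the code plus its output to a Markdown file.
knit(rmd_file, output = md_file, quiet = TRUE)
knitted <- readLines(md_file)
cat(knitted, sep = "\n")
```

The printed Markdown should contain both the chunk's code and the summary it produced, which is the raw material that later gets converted to HTML or PDF.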

Additionally, you can write inline R code enclosed by single backticks (`), with a lowercase r directly after the opening backtick (much like the r that opens a ``` code chunk). This lets you insert the value of a variable or expression outside of code chunks, and is extremely useful for making report text more dynamic. For example, you can print the current date inline with this syntax: ` r Sys.Date() ` (written here with extra spaces; leave them out in a real document).
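To see inline code being evaluated, here is a small sketch that knits two sentences containing inline R. The sentences, file names, and the choice of nrow(mtcars) are assumptions made for illustration only.

```r
library(knitr)

# Each inline expression sits between single backticks and starts with "r".
rmd_lines <- c(
    "The mtcars dataset has `r nrow(mtcars)` rows.",
    "",
    "This report was generated on `r Sys.Date()`."
)

demo_dir <- tempfile("inline_demo_")
dir.create(demo_dir)
rmd_file <- file.path(demo_dir, "inline_demo.Rmd")
md_file <- file.path(demo_dir, "inline_demo.md")
writeLines(rmd_lines, rmd_file)

# After knitting, the expressions are replaced by their values in the text.
knit(rmd_file, output = md_file, quiet = TRUE)
knitted <- readLines(md_file)
cat(knitted, sep = "\n")
```

Because the values are computed at knit time, the text stays in sync with the data: if the dataset changes, the sentence changes with it.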

Per chunk options

knitr provides a lot of customization options for code chunks, which are written in the form of tag=value.

There is a comprehensive list of all the options available, but when starting out it can be overwhelming. Here, we provide a short list of some options commonly used in chunks:

There are also a few options commonly used for plots to easily resize images:
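The sketch below shows how tag=value options change what a chunk contributes to the report. It uses two standard knitr options, echo and eval; the chunk names and chunk contents are made up for this example. Plot options such as fig.width and fig.height are written in the chunk header in exactly the same way.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

rmd_lines <- c(
    # echo=FALSE: run the code and show its output, but hide the code itself
    paste0(fence, "{r row_count, echo=FALSE}"),
    "nrow(mtcars)",
    fence,
    "",
    # eval=FALSE: show the code, but do not run it
    paste0(fence, "{r not_run, eval=FALSE}"),
    "stop(\"this chunk is displayed but never evaluated\")",
    fence
)

demo_dir <- tempfile("options_demo_")
dir.create(demo_dir)
rmd_file <- file.path(demo_dir, "options_demo.Rmd")
md_file <- file.path(demo_dir, "options_demo.md")
writeLines(rmd_lines, rmd_file)

knit(rmd_file, output = md_file, quiet = TRUE)
knitted <- readLines(md_file)
cat(knitted, sep = "\n")
```

In the knitted output the first chunk should appear only as a result, and the second only as code, since its stop() call is never executed.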

Global options

knitr allows for global options to be set on all chunks in an RMarkdown file. These are options that should be placed inside your setup chunk at the top of your RMarkdown document.

knitr::opts_chunk$set(
    autodep = TRUE,
    cache = TRUE,
    cache.lazy = TRUE,
    dev = c("png", "pdf", "svg"),
    error = TRUE,
    fig.height = 6,
    fig.retina = 2,
    fig.width = 6,
    highlight = TRUE,
    message = FALSE,
    prompt = TRUE,
    tidy = TRUE,
    warning = FALSE)
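If you want to check what a global setting currently is, knitr lets you read options back with opts_chunk$get(). The following is a small sketch, run in a plain R session rather than inside a document; the particular option values are arbitrary examples.

```r
library(knitr)

# Take a snapshot of the current defaults so they can be put back afterwards.
old_opts <- opts_chunk$get()

# Change a few defaults, as a setup chunk would.
opts_chunk$set(message = FALSE, fig.width = 6, fig.height = 6)

# Chunks knitted from now on start from these values unless their own
# header overrides them.
new_width <- opts_chunk$get("fig.width")
new_message <- opts_chunk$get("message")
print(new_width)
print(new_message)

# Put the previous defaults back.
opts_chunk$restore(old_opts)
```

Options written in an individual chunk header take precedence over these global values for that chunk only.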

The setup chunk

The setup chunk is a special knitr chunk that should be placed at the start of the document. We recommend storing all the user-defined parameters that are required for successful knitting in the setup chunk. You can also place here all the library() calls the script needs, along with any load() calls for external files.

{r setup, include=FALSE}

#=================
# Load packages (load all the packages here at the beginning)
#=================
library(xtable) ## for making awesome tables
library(ggplot2) ## for plotting

# Set some basic options. You usually do not want your code, messages, 
# warnings etc to show in your actual manuscript however for the first
# run or two these will be set on.
  
knitr::opts_chunk$set(
    warning = TRUE,
    message = TRUE,
    echo = TRUE,
    cache = FALSE,
    tidy = FALSE, ## remove the auto-formatting
    error = TRUE)

NOTE: An additional cool trick is that you can save opts_chunk$set settings in ~/.Rprofile and these knitr options will apply to all of your RMarkdown documents.

Figures

A neat feature of knitr is how much simpler it makes generating figures. You can simply return a plot in a chunk, and knitr will automatically write the files to disk, in an organized subfolder. By specifying options in the setup chunk, you can have R automatically save your plots in multiple file formats at once, including PNG, PDF, and SVG. A single chunk can support multiple plots, and they will be arranged in squares below the chunk in RStudio.
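As a rough illustration of the automatic figure handling, the sketch below knits a one-chunk document that asks for two graphics devices and then lists the image files knitr wrote. The chunk name, figure size, and file names are assumptions for the example, and it presumes your R installation can open both the png and pdf devices.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

rmd_lines <- c(
    # dev takes a vector: every plot in the chunk is saved once per device
    paste0(fence, "{r speed_vs_dist, dev=c('png', 'pdf'), fig.width=5, fig.height=4}"),
    "plot(cars$speed, cars$dist)",
    fence
)

# Knit inside a temporary directory so the figure folder lands there.
demo_dir <- tempfile("figure_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)
writeLines(rmd_lines, "figure_demo.Rmd")
knit("figure_demo.Rmd", quiet = TRUE)
setwd(old_wd)

# Collect the image files that knitr created for the chunk.
figure_files <- list.files(demo_dir, pattern = "\\.(png|pdf)$", recursive = TRUE)
print(figure_files)
```

The files are named after the chunk, which is one practical reason to give chunks short, descriptive names.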

Tables

knitr includes a simple but powerful function for generating stylish tables in a knit report named kable(). Here’s an example using R’s built-in mtcars dataset:

help("kable", "knitr")
library(knitr)     # provides kable()
library(magrittr)  # provides the %>% pipe
mtcars %>%
    head() %>%
    kable()
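|                  |  mpg| cyl| disp|  hp| drat|    wt|  qsec| vs| am| gear| carb|
|:-----------------|----:|---:|----:|---:|----:|-----:|-----:|--:|--:|----:|----:|
|Mazda RX4         | 21.0|   6|  160| 110| 3.90| 2.620| 16.46|  0|  1|    4|    4|
|Mazda RX4 Wag     | 21.0|   6|  160| 110| 3.90| 2.875| 17.02|  0|  1|    4|    4|
|Datsun 710        | 22.8|   4|  108|  93| 3.85| 2.320| 18.61|  1|  1|    4|    1|
|Hornet 4 Drive    | 21.4|   6|  258| 110| 3.08| 3.215| 19.44|  1|  0|    3|    1|
|Hornet Sportabout | 18.7|   8|  360| 175| 3.15| 3.440| 17.02|  0|  0|    3|    2|
|Valiant           | 18.1|   6|  225| 105| 2.76| 3.460| 20.22|  1|  0|    3|    1|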
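kable() also accepts arguments that control how the table is presented. Here is a brief sketch using two of them, digits and caption; the column selection and the caption text are arbitrary choices for illustration.

```r
library(knitr)

# Keep three columns and three rows so the table stays small.
small_cars <- head(mtcars[, c("mpg", "cyl", "wt")], 3)

car_table <- kable(
    small_cars,
    digits = 1,                              # round numeric columns
    caption = "First three cars in mtcars"   # title shown with the table
)
print(car_table)
```

Called this way without the pipe, kable() needs only knitr itself, which can be convenient in a setup where magrittr is not loaded.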

Generating the report

help("knit", "knitr")

Once we’ve finished creating an RMarkdown file containing code chunks, we finally need to knit the report. Knitting a document in RStudio generates an HTML report by default. If you would prefer a different document format, this can be specified in the YAML header with the output: parameter:
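For orientation, a YAML header is the block between two --- lines at the very top of the .Rmd file. The sketch below writes a document with a hypothetical header that requests PDF output and reads the header back with rmarkdown::yaml_front_matter(); the title and the chosen format are assumptions for the example.

```r
library(rmarkdown)

rmd_lines <- c(
    "---",                      # the YAML header starts here
    "title: \"Demo report\"",
    "output: pdf_document",     # requested output format
    "---",                      # and ends here
    "",
    "Report text and code chunks follow the header."
)
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Parse the header without rendering the document.
header <- yaml_front_matter(rmd_file)
print(header$output)
```

Reading the header this way does not build the report, so it works even on a machine without LaTeX installed.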

RStudio now supports a number of formats, each with their own customization options. Consult their website for more details.

render() (advanced)

help("render", "rmarkdown")

The knit() command works great if you only need to generate a single document format. RMarkdown also supports a more advanced function named rmarkdown::render(), which allows for output of multiple document formats. To accomplish this, we recommend saving a special file named _output.yaml in your project root.

rmarkdown::html_document:
    code_folding: hide
    df_print: kable
    highlight: pygments
    number_sections: false
    toc: true
rmarkdown::pdf_document:
    number_sections: false
    toc: true
    toc_depth: 1
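A call along the following lines can then build every configured format in one go. This is only a sketch under a few assumptions: it declares its two formats (HTML and Markdown) in the document's own YAML header rather than in _output.yaml, it avoids PDF so that no LaTeX installation is needed, and it skips rendering entirely when pandoc cannot be found.

```r
library(rmarkdown)

rmd_lines <- c(
    "---",
    "title: \"Render demo\"",
    "output:",
    "  html_document: default",
    "  md_document: default",
    "---",
    "",
    "The mtcars dataset has `r nrow(mtcars)` rows."
)

demo_dir <- tempfile("render_demo_")
dir.create(demo_dir)
rmd_file <- file.path(demo_dir, "render_demo.Rmd")
writeLines(rmd_lines, rmd_file)

# render() depends on pandoc, so only call it where pandoc is available.
output_files <- character(0)
if (pandoc_available()) {
    # "all" requests every output format declared for the document.
    output_files <- render(rmd_file, output_format = "all", quiet = TRUE)
}
print(basename(output_files))
```

Passing a single format name to output_format instead of "all" builds just that one format.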

NOTE: PDF rendering is sometimes problematic, especially when running R remotely, like on the O2 cluster. If you run into problems, it’s likely an issue related to pandoc.

Working directory behavior

knitr redefines the working directory of an RMarkdown file in a manner that can be confusing. If you’re working in RStudio with an RMarkdown file that is not at the same location as the current R working directory (getwd()), you can run into problems with broken file paths. Suppose you have RStudio open without a project loaded; in that case the working directory is usually set to your home directory. Now, if you load an RMarkdown file from the desktop at ~/Users/myserame/Desktop, knitr will set the working directory within chunks to be relative to the desktop. We advise against coding paths in a script to only work with knitr and not base R.

A simple way to resolve this issue is by creating an R project for the analysis, and saving all RMarkdown files at the top level, to avoid running into unexpected problems related to this behavior.
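The working directory behavior can be observed directly. In this sketch, a chunk records the directory it runs in, and the result is compared with both the folder holding the .Rmd file and the directory of the R session that launched the knit. The folder, file, and chunk names are invented for the example.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

rmd_lines <- c(
    paste0(fence, "{r record_wd}"),
    "writeLines(getwd(), \"chunk_wd.txt\")",  # relative path, resolved in the chunk
    fence
)

# Put the document somewhere other than the session's working directory.
doc_dir <- tempfile("wd_demo_")
dir.create(doc_dir)
rmd_file <- file.path(doc_dir, "wd_demo.Rmd")
writeLines(rmd_lines, rmd_file)

session_wd <- getwd()
knit(rmd_file, output = file.path(doc_dir, "wd_demo.md"), quiet = TRUE)
chunk_wd <- readLines(file.path(doc_dir, "chunk_wd.txt"))

# Did the chunk run in the document's folder, or in the session's folder?
ran_in_doc_dir <- normalizePath(chunk_wd) == normalizePath(doc_dir)
ran_in_session_wd <- normalizePath(chunk_wd) == normalizePath(session_wd)
print(c(doc_dir = ran_in_doc_dir, session_wd = ran_in_session_wd))
```

Keeping the .Rmd file at the top level of an R project makes the two directories coincide, which is why that layout avoids the problem.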

Convert an R script to an RMarkdown knit report

Now that we know some of the basics of RMarkdown, let’s convert our Mov10 DE analysis script into an RMarkdown report!

  1. Download the .Rmd file
  2. Open up your DEanalysis R project in RStudio.
  3. Move your RMarkdown file into the project working directory.
  4. Open up the .Rmd file and Knit the report.

Download to data folder https://tinyurl.com/download-data-DE

Download to meta folder https://tinyurl.com/download-meta-DE

Once the report has been knit, it should open up in a separate window. If not, you will now see an HTML file in your working directory (de_script_toknit.html) which you can open in a web browser. This report contains some of the commands we ran in Session III. This report is a great template, but it could use a few tweaks to make it a bit more aesthetically pleasing.

Additional resources