Saving data and plots to file

Approximate time: 30 minutes

Learning Objectives

Describe how to export data tables and plots for use outside of the R environment.

Writing data to file

Everything we have done so far has only modified the data in R; the files have remained unchanged. Whenever we want to save our datasets to file, we need to use a write function in R.

To write our matrix to file in comma separated format (.csv), we can use the write.csv function. There are two required arguments: the variable name of the data structure you are exporting, and the path and filename that you are exporting to. By default the delimiter or column separator is set, and columns will be separated by a comma:

# Save a data frame to file
write.csv(sub_meta, file="data/subset_meta.csv")

Oftentimes the output is not exactly what you might want. You can modify the output using the arguments for the function. We can explore the arguments using the ?. This can help elucidate what each of the arguments can adjust the output.

?write.csv

Similar to reading in data, there are a wide variety of functions available allowing you to export data in specific formats. Another commonly used function is write.table, which allows you to specify the delimiter or separator you wish to use. This function is commonly used to create tab-delimited files.

NOTE: Sometimes when writing a data frame using row names to file with write.table(), the column names will align starting with the row names column. To avoid this, you can include the argument col.names = NA when writing to file to ensure all of the column names line up with the correct column values.

Writing a vector of values to file requires a different function than the functions available for writing dataframes. You can use write() to save a vector of values to file. For example:

# Save a vector to file
write(glengths, file="data/genome_lengths.txt")

If we wanted the vector to be output to a single column instead of five, we could explore the arguments:

?write

Note, the ncolumns argument that it defaults to five columns unless specified, so to get a single column:

# Save a vector to file as a single column
write(glengths, file="data/genome_lengths.txt", ncolumns = 1)

Exporting figures to file

There are two ways in which figures and plots can be output to a file (rather than simply displaying on screen).

The first (and easiest) is to export directly from the RStudio ‘Plots’ panel, by clicking on Export when the image is plotted. This will give you the option of png or pdf and selecting the directory to which you wish to save it to. It will also give you options to dictate the size and resolution of the output image.
The second option is to use R functions and have the write to file hard-coded in to your script. This would allow you to run the script from start to finish and automate the process (not requiring human point-and-click actions to save). In R’s terminology, output is directed to a particular output device and that dictates the output format that will be produced. If we wanted to save our scatterplot to a pdf file format we can use a function called ggsave within ggplot2 to help us.

ggplot(new_metadata) +
  geom_point(aes(x = age_in_days, y= samplemeans, color = genotype,
  			shape=celltype), size=rel(3.0))
ggsave("figures/scatterplot.pdf")

You can specify the file format you would like it to print to with the device argument. However, ggsave will try to guess your preferred format from the extension that you provide in your file name.

By default, ggsave will save the last image that you rendered. If you want to specify the plot to save you can store your image in an object and use the plot argument inside of ggsave to save that specific image.

Click here to see how to save an image in base R

A device must be created or “opened” in order to receive graphical output and, for devices that create a file on disk, the device must also be closed in order to complete the output.

If we wanted to print our scatterplot to a pdf file format, we would need to initialize a plot using a function which specifies the graphical format you intend on creating i.e.pdf(), png(), tiff() etc. Within the function you will need to specify a name for your image, and the with and height (optional). This will open up the device that you wish to write to:

## Open device for writing
pdf("figures/scatterplot.pdf")

If you wish to modify the size and resolution of the image you will need to add in the appropriate parameters as arguments to the function when you initialize. Then we plot the image to the device, using the ggplot scatterplot that we just created.

## Make a plot which will be written to the open device, in this case the temp file created by pdf()/png()
ggplot(new_metadata) +
  geom_point(aes(x = age_in_days, y= samplemeans, color = genotype,
  			shape=celltype), size=rel(3.0))

Finally, close the "device", or file, using the dev.off() function. There are also bmp, tiff, and jpeg functions, though the jpeg function has proven less stable than the others.

## Closing the device is essential to save the temporary file created by pdf()/png()
dev.off()

You will not be able to open and look at your file using standard methods (Adobe Acrobat or Preview etc.) until you execute the dev.off() function.

In the case of pdf(), if you had made additional plots before closing the device, they will all be stored in the same file with each plot usually getting its own page, unless otherwise specified.
</details> --- *This lesson has been developed by members of the teaching team at the [Harvard Chan Bioinformatics Core (HBC)](http://bioinformatics.sph.harvard.edu/). These are open access materials distributed under the terms of the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.*