Exercise
Create an R script, then write a job submission script with the appropriate Slurm directives so that it can be run as a batch job using Slurm's sbatch command.
RMarkdown vs. R Script
RMarkdown is, in its most basic form, a file format that can be converted into a shareable document such as HTML, PDF, and many others. It lets you document not just your code (R, as well as Python and SQL) but also include tables and figures alongside descriptive text.
R scripts, by contrast, are plain text files that contain only code for R to execute. One way to think about the difference is that an R script looks like all of the code from the code blocks in your RMarkdown file pushed together. When working with RMarkdown, you will typically share the rendered HTML/PDF/etc., whereas with an R script you share the script itself.
Create the R script
- Type nano sqrt_input.R at the command prompt. This will open up a new script file where you can add the contents of your R script.
NOTE: nano is a text editor that you can use when working on the command line. There are other editors you can choose from. For more information on command-line text editors, check out this lesson.
- The script will take in a number and return the square root of that number rounded to two decimal places. You can copy and paste the code below into the text editor.
#!/usr/bin/env Rscript
# Usage: this Rscript will accept a number and provide the square root of that number rounded to two decimal places.
# Rscript sqrt_input.R <number>
print("reading in arguments from command line")
args <- commandArgs(trailingOnly = TRUE)
## commandArgs reads in the arguments as a character vector
print("converting input to numeric")
num <- as.numeric(args[1])
print("running the sqrt() and round() functions on the input")
## the result is printed automatically when the script is run with Rscript
round(sqrt(num), digits = 2)
- To save the file, press CTRL + O followed by the Return key. This writes your changes to the file. You can then press CTRL + X to exit.
- Once you have created the script, test that it runs well interactively. Note that you will first need the R module loaded in your current O2 session, if it's not already there!
How to run the R script
The script can be run from the command line using the Rscript command. Don't forget to provide a numeric value as input.
Rscript sqrt_input.R 60
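As a minimal sketch of the interactive test, the snippet below first checks that Rscript is on your PATH and only then runs the script. The module names in the hint message are the ones used in this lesson's answer key and are an assumption about your cluster; the variable name rscript_found is purely illustrative.

```shell
# Sketch: confirm Rscript is available before testing the script interactively.
if command -v Rscript >/dev/null 2>&1; then
    rscript_found=yes
    # Run the script only if it exists in the current directory.
    if [ -f sqrt_input.R ]; then
        Rscript sqrt_input.R 60
    else
        echo "sqrt_input.R not found in $(pwd)"
    fi
else
    rscript_found=no
    # Module names taken from this lesson's answer key; adjust for your cluster.
    echo "Rscript not found; try: module load gcc/6.2.0 R/4.1.1"
fi
```

With 60 as input, the last line printed by the script should be [1] 7.75, preceded by the three progress messages.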
Create a job submission script
If you need help figuring out which options are available as Slurm directives, the HMSRC has a great wiki page that outlines them in more detail.
- Using nano, create a new file called sqrt_R.sbatch.
- Open up the new file and add a shebang line: #!/bin/bash
- Next, add Slurm/sbatch directives at the top of the script requesting the following resources:
  - Use partition priority (-p)
  - Request 5 minutes (-t)
  - Request 100MB of memory (--mem)
  - Request a single core (-c)
  - Give the job a name (--job-name)
  - Specify the output file name (-o)
  - Specify the error file name (-e)
- Add the line of code to load the required modules.
- Add the line of code to run the R script.
- Save the file and exit the nano text editor.
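Before submitting, a quick sanity check of the file you just wrote can be worthwhile. The sketch below assumes the file is named sqrt_R.sbatch and sits in the current directory; it lists the #SBATCH lines and asks bash to parse the script without executing it (bash -n). The variable names are illustrative.

```shell
script="sqrt_R.sbatch"
n_directives=0
if [ -f "$script" ]; then
    # Show each directive together with its line number
    grep -n '^#SBATCH' "$script" || true
    n_directives=$(grep -c '^#SBATCH' "$script" || true)
    # Parse the script for syntax errors without running any commands
    bash -n "$script" && echo "bash syntax OK"
else
    echo "$script not found in $(pwd)"
fi
echo "Found ${n_directives} #SBATCH lines; the exercise asks for seven"
```

Slurm only reads #SBATCH lines that appear before the first executable command, which is why the directives go directly under the shebang line.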
Run the script to start a new job on O2
- Run the new script using the sbatch command.
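A sketch of the submission step, assuming the submission script is named sqrt_R.sbatch as above. The --parsable flag makes sbatch print only the job ID, which is convenient to keep in a variable (job_id is an illustrative name) for the status check that follows. The guard lets the snippet run harmlessly on a machine without Slurm.

```shell
job_id=""
if command -v sbatch >/dev/null 2>&1 && [ -f sqrt_R.sbatch ]; then
    # --parsable prints just the job ID (plus ";cluster" on multi-cluster setups)
    job_id=$(sbatch --parsable sqrt_R.sbatch)
    echo "Submitted batch job ${job_id}"
else
    echo "sbatch or sqrt_R.sbatch not available in this session"
fi
```

Without --parsable, sbatch prints a line of the form "Submitted batch job <jobid>".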
Check the job/run
- Use sacct to check the status of your job submission.
- Check the contents of your current directory:
  - Are there any new files with names ending in .out and .err?
  - What are the contents of these two files?
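The checks above might look like the following sketch. The file names sqrt.out and sqrt.err are assumptions taken from the answer key further down; substitute whatever you passed to -o and -e. The --format fields are standard sacct columns, chosen here only for illustration.

```shell
# Show this user's jobs (sacct defaults to jobs since midnight) with a few useful columns
if command -v sacct >/dev/null 2>&1; then
    sacct --format=JobID,JobName,Partition,State,Elapsed,MaxRSS
else
    echo "sacct not available in this session"
fi

# List the directory, then print each log file if it exists
ls -l
checked=0
for f in sqrt.out sqrt.err; do
    if [ -f "$f" ]; then
        echo "== $f =="
        cat "$f"
    else
        echo "$f not found (yet)"
    fi
    checked=$((checked + 1))
done
```

Since the R script writes its messages with print(), they should appear in the .out file; the .err file is often empty unless R or the module system reported a warning or error.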
Couldn't get the script to work? See the answer key below.
Your sbatch script should look something like this:
#!/bin/bash

#SBATCH -p priority              # partition name
#SBATCH -t 0-00:05               # days-hours:minutes run limit after which the job will be killed
#SBATCH --mem 100M               # amount of memory requested
#SBATCH -c 1                     # number of cores requested
#SBATCH --job-name sqrt_R_script # job name
#SBATCH -o sqrt.out              # file to which standard out will be written
#SBATCH -e sqrt.err              # file to which standard err will be written

# Load required modules
module load gcc/6.2.0 R/4.1.1

# Run the R script
Rscript sqrt_input.R 60
This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.