Functions in R
Approximate time: 30 min
Learning Objectives¶
- Describe and utilize functions in R.
- Modify default behavior of a function using arguments.
- Identify R-specific sources of obtaining more information about functions.
- Demonstrate how to create user-defined functions in R
Functions and their arguments¶
What are functions?¶
A key feature of R is functions. Functions are "self contained" modules of code that accomplish a specific task. Functions usually take in some sort of data structure (value, vector, dataframe etc.), process it, and return a result.
The general usage for a function is the name of the function followed by parentheses:
The input(s) are called arguments, which can include:- the physical object (any data structure) on which the function carries out a task
- specifications that alter the way the function operates (e.g. options)
Not all functions take arguments, for example:
However, most functions can take several arguments. If you don't specify a required argument when calling the function, you will either receive an error or the function will fall back on using a default.
The defaults represent standard values that the author of the function specified as being "good enough in standard cases". An example would be what symbol to use in a plot. However, if you want something specific, simply change the argument yourself with a value of your choice.
Basic functions¶
We have already used a few examples of basic functions in the previous lessons i.e getwd()
, c()
, and factor()
. These functions are available as part of R's built in capabilities, and we will explore a few more of these base functions below.
Let's revisit a function that we have used previously to combine data c()
into vectors. The arguments it takes is a collection of numbers, characters or strings (separated by a comma). The c()
function performs the task of combining the numbers or characters into a single vector. You can also use the function to add elements to an existing vector:
glengths <- c(glengths, 90) # adding at the end
glengths <- c(30, glengths) # adding at the beginning
What happens here is that we take the original vector glengths
(containing three elements), and we are adding another item to either end. We can do this over and over again to build a vector or a dataset.
Since R is used for statistical computing, many of the base functions involve mathematical operations. One example would be the function sqrt()
. The input/argument must be a number, and the output is the square root of that number. Let's try finding the square root of 81:
Now what would happen if we called the function (e.g. ran the function), on a vector of values instead of a single value?
In this case the task was performed on each individual value of the vector glengths
and the respective results were displayed.
Let's try another function, this time using one that we can change some of the options (arguments that change the behavior of the function), for example round
:
We can see that we get 3
. That's because the default is to round to the nearest whole number. What if we want a different number of significant digits? Let's first learn how to find available arguments for a function.
Seeking help on arguments for functions¶
The best way of finding out this information is to use the ?
followed by the name of the function. Doing this will open up the help manual in the bottom right panel of RStudio that will provide a description of the function, usage, arguments, details, and examples:
Alternatively, if you are familiar with the function but just need to remind yourself of the names of the arguments, you can use:
Even more useful is the example()
function. This will allow you to run the examples section from the Online Help to see exactly how it works when executing the commands. Let's try that for round()
:
In our example, we can change the number of digits returned by adding an argument. We can type digits=2
or however many we may want:
Note
If you provide the arguments in the exact same order as they are defined (in the help manual) you don't have to name them:
However, it's usually not recommended practice because it involves a lot of memorization. In addition, it makes your code difficult to read for your future self and others, especially if your code includes functions that are not commonly used. (It's however OK to not include the names of the arguments for basic functions likemean
, min
, etc...). Another advantage of naming arguments, is that the order doesn't matter. This is useful when a function has many arguments.
Exercise
-
Let's use base R function to calculate mean value of the
glengths
vector. You might need to search online to find what function can perform this task. -
Create a new vector
test <- c(1, NA, 2, 3, NA, 4)
. Use the same base R function from exercise 1 (with addition of proper argument), and calculate mean value of thetest
vector. The output should be2.5
.NOTE: In R, missing values are represented by the symbol
NA
(not available). It’s a way to make sure that users know they have missing data, and make a conscious decision on how to deal with it. There are ways to ignoreNA
during statistical calculation, or to removeNA
from the vector. If you want more information related to missing data orNA
you can go to this page (please note that there are many advanced concepts on that page that have not been covered in class). - Another commonly used base function is
sort()
. Use this function to sort theglengths
vector in descending order.
User-defined Functions¶
One of the great strengths of R is the user's ability to add functions. Sometimes there is a small task (or series of tasks) you need done and you find yourself having to repeat it multiple times. In these types of situations, it can be helpful to create your own custom function. The structure of a function is given below:
name_of_function <- function(argument1, argument2) {
statements or code that does something
return(something)
}
- First you give your function a name.
- Then you assign value to it, where the value is the function.
When defining the function you will want to provide the list of arguments required (inputs and/or options to modify behaviour of the function), and wrapped between curly brackets place the tasks that are being executed on/using those arguments. The argument(s) can be any type of object (like a scalar, a matrix, a dataframe, a vector, a logical, etc), and it’s not necessary to define what it is in any way.
Finally, you can “return” the value of the object from the function, meaning pass the value of it into the global environment. The important idea behind functions is that objects that are created within the function are local to the environment of the function – they don’t exist outside of the function.
Let's try creating a simple example function. This function will take in a numeric value as input, and return the squared value.
Once you run the code, you should see a function named square_it
in the Environment panel (located at the top right of Rstudio interface). Now, we can use this function as any other base R functions. We type out the name of the function, and inside the parentheses we provide a numeric value x
:
Pretty simple, right? In this case, we only had one line of code that was run, but in theory you could have many lines of code to get obtain the final results that you want to "return" to the user.
Do I always have to return()
something at the end of the function?
In the example above, we created a new variable called square
inside the function, and then return the value of square
. If you don't use return()
, by default R will return the value of the last line of code inside that function. That is to say, the following function will also work.
return
at the end of a function as the best practice.
We have only scratched the surface here when it comes to creating functions! We will revisit this in later lessons, but if interested you can also find more detailed information on this R-bloggers site, which is where we adapted this example from.
Exercise
- Write a function called
multiply_it
, which takes two inputs: a numeric valuex
, and a numeric valuey
. The function will return the product of these two numeric values, which isx * y
. For example,multiply_it(x=4, y=6)
will return output24
.
Attribution notice
This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- The materials used in this lesson are adapted from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).