4.1 Functions - Advanced

In the "For Students" section, we looked at what a function is and how to use one. In this section, we're going to look more at the structure of a function and how you might go about writing your own functions.

4.1.1 Creating functions

R and its packages give you access to hundreds of thousands of different functions, all tailored to perform a particular task. Despite this wide array to choose from, however, there will always be cases where there isn't a function that does exactly what you need. For those of you coming over from Excel, this can often be a serious source of frustration: when there isn't an Excel function for the job, there isn't an easy way to create one without knowing VBA.

R is different. Creating functions can be very simple and will really change the way you work.

Creating functions will also highlight an important delineation. Previously, we've been focusing on calling functions. Calling a function is essentially using it. But in order to call a function, it needs to be defined. Base functions are already defined (i.e. someone has already written what the function is going to do), but when you're creating your own functions, you are defining a new function that you're presumably going to call later on.

Function structure

If we go back to the beginning of this chapter, we learned that everything that exists is an object. Functions are no exception, and so we create them like we do all our other objects. There is a slight difference, however. When we define a function, we use the function keyword and assign the result to a name like this:

my_first_function <- function() {}

Notice how we've got two sets of brackets here. The first (()) is where we define our input parameters. The second ({}) is where we define the body of our function.

Let's do a simple example. Let's create a function that adds two numbers together:

my_sum_function <- function(x, y) {
  x + y
}

So in this example, I've defined that when anyone uses the function, they need to provide two input parameters named x and y. Something that people tend to struggle with is that the names of your input parameters have no implicit meaning. They are just used to reference the value provided in the body of the function and, hopefully, make it clear what kind of thing the user of the function should be providing. This is why for example in some functions that require a dataframe there will be an input parameter called df or similar. But importantly, these names are technically just arbitrary.
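
To make the point about arbitrary names concrete, here is a minimal sketch. Both function names and the parameter names apples and oranges are made up for illustration; the two functions have the same body and behave identically:

```r
# Two hypothetical functions: same body, different parameter names
add_xy <- function(x, y) {
  x + y
}

add_apples_oranges <- function(apples, oranges) {
  apples + oranges
}

add_xy(x = 5, y = 6)
## [1] 11
add_apples_oranges(apples = 5, oranges = 6)
## [1] 11
```

R doesn't care what the parameters are called; the names only matter to the person reading or calling the function.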

In the body of the function, we can see that we're just doing something really simple: we're adding x and y together with +.

Once I've run the code to define my function, I can then call it like I would any other function:

my_sum_function(x = 5, y = 6)
## [1] 11

Optional input parameters

When defining your function, you can define optional parameters. These are typically parameters that take the same value most of the time, with edge cases where you need something else. Defining an optional parameter is really easy: when you define your function, give the parameter a value and that will be its default:

add_mostly_2 <- function(x, y = 2) {
  x + y
}

add_mostly_2(x = 5)
## [1] 7
add_mostly_2(x = 5, y = 3)
## [1] 8

...

You'll notice a crucial distinction between R's sum() function and ours. The base function allows for an indeterminate number of input parameters, whereas we've only allowed two (x and y). This is because the base sum() function uses the ... parameter. The ... is essentially shorthand for "as many or as few inputs as the user wants to provide". To use the ..., just add it as an input parameter:

dot_dot_dot_function <- function(x, y, ...) {
  # function body goes here
}
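
One way to see what the ... holds inside a function is to collect it with list(...). The sketch below assumes a made-up function name, count_inputs, and simply reports how many inputs were supplied:

```r
# count_inputs is a hypothetical name used only for illustration
count_inputs <- function(...) {
  inputs <- list(...)  # capture everything passed via ... as a list
  length(inputs)       # number of inputs the caller provided
}

count_inputs()
## [1] 0
count_inputs(1, "a", TRUE)
## [1] 3
```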

The ... works particularly well when you might be creating a function that wraps around another one. A wrapping function is just a function that makes a call to another one within it, like this:

sum_and_add_2 <- function(...) {
  sum(...) + 2
}

All we're doing above is wrapping the sum() function to add some specific functionality.
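
As a quick illustration of the wrapper in use, the sketch below repeats the definition so it runs on its own. Note that named arguments such as sum()'s na.rm are forwarded through the ... as well:

```r
sum_and_add_2 <- function(...) {
  sum(...) + 2
}

sum_and_add_2(1, 2, 3)
## [1] 8

# na.rm is not a parameter of our wrapper; it travels through ... to sum()
sum_and_add_2(1, NA, 3, na.rm = TRUE)
## [1] 6
```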

By using the ... here, we can just pass everything that the user provides to the sum() function. This means we don't have to worry about copying any input parameters.

Return values

As I mentioned in the "For Students" section, functions have a single return value. By default, a function will return the last evaluated object in the function environment. In our my_sum_function example, our last evaluation was x + y, so the output of that was what was returned by the function.
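
A small sketch of the "last evaluated" rule, using a made-up function called describe_sum. The first line of the body is evaluated but not returned; only the final expression becomes the return value:

```r
# describe_sum is a hypothetical name used only for illustration
describe_sum <- function(x, y) {
  total <- x + y                # evaluated, but not the last expression
  paste("The total is", total)  # last expression, so this is what's returned
}

describe_sum(5, 6)
## [1] "The total is 11"
```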

You can also be explicit with your return values by using the return() function. Calling return() stops the function immediately and returns whatever value you give it. This can be useful if you want to return a value early:

early_return_function <- function(x, y, return_x = TRUE) {
  if (return_x) {
    return(x)
  }
  x + y
}

Here, we can see more clearly that x is returned when return_x is TRUE and x + y is returned otherwise.
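
Early returns are often used as a guard at the top of a function. As a minimal sketch (safe_divide is a hypothetical helper, not something from base R, and returning NA for a zero denominator is just one possible choice), it might look like this:

```r
# safe_divide is a hypothetical helper used only for illustration
safe_divide <- function(x, y) {
  if (y == 0) {
    return(NA)  # early exit: nothing below this line runs
  }
  x / y
}

safe_divide(10, 2)
## [1] 5
safe_divide(10, 0)
## [1] NA
```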

Certain style guides suggest that you should only use return() statements for early returns. In other words, the "normal" return value for your function should be defined by what's evaluated last. Personally, I think you should use whatever makes it clearer for you. I quite like seeing explicit return() values in a function because I find it makes it clearer what all the possible return values are, but this is just personal preference.

If you're interested, there are some specifics about function inputs in R that can be good to know.

4.1.2 Input validation

Firstly, unlike in some other languages, R functions do not have a specific data type tied to each input parameter. Any requirements that should be imposed on an input parameter (e.g. it should be numeric) are enforced by the function creator in the body of the function. So for instance, when you try to sum character strings, the error you get comes from type-checking in the body of the function, not from the moment you provide the input parameters.
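
Here is one way you might write such a check yourself. This is only a sketch: add_numbers and its error message are made up for illustration, and is.numeric() with stop() is just one of several ways to validate inputs in R:

```r
# add_numbers is a hypothetical name; the check lives in the body, not the signature
add_numbers <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must both be numeric")
  }
  x + y
}

add_numbers(5, 6)
## [1] 11

# R accepts the call; the error only appears once the check in the body runs
result <- tryCatch(
  add_numbers("5", 6),
  error = function(e) conditionMessage(e)
)
result
## [1] "x and y must both be numeric"
```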


4.1.3 Functions as objects

Secondly, functions are technically just another object. This means that you can use functions like you would any other object. For instance, some functions will accept other functions as an input parameter. When we move on to the apply functions, we'll see that lapply() (list-apply) requires a FUN parameter, which is the function to be applied to each value in the provided list.

sum_list <- list(
  # example vectors whose sums are 3, 15 and 50
  c(1, 2),
  c(4, 5, 6),
  c(10, 15, 25)
)

lapply(sum_list, FUN = sum)
## [[1]]
## [1] 3
## [[2]]
## [1] 15
## [[3]]
## [1] 50
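
You can write your own functions that accept a function as an input in the same way. In this sketch, apply_to_pair and its fun parameter are hypothetical names chosen for illustration:

```r
# apply_to_pair is a hypothetical function that takes another function as an input
apply_to_pair <- function(x, y, fun) {
  fun(x, y)  # call whichever function the user supplied
}

apply_to_pair(5, 6, fun = sum)
## [1] 11
apply_to_pair(5, 6, fun = max)
## [1] 6
```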

Linked with the idea that functions are just another type of object, there is an important distinction between sum and sum(). The first will return the sum object. That is, not the result of applying inputs to the sum function, but the function itself. If you just type the name of a function into the console, it will show you the code for that function (its definition):

sum
## function (..., na.rm = FALSE)  .Primitive("sum")

Conversely, sum() will call the sum function with the inputs provided in the brackets.
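
A short sketch of the difference: because sum on its own is just an object, you can inspect it or assign it to another name (my_total below is a made-up name), and the new name can then be called with brackets like any other function:

```r
# sum without brackets is the function object itself
is.function(sum)
## [1] TRUE

# Because it is an object, it can be assigned to another (hypothetical) name
my_total <- sum

# Adding brackets calls the function
my_total(1, 2, 3)
## [1] 6
```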