9 Functions

Being a functional programming language, functions are at the heart of R. We’ve already used lots of functions in the previous chapters, but now we’re going to look in more detail about what a function is.

John Chambers, creator of the S programming language upon which R is based and core member of the R programming language project, said this:

"To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.

— John Chambers"

For now, we’re going to focus on that second statement. What does it mean?

Well, a function is quite simple. It has an input, it does something, and then it gives an output. A really simple example of this is just typing print(1) into the console and hitting enter. You’ve given an input, there was a calculation, and now there’s an output. Something’s happened (1 was printed in the console) and it was done by calling a function (print).

If you’re well-versed in mathematics, you’ll know that functions in maths are the same. \(f(x) = {3x}\) means that the to get y, you take x and multiply it by three. In this case, our input is x, our bit in the middle is multiplying by three, and then our output is y.

If you haven’t used functions in mathematics then don’t worry. Even by getting this far in the book, you’ve already used functions loads of times. For example, how do you create a vector? If you remember, you use the c() function, which we know stands for “concatenate”. So, every time you’ve created a vector, you’ve used a function without even knowing it. The input was whatever you provided in the brackets. The computation was to concatenate everything together. And then the output was the vector.

Similarly, whenever you created a factor or a matrix or a dataframe or whatever, you used a function. You provided an input, there was a computation to change that input, and then you got an output.

As confusing as functions will inevitably become, just try to remember the core of what a function is: When you call a function, there’s an input, something happens, and there’s an output.

9.1 Functions in R

So more specifically, what do functions look like in R? Well, a good starting point is that when we call (use) a function, we provide the name and it’s almost always followed by brackets (()). This helps make it clear what values you’re providing as your inputs to the function.

I say that nearly almost all functions are followed by (), because some aren’t. A simple example of this is +. + is still a function:

## [1] TRUE
# the backticks just mean I'm referring
# to the + function without using it

But it doesn’t have brackets. Instead, we can use a shorthand where we provide the values we want to give to the function either side of it (e.g. 1 + 2). Importantly however, the logic is exactly the same, and you can still use the + like a normal function with brackets:

## [1] 3

It’s just that this looks a little weird to us, so we often use the shorthand way. But the long and short of it is: an easy way to tell when someone is calling (using) a function is to look for the () after the function name.

9.2 Inputs

We know that to use a function in R, we have to provide inputs*. And we also know that we provide our inputs within the brackets after the function name. But how do we know what values are allowed?

*Technically, sometimes you don’t have to provide an input to a function (e.g. Sys.Date(), which gives us the current date without putting anything in the brackets). But most functions require at least one input.

By typing a ? followed by the name of the function into the console (e.g. ?length()), you’ll get a help page showing you the input parameters allowed by the function. So if we use ?length() as an example, the help page tells us that the length() function expects one input parameter, x, and that needs to be an R object. Nice and simple. ...

In some cases, you’ll see a ... as one of the input parameters. This essentially means that you can provide an indeterminate number of values for that input. I know that sounds confusing, but the c() function is a good way of demonstrating this. When you create a vector, you can provide an (essentially) infinite number of values to the function. So the c() function basically bundles everything you provide to it into that ... parameter.

9.2.1 Explicit input parameters

If you type ?c() into the console however, you’ll see that there are also some other input parameters: recursive and use.names. Well Adam, if ... just bundles everything I provide into a single input, then how do those work? Well this outlines the importance of providing explicit input parameters. When we’re explicit, we’re saying exactly which input parameter we’re referring to with each value we provide. And to do this, we just provide the name of the input parameter when we give it. Let’s look at the substr() function as an example.

The substr() function simply returns part of a character string that you provide. So, if I was to type:

substr("Hello", 1, 3)
## [1] "Hel"

I get the first to the third characters in the string “hello”. With this function call however, I haven’t been explicit. Instead, I’ve just provided the inputs in the order that they’re listed in the documentation:

  • x
    • a character vector
  • start
    • the first element to be extracted
  • stop
    • the last element to be extracted

To be explicit, I need to provide the name of the input parameter that I’m referring to when I provide my inputs:

substr(x = "Hello", start = 1, stop = 3)
## [1] "Hel"
substr(start = 1, stop = 3, x = "Hello")
## [1] "Hel"

Notice how when I’m being explicit it doesn’t matter what order I provide my inputs in, R knows which value should be mapped to which input parameter.

Also, notice how we’re using = here and not anything else like <-? This is another reason why I suggest not using = for assignment: we use = and when we’re providing input parameters and so it’s good to keep them separate.

So how does this link back with the ...? Well, with the c() function, every unnamed parameter you provide is bundled into the ... parameter. To give values for the recursive and use.names parameters, you’d need to provide them explicitly (e.g. recursive = TRUE). This will be true of many functions where you see a .... If you’re not explicit with the parameters that you don’t want to be included in the ..., you’re going to have a bad time. Optional input parameters

For many functions, certain parameters have a predefined value that they will default to. This provides a level of flexibility whilst not requiring lines and lines of code for every function call; there’s a default value, but you can override it if needed.

Optional parameters are easily distinguished in the documentation of a function because they will a value already assigned to them like this: use.names = TRUE.

For instance, when we create a vector using the c() function, there are two optional parameters (recursive and use.names) that already have the values TRUE and FALSE assigned to them. To override these defaults, we just need to provide a new value to the parameter like this:

c('one'=1,'ten'=10,'fifteen'=15, use.names = FALSE)
## [1]  1 10 15

Values without default values are often called required parameters. This is a bit of a misnomer because there will be instances where you won’t provide a value to these parameters and the function will work just fine. This is down to two things that we’ll look at in the For Teachers section: lazy evaluation and the missing() function. Most developers try and stay away from having required parameters that aren’t actually required, but you’ll still see them around (e.g. you can use the download.file() function without providing a value to the method parameter even though it doesn’t have a default value.

9.3 Outputs

First and foremost, in R you can have as many input as you like to a function. However, a function will only ever return one thing. I say one thing, because functions can return a list which itself can contain multiple values, but just keep this in mind: Functions in R have a single return value.

9.3.1 Reassigning outputs

Functions in R do not edit the inputs you provide in place. Instead, they essentially work on copies of the inputs you provide. We call this immutability. Here’s a quick example:

x <- 1
sum(x, 1)
## [1] 2
## [1] 1

As you can see, when we call the sum() function with x as an input parameter, the value of x stays the same.

If you do want to edit your original value, you just need to reassign the output of the function call back to the variable. I know that sounds complicated, but it’s quite simple:

x <- 1
x <- sum(x,1)
## [1] 2

This works because the right-hand side of the assignment line is executed first. In other words, when the sum(x,1) is evaluated, x is still equal to one. This makes sense because otherwise it’d be very hard to keep track of what x was equal to!

This behaviour (not changing the input parameter value in place) is a major point of difference between functions and what are called methods in other languages. If you’re coming from something like Python, you may be used to changing objects through methods: object.AddNew or something like that. In R, functions do not change variables in the global environment because they are executed in their own environment. To learn more about environments, there is an environments chapter in the “For Teachers” section.

9.4 Questions

  1. Why does `<-`(test, 2) work? What does this tell us about <-?
  2. Why does mean(1,2) not return the output you’d expect but sum(1,2) does?
    • Hint: the documentation of both functions will help.
  3. Other than Sys.Date(), can you think of another example of a function that be executed without any explicit input parameters?