4.1 Functions

If you’re interested, there are some specifics about function inputs in R that can be good to know.

4.1.1 Input validation

Firstly, unlike in some other languages, R functions do not tie a specific data type to each input parameter. Any requirements on an input parameter (e.g. that it should be numeric) have to be enforced by the function’s author, in the body of the function. So, for instance, when you try to sum character strings, the error you get comes from type-checking inside the body of the sum() function, not from the act of supplying the inputs.
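As a minimal sketch of the idea, here is a hypothetical function, double_it(), that has to check its own input type in its body, followed by sum() failing on character strings:

```r
# Nothing in the definition says x must be numeric...
double_it <- function(x) {
  # ...so the author checks the type inside the body instead
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  x * 2
}

double_it(4)         # returns 8
try(double_it("a"))  # fails with the error message written in the body

# sum() behaves the same way: the error is raised by checks inside sum() itself
try(sum("a", "b"))
```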

4.1.2 Functions as objects

Secondly, functions are technically just another object. This means that you can use functions like you would any other object. For instance, some functions accept other functions as input parameters. When we move on to the apply logic, you’ll see that the lapply() function requires a FUN parameter: the function to be applied to each element in turn.
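A small sketch of passing a function around as an object (the list numbers and the name my_sum are just illustrative):

```r
# A named list of numeric vectors
numbers <- list(a = 1:3, b = 4:6)

# Pass the sum function itself (no brackets) as lapply()'s FUN argument
lapply(numbers, FUN = sum)  # returns a list: a = 6, b = 15

# A function can also be stored under another name, like any other object
my_sum <- sum
my_sum(1, 2, 3)  # returns 6
```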

Linked with the idea that functions are just another type of object, there is an important distinction between sum and sum(). The first returns the sum object itself: not the result of applying the sum function to some inputs, but the function itself. If you just type the name of a function into the console, R will show you the code for that function (its definition). Because sum() is a primitive function built into R, its printed definition is just a short stub:

## function (..., na.rm = FALSE)  .Primitive("sum")

Conversely, sum() will attempt to apply the sum function to the inputs provided in the brackets.
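To see the difference side by side, here is a short sketch:

```r
# Without brackets: the function object itself is printed
sum

# With brackets: the function is called on the inputs
sum(1, 2, 3)  # returns 6

# Because sum is an object, we can inspect it like any other
is.function(sum)  # returns TRUE
```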

4.1.3 Creating functions

R and its packages give you access to hundreds of thousands of different functions, each tailored to perform a particular task. Despite this wide array to choose from, however, there will always be cases where no function does exactly what you need. For those of you coming over from Excel, this can be a serious source of frustration: when there isn’t an Excel function for the job, there isn’t an easy way to create one without knowing VBA.

R is different. Creating functions can be very simple and will really change the way you work.

Creating functions also highlights an important distinction. Previously, we’ve been focusing on calling functions; calling a function essentially means using it. But in order to call a function, it first needs to be defined. Base functions are already defined (i.e. someone has already written what the function is going to do), but when you create your own functions, you are defining a new function that you will presumably call later on.

4.1.3.1 Function structure

As we learnt at the beginning of this chapter, everything that exists in R is an object. Functions are no exception, so we create them like we do all our other objects: by assigning them to a name. There is one difference, however. To define a function, we use the function keyword on the right-hand side of the assignment.
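A bare-bones skeleton looks something like this (function_name, parameter_1 and parameter_2 are placeholder names, not anything special to R):

```r
# General shape of a function definition
function_name <- function(parameter_1, parameter_2) {
  # The body: code that runs each time the function is called,
  # using parameter_1 and parameter_2
}
```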

Notice how we’ve got two sets of brackets here. The first, the round brackets (), is where we define our input parameters. The second, the curly brackets {}, is where we define the body of our function.

Let’s do a simple example. Let’s create a function that adds two numbers together:
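A minimal version might look like this (the name add_numbers is our own choice):

```r
# A function that adds two numbers; x and y are its input parameters
add_numbers <- function(x, y) {
  x + y
}
```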

So in this example, I’ve defined that when anyone uses the function, they need to provide two input parameters named x and y. Something that people tend to struggle with is that the names of your input parameters have no implicit meaning. They are just names used in the body of the function to refer to whatever values the user provides and, hopefully, to make it clear what kind of thing the user should be providing. This is why, for example, functions that require a dataframe often have an input parameter called df or similar. But importantly, these names are technically just arbitrary.
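To illustrate that the names are arbitrary, the two hypothetical functions below differ only in their parameter names and behave identically:

```r
# Same body, different parameter names
add_xy <- function(x, y) {
  x + y
}

add_apples_pears <- function(apples, pears) {
  apples + pears
}

# The names only label the inputs inside each body, so both give the same result
add_xy(5, 6)            # returns 11
add_apples_pears(5, 6)  # returns 11
```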

In the body of the function, we can see that we’re just doing something really simple: we’re adding x and y together with +.

Once I’ve run the code to define my function, I can then call it like I would any other function:
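In this sketch, add_numbers and the inputs 5 and 6 are illustrative choices, and the definition is included so that the snippet runs on its own:

```r
add_numbers <- function(x, y) {
  x + y
}

# Call the function: x takes the value 5 and y takes the value 6
add_numbers(5, 6)
```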

## [1] 11

4.1.3.1.1 Optional input parameters

When defining your function, you can also define optional parameters. These are typically values that you need to be one thing most of the time, but something else in a few edge cases. Defining optional parameters is really easy: when you define your function, just give the parameter a value with = and that value becomes its default:
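As a sketch (the default of 2 and the inputs are illustrative), giving y a default makes it optional:

```r
# y is optional: if the caller doesn't supply it, it defaults to 2
add_numbers <- function(x, y = 2) {
  x + y
}

add_numbers(5)     # y falls back to its default of 2
add_numbers(5, 3)  # y is overridden with 3
```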

## [1] 7
## [1] 8

4.1.3.2 The ... (dot-dot-dot) parameter

You’ll notice a crucial distinction between R’s sum() function and ours. The base function allows for an indeterminate number of input parameters, whereas we’ve only allowed 2 (x and y). This is because the base sum() function has ... (often called dot-dot-dot) as an input parameter. The ... is essentially shorthand for “as many or as few inputs as the user wants to provide”. To use the ..., just add it as an input parameter:
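Here is a minimal sketch of an add function that accepts any number of inputs via ... (the name add_numbers is illustrative, and it assumes each input is a single number):

```r
# A hypothetical add function that accepts any number of inputs
add_numbers <- function(...) {
  # list(...) gathers however many inputs the caller supplied
  inputs <- list(...)
  total <- 0
  for (value in inputs) {
    total <- total + value
  }
  total
}

add_numbers(1, 2)        # returns 3
add_numbers(1, 2, 3, 4)  # returns 10
add_numbers()            # returns 0: supplying no inputs is fine too
```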

The ... works particularly well when you might be creating a function that wraps around another one. A wrapping function is just a function that makes a call to another one within it, like this:
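As an illustration, here is a hypothetical wrapper, rounded_sum(), which sums whatever it is given and then rounds the result to 2 decimal places:

```r
# Hypothetical wrapper: sum the inputs, then round the result to 2 decimal places
rounded_sum <- function(...) {
  # Everything the caller passes (numbers, vectors, na.rm = TRUE) goes straight to sum()
  round(sum(...), 2)
}

rounded_sum(1.234, 2.345)                   # returns 3.58
rounded_sum(c(1.111, NA, 2))                # returns NA, just like sum() would
rounded_sum(c(1.111, NA, 2), na.rm = TRUE)  # na.rm is passed through to sum(); returns 3.11
```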

All we’re doing in the above is wrapping the sum() function to add some specific functionality.

By using the ... here, we can pass everything that the user provides straight through to the sum() function. This means we don’t have to copy each of sum()’s input parameters (such as na.rm) into our own function definition.

4.1.4 Function environments

To better understand how functions operate, we need to understand how environments and scoping work in R. There’s a separate chapter on environments in this book, but we’ll briefly look at how functions create environments.

Environments are hierarchical collections of objects. You can think of these environments as going from non-specific to specific. When you define a variable in a script, you are creating that variable in the global environment, a very non-specific environment. Functions, however, create their own, more specific environment each time they are called, and can still see the values in the more general environments above them.
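A quick sketch to see this in action: the hypothetical function where_am_i() below uses environment() to return the environment it is running in.

```r
# A hypothetical function that just returns the environment it runs in
where_am_i <- function() {
  environment()
}

env_1 <- where_am_i()
env_2 <- where_am_i()

# Each call creates a brand-new, more specific environment
identical(env_1, env_2)  # returns FALSE

# A call's parent environment is the one the function was defined in
# (the global environment when this runs at the top level of a script)
parent.env(env_1)
```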

This leads to some particular behaviours. For example, say you’ve written a function that expects two input parameters, x and y. What would happen if someone had already defined variables called x and y in their script? Which values should R use?

Let’s see what happens.
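In this sketch, the global values 10 and 20 are illustrative:

```r
# Variables already sitting in the global environment
x <- 10
y <- 20

add_numbers <- function(x, y) {
  x + y
}

# Inside the call, x and y are the inputs 1 and 2, not the global 10 and 20
add_numbers(1, 2)
```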

## [1] 3

In this case, the fact that there are already an x and a y in the global environment makes no difference. The function creates its own, more specific environment when it’s called, and it looks for the x and y variables there first. It finds them and uses those values (1 and 2).

But what happens if a variable doesn’t exist in the more specific function environment? Let’s take a look.
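Here is a sketch where the function uses a variable, w, that it never receives as an input (the name add_with_w and the value 5 are illustrative):

```r
# w exists only in the global environment
w <- 5

add_with_w <- function(x, y) {
  # w isn't an input parameter, so R looks for it outside the function
  x + y + w
}

add_with_w(1, 2)
```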

## [1] 8

In this case, the function looks in its specific environment for w, but it doesn’t exist there: the only objects in the function environment are the x and y that we’ve provided. When R doesn’t find w in the more specific environment, it looks in the less specific global environment. It finds w there, and so it uses that value.

This can be dangerous, because the function’s result now silently depends on whatever happens to be in the global environment. Always make sure that your function is accessing the values you think it is.
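As a hypothetical illustration of how this bites, a typo in the body of apply_discount() below picks up a leftover global variable instead of raising an error:

```r
# A global variable left over from earlier work
rate <- 0.5

apply_discount <- function(price, discount_rate) {
  # Typo: 'rate' instead of 'discount_rate', but R finds the global 'rate' and carries on
  price * (1 - rate)
}

apply_discount(100, discount_rate = 0.1)  # returns 50, not 90, and no error is raised
```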

So does R work the other way? Does it ever look in a more specific environment? Nope.
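A minimal sketch: the variable is created inside a function, and we then try to reach it from the global environment (try() is used so the script carries on after the error):

```r
sneaky_function <- function() {
  # This variable is created in the function's own environment
  im_a_sneaky_variable <- "boo"
}

sneaky_function()

# From the global environment, R can't look "down" into the function's environment
try(print(im_a_sneaky_variable))
```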

## Error in eval(expr, envir, enclos): object 'im_a_sneaky_variable' not found

Objects created inside a function’s environment are inaccessible from the more general environments above it. The long and short of it is, R will start from the specific environment and then look upwards, never downwards.

4.1.4.1 ‘Super’ assignment

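There will be occasions, however, when you need to make changes to the global environment. For instance, say you want to increment a counter every time a function is called, regardless of where it’s called from. In these cases, you can use the controversial <<- operator. Rather than assigning in the function’s own environment, <<- looks upwards through the enclosing environments for an existing variable with that name and assigns to it there; if it doesn’t find one, it creates the variable in the global environment. For a function defined in your script, that means assigning to the global environment. Observe…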
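A sketch of the counter idea (add_and_count is a hypothetical name):

```r
# Start the counter in the global environment; ordinary <- is fine here
count <- 0

add_and_count <- function(x, y) {
  # <<- updates the existing count in the enclosing (here, global) environment
  count <<- count + 1
  x + y
}

add_and_count(1, 2)
count
add_and_count(2, 3)
count
```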

## [1] 3
## [1] 1
## [1] 5
## [1] 2

Note how when we assign 0 to our count variable outside of the function, we don’t need to use <<-. This is because we’re already assigning to the global environment.

Use <<- with care, and only assign something to the global environment if you really need to. Otherwise, you may start overwriting variables in your global environment without ever realising it.
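To make the risk concrete, here is a hypothetical pair of helpers: one uses <<- where an ordinary local variable was intended, the other uses <-.

```r
# A global variable the user cares about
total <- 100

# The same helper written with ordinary <-, which keeps total local
careful_helper <- function(values) {
  total <- sum(values)  # only exists inside this call
  total
}

# Hypothetical helper that (wrongly) uses <<- for what should be a local variable
careless_helper <- function(values) {
  total <<- sum(values)  # overwrites the global total
  total
}

careful_helper(1:3)   # returns 6; the global total is still 100
careless_helper(1:3)  # returns 6, but the global total is now 6 too
total                 # 6, not 100
```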