4.2 Environments

TO DO

4.2.1 Environment basics

At its core, an environment is a collection of objects. A bit like a list, environments store multiple objects in a single structure.

To create a new environment, we use the new.env() function.

new_env <- new.env()

To add items to our environment, we can use the $ operator, just as we would with a list:

new_env$first_object <- "hello"

To list all of the objects in an environment, we use the ls() function:

ls(new_env)
## [1] "first_object"

Importantly, you can't have two objects in the same environment with the same name. If you try, you'll just overwrite the previous value:

new_env$first_object <- "world"
new_env$first_object
## [1] "world"
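Besides $, base R has function-style equivalents for working with an environment. The sketch below rebuilds new_env so that it runs on its own, and introduces a hypothetical second_object purely for illustration. It shows assign(), get(), exists() and rm():

```r
new_env <- new.env()
new_env$first_object <- "world"

# assign() is the function form of `$<-`; second_object is a made-up name
assign("second_object", 42, envir = new_env)

# get() retrieves a value by name
get("second_object", envir = new_env)
## [1] 42

# exists() checks for a name; inherits = FALSE restricts the check to new_env itself
exists("first_object", envir = new_env, inherits = FALSE)
## [1] TRUE

# rm() removes an object from the environment
rm("second_object", envir = new_env)
ls(new_env)
## [1] "first_object"
```

These forms are handy when the object name is only known at run time, for example when it is stored in a character variable.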

4.2.2 Environment inheritance

Environments have parents and children. In other words, there is a hierarchy of environments: each environment is encapsulated in another environment and may itself encapsulate other environments.

Every environment (with the exception of what we call the empty environment) has a parent. For example, when I created my new_env environment earlier, its parent was the global environment, because that is where I called new.env(). The global environment is the environment that objects are assigned to when working in R interactively. The global environment's parent will be the environment of the package you attached most recently (for example, with library()). Packages have environments to avoid name conflicts between functions and to help R know where to look for a function.

Above the environments of all the packages you've attached sits the base environment, which is the environment of base R. Finally, the base environment's parent is the empty environment, which does not have a parent.

The hierarchy of these environments looks like this:
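The same chain can be inspected from the console. This is a minimal sketch using only base R: search() lists the attached environments in lookup order, and the loop walks from the global environment up through parent.env() until it reaches the empty environment. The exact entries depend on which packages are attached in your session, so no output is shown.

```r
# Names of the attached environments, in the order R searches them
search()

# Walk up from the global environment, collecting each environment's name
env <- globalenv()
chain <- character()
while (!identical(env, emptyenv())) {
  chain <- c(chain, environmentName(env))
  env <- parent.env(env)
}

# The first entry is the global environment and the last is the base environment
chain
```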

4.2.3 Scope

So we know that objects in the same environment can't have the same name, but what happens when two different environments happen to have objects with the same name? This is where the concept of scope comes in. Scoping is the set of rules that governs where R will look for a value.

R searches the environments in order, starting at the most specific environment (the global environment in the hierarchy above) and moving up through its parents. For example, we know that there is a function in base R called sum(). But if I define a new function in the global environment called sum, which function will be called when I type sum(...)? Because the search starts from the most specific environment, R will look for sum in the global environment first, and it'll find the sum that I've just defined. At this point, it'll stop looking because sum has been found.
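The masking described here is easy to reproduce. In this sketch, the replacement sum() is a deliberately silly, hypothetical definition. The :: operator is base R syntax for asking for a function from a specific package:

```r
# A deliberately wrong, made-up sum() defined in the current (global) environment
sum <- function(...) "not the real sum"

sum(1, 2)
## [1] "not the real sum"

# The base version is still there; :: asks for it explicitly
base::sum(1, 2)
## [1] 3

# Removing our version lets R find base R's sum() again
rm(sum)
sum(1, 2)
## [1] 3
```

Nothing was overwritten in base R. Our sum simply sat earlier in the search order, so R found it first.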

For this reason, it's a good idea to use a package like conflicted to manage conflicts between the packages you attach. Otherwise, when two packages export functions with the same name, the one you get is simply whichever package you attached last.

4.2.4 Function environments

Functions, when they are called, create their own more specific environment. The parent of this environment is the environment in which the function was defined (most often this will be the global environment), not necessarily the environment it was called from.
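A quick way to check this is to ask a running function for its own environment and for that environment's parent. In this sketch, where_am_i and caller are hypothetical helpers, and top_env simply records the environment the script is running in (the global environment when run interactively):

```r
top_env <- environment()   # the environment this code is running in

# where_am_i is a made-up helper that reports on its own environments
where_am_i <- function() {
  this_env <- environment()   # the environment created for this call
  c(
    is_new_env             = !identical(this_env, top_env),
    parent_is_defining_env = identical(parent.env(this_env), top_env)
  )
}

# Called directly: a fresh environment whose parent is where the function was defined
where_am_i()

# Called from inside another function: the parent is still the defining environment
caller <- function() where_am_i()
caller()
```

Both calls should report TRUE for both entries. Calling where_am_i from inside caller does not change its parent, because the parent is set by where the function was defined.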

This breeds some specific behaviours. For example, say you've written a function that expects two input parameters, x and y. Well, what would happen if someone had already defined an x and y variable in their script? Which value should R use?

Let's see what happens.

sum_custom <- function(x,y) {
  x + y
}

x <- 10
y <- 5

sum_custom(x = 1, y = 2)
## [1] 3

In this case, the fact that there is already an x and y in the global environment doesn't really make much difference. The function creates its own more specific environment when it's called, and it looks for the x and y variables in here first. It finds them and uses those values (1 and 2).

But what happens if a variable doesn't exist in the more specific function environment? Let's take a look.

sum_custom <- function(x,y) {
  x + y + w
}

w <- 5

sum_custom(x = 1, y = 2)
## [1] 8

In this case, the function looks in its own environment for w, but it doesn't exist there. The only objects that exist in the function environment are the x and y that we've provided. So when R doesn't find w in the more specific environment, it looks in the less specific global environment. It finds w there, and so it uses that value.

This can be dangerous, so always make sure that your function is accessing the values you think it is.
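One way to guard against this is to pass every value the function needs as an argument. A sketch, with sum_custom_safe as a hypothetical name for the safer variant:

```r
w <- 5

# Relies on a w defined outside the function: the result changes whenever w changes
sum_custom <- function(x, y) {
  x + y + w
}

# sum_custom_safe is a made-up variant that takes w explicitly
sum_custom_safe <- function(x, y, w) {
  x + y + w
}

sum_custom(x = 1, y = 2)
## [1] 8

# Changing w elsewhere in the script silently changes the result
w <- 100
sum_custom(x = 1, y = 2)
## [1] 103

# The explicit version depends only on what it is given
sum_custom_safe(x = 1, y = 2, w = 5)
## [1] 8
```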

So does R work the other way? Does it ever look in a more specific environment? Nope.

sum_custom <- function(x, y) {
  im_a_sneaky_variable <- 10
  x + y
}

sum_custom(1, 2)
## [1] 3
im_a_sneaky_variable
## Error in eval(expr, envir, enclos): object 'im_a_sneaky_variable' not found

Once the function has finished running, the objects in its environment are no longer accessible from outside. The long and short of it is, R will start from specific environments and then look upwards, never downwards.

4.2.4.1 'Super' assignment

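There will be occasions, however, when you need to make changes to the global environment. For instance, say you want to increment a counter every time a function is called, regardless of where it's called from. In these cases, you can use the controversial <<- operator. Rather than creating a variable in the function's own environment, <<- looks for an existing variable of that name in the parent environments and assigns to it there; if it doesn't find one, it creates the variable in the global environment. For a function defined at the top level, this means the assignment lands in the global environment. Observe...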

sum_custom <- function(x,y) {
  count <<- count + 1
  x + y
}

count <- 0

sum_custom(1,2)
## [1] 3
count
## [1] 1
sum_custom(2,3)
## [1] 5
count
## [1] 2

Note how when we assign 0 to our count variable outside of the function, we don't need to use <<-. This is because we're already assigning to the global environment.

Use <<- with care and only assign something to the global environment if you really need to. Otherwise, you may start overwriting variables in your global environment without ever realising it.
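Because <<- assigns to the nearest parent environment that already has a variable of that name, the counter doesn't have to live in the global environment at all. A sketch of that alternative, where make_counter and sum_counted are hypothetical names rather than anything defined earlier:

```r
# make_counter is a made-up factory: each call creates its own private count
make_counter <- function() {
  count <- 0
  function(x, y) {
    count <<- count + 1   # updates count in make_counter's environment
    x + y
  }
}

sum_counted <- make_counter()

sum_counted(1, 2)
## [1] 3
sum_counted(2, 3)
## [1] 5

# The count lives in the function's enclosing environment, not the global one
environment(sum_counted)$count
## [1] 2
```

This keeps the counter out of reach of unrelated code, which removes the risk of overwriting a global variable by accident.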