4.3 Objects and Classes

Objects and classes are a fairly ubiquitous concept across programming languages and data analysis tools. We'll briefly look at what an object is in R, but by no means is this an exhaustive description.

4.3.1 What is a class?

The world is complicated. Everything in the world is unique and defined by an infinite number of properties and features. For example, if you take a person, they can be defined by their name, where they're from, where their parents are from, their hair colour, their likes and dislikes, and so on and so on. When we want to store data in a structured and formal way in a computer system, however, this isn't particularly helpful. Instead, we need to store data in a predefined structure. This is essentially what an class is; a pre-defined structure to store important attributes or features of a thing.

Let's take the person instance again. Let's say we're going to store data on a number of individuals, we won't be able to store everything about them. So we'll choose a subset of their attributes or features to store that are relevant to what we need. But to make things more efficient, we're going to store the same information for each one. So lets say we're going to do some geographical analysis, we might want to include a person's name, their nationality, and perhaps their ethnicity. So for each person we want to store, we can store these three attributes. And we might call this data structure a "person". Well this is exactly what an class is; a collection of attributes and features that is shared objects instances of the same type.

So our class is "person", and it has the attributes "name", "nationality", and "ethnicity". Now this obviously doesn't capture everything about a person, but it's enough for what we want to do.

Graphing that object might look something like this:

4.3.2 What is an object?

An object is just an instance of a class. So if you create a person from our "person" called John, then "John" is the object an "person" is the class.

When you create a dataframe for example, you're creating a data.frame object from the basic structure of the data.frame class.

4.3.3 Objects and Classes in R

Looking more specifically at R, what kind of objects do we see. Well, according to John Chambers, the founder of the S programming language upon which R is based, everything is:

"To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.

--- John Chambers"

So every function, dataframe, plot, list, vector, integer, everything is an object.

To see the class of an object in R, use the class() function.

class(data.frame()) # here we're finding the class of an empty data.frame
## [1] "data.frame"
class(data.frame) # here we're finding the class of the data.frame() function
## [1] "function"
class(1)
## [1] "numeric"

4.3.4 Inheritance

Sometimes, different classes are interrelated. For example, if you were storing data on vehicle, then you might have a "vehicle" class. But if you're storing data on lots of different vehicles, you might also have classes for each type of vehicle (e.g. "car", "motorbike", etc.). All of these classes are still vehicles, and so you don't want to have to repeat yourself when you define each of those classes. In other words, all of those classes are going to have some common attributes like colour, horsepower and so on. Similarly, each different type of vehicle will have some attributes that are unique to that type of vehicle. For instance, motorbikes can have sidecars but vans and cars don't. Cars and vans have doors but motorbikes don't. This highlights the benefits of inheritance. By creating a "vehicle" class and allowing your subsequent classes to inherit all of the attributes of the "vehicle" class, you can avoid duplication while allowing distinct classes. This way, when you want to add any attributes to all of your vehicles, you can just do it via the "vehicle" class rather than changing each type of vehicle class.

Diagramming this relationship:

Inheritance is an extremely deep topic which we won't go into here, but R objects also use inheritance. To see the inheritance tree of an object, use the is() function:

is(1L)
## [1] "integer"             "double"              "numeric"            
## [4] "vector"              "data.frameRowLabels"

Here we can see that an integer object is made up of the integer, double, numeric classes. The order of inheritance goes from left to right so we know that the integer class inherits from the double class, which inherits from the numeric class.

4.3.4.1 Object-Oriented Systems

Unfortunately (or fortunately, depending on your point of view), R doesn't have a single way of storing objects. In fact, there are 2 object-oriented (OO) programming systems in base R, and more (like R6) can be added via packages. These two base OO systems are S3 and S4, and they differ in the way that objects are constructed, stored, and interacted with. We're not going to go into the difference here, but I recommend Hadley's Advanced R to better understand the difference between the two. For now, I'm just going to explain the basics of the S3 system as I think it's the easiest to understand and helps convey the philosophy behind why we use objects slightly more easily.

4.3.4.1.1 S3

In the S3 system, we rely on generic functions and methods. Generics are functions that have a single common goal, but that can be used for objects that might be very different. For example, print()-ing a dataframe is going to be different to print()-ing a plot or an API response or whatever. But print() always seems to know what to do. The reason it does is that the print() function is a generic function that actually uses a more specific function to achieve it's goal. In other words, we achieve a fairly high level goal like printing by calling a function that is specific to the object we're working on under the hood. These more specific functions are called methods.

As a real world analogy, think of the process of talking to someone. The common goal in talking is to communicate. But, depending on the language that someone speaks, the actual act of talking is going to be slightly different for different people you talk to. In this case, you can think of communicating as being the generic - it's the eventual goal. And talking in the appropriate language as being the method.

Going back to R, if you type print. into the console, the autocompleter will show you lots and lots of print.something() functions. These are all of the methods for all of the different printable objects in R. print.date() will print a date object, print.data.frame() will print a dataframe object and so on. But when you just use the print() function on an object, R will automatically choose which method it needs for the object you've passed as an input parameter.

# here we're using the generic
print(as.Date("2020/06/10")) 
## [1] "2020-06-10"
# because we're printing a Date object,
# this is the method that is actually used
print.Date(as.Date("2020/06/10"))
## [1] "2020-06-10"

While you can often tell if something is an S3 method by it being a generic followed by a . and then an object name, don't rely on this, because people often use . to separate words in functions that can make them look like S3 methods when they're not.

The important thing I hoped you've gleaned from this explanation is that you don't really need to know what your object is to use a generic. Yes, there are generics that only have methods for specific objects and so it may be important to know what type your object is to make sure you can use a particular generic, R takes care of which method is dispatched for you. If that doesn't convince you that you don't need to understand the underlying structure of an object to get hands on with R, I don't know what will.

Note: In some languages, methods are essentially functions or sub-routines that are tied to an class. For instance, a class that represents a person's bank account might have the method Balance(), which will return how much money a person has in their account. There are certainly some similarities between how methods are used in R and some other languages, they are a bit different. Mainly, in R, methods are not attributes of a class but are separate functions.

4.3.4.2 Creating objects

The obvious question is "Can I make my own objects in R?". The answer is yes. And quite easily too. Whether you should or not is a different story. I'm not going to go over how to create an object in S3 or S4 or any other OO system here because there is more than enough for an entire book, but there are plenty of resources out there that can help you.