1 teacheR (1st Edition)

Cover image

1.1 Overview

This book is a collection of training materials for an introduction to the R statistical computing programming language. Broken down into chapters, I’ve aimed to cover most of the basics.

The book is largely split into two sections. One section (“For Students”) that’s aimed at those that are entirely new to R. I explain the basics in a way that doesn’t require a background in data analysis or computing and you should have a decent understanding of the fundamentals of R if you manage to make it through.

In the second section (“For Teachers”), we look at some of the topics in greater detail, looking at the theory and specifics that underpin what we’ve learnt in the previous section. For those interested in teaching R to others, this section provides an introduction to the underlying workings of R that can be extremely helpful when questions from your students begin to arise. For example, we look at functions in both sections, however we cover the basics of what a function is and how to use one in the “For Students” section, and how to create functions in the “For Teachers” section.

I’m constantly looking to improve this book so feedback is very welcome. Anything from typos to content suggestions, feel free to raise a GitHub issue if you feel something should be changed.

1.1.1 Acknowledgements

This book was made possible with the help of those who raised issues and proposed pull requests. With thanks to: Swapnil Sengupta (@Swapnil-2001), @ARawles64

1.2 About Me

I began using R in my second year of university, during an internship looking at publication bias correction methods. I was under the tutorship of a member of staff who helped me immensely, but I must confess that I have never taken an official course in R, online or in person. I like to think, however, that this is not always a bad thing. Learning from the bottom up and struggling along the way is a fantastic way to acquire knowledge and instills a very important lesson:

You’re not going to know everything there is to know about R. Ever. But that’s okay.

I’m now 4 years into my R career and I use R every day. With that in mind, I don’t think there has ever been a day when I haven’t referred to a tutorial, or Stack Overflow, or even just Googled the name of a function that I’ve used 1000 times before. There is a great repository of knowledge for R and it’s one of the things I love most about the R community. So please never feel as though you’re an impostor in a world of R gurus. In reality, everyone else is just as lost as you. But if you keep ticking along and never feel that learning something new in R isn’t worth your time, you’ll end up doing some great things.

And in a roundabout way, that is part of the reason I decided to develop these materials. I don’t pretend to be the ultimate R programmer, because I still know what it’s like to learn something from the start. And everyone has to start somewhere. So I hope that I can help impart some of the lessons that I’ve learnt over the 4 years to anyone who’s looking to learn R in a way that won’t leave you feeling lost.

The only final note I have before we start learning how to use R is another bit of advice:

Don’t believe everything you read

Whilst this is probably a good thing to keep in mind for any type of training, I feel it’s particularly relevant with R for two reasons. Firstly, when it comes to programming languages, lots of people have opinions. Some are true, most are not. Most things you read are a mix between fact and opinion, so take everything with a pinch of salt. For example, the developers of the ggplot2 package are fervently against arbitrary second axes and so support for them in ggplot2 is limited. I also share this view, but that doesn’t mean that I’m right - read, learn, but question and make your own mind up.

Secondly, R and particularly all of its packages are prone to change. For this reason, people may make statements relative to one version of R that aren’t necessarily true in the future. Things have changed over the years, and so answers from a 10 year-old Stack Overflow question may not still be true when you come across them. A microcosmic version of this are some recent changes in the tidyr package. Historically, converting data from/to long and short formats was done using the spread() and gather() functions. However, in newer releases, these functions are deprecated in favour of pivot_wider() and pivot_longer(), which provide the same functionality but also some extra bits. The practical implication of this suggestion is don’t always read one tutorial on a subject before you dive in.

1.3 Why teacheR?

Most books are written as though you are a student. They teach you things so that you can pass a test or do whatever it was you need to do to move onto the next level or finish your project. This is very useful when you have a very specific goal in mind. For instance, people often get into R because they have a particular project they want to work on or because they think it could be useful for their work. Reading a book tailored to that type of project or work can therefore be very useful.

However, learning R this way, or learning anything for that matter, can present with some problems later down the line. Primarily, you’ve just learnt to do something very specific. There will certainly be tips and skills that will be transferable to future projects, but even then, once you’ve got a hammer, everything can start looking very nail-like.

And so I often try and take a different approach when learning and teaching R. Imagine that you’re learning R not as a means to an end, but because you’re going to teach it to someone else. This forces a more balanced and universal understanding, and can really help with your knowledge retention. When I began teaching R, my knowledge of R and how it works grew much faster than it ever had. And it was because of the teaching; people will ask insightful questions, ask you what happens when you do this or that and why does that happen, and those questions will force you to better understand R and how it works. It’s perfectly okay to say that you don’t know the answer when teaching, but when someone asks you why "TRUE" is apparently the same as TRUE but "True" isn’t the same as TRUE, I defy anyone who’s not curious enough to look it up afterwards.

So when you read this book, don’t think of yourself as a lowly student who’s learning R to pass their A Levels. Instead, think of yourself as a teacher in training.