Chapter 2 Getting Started

You can skip this chapter if you can:

  • Use the console to input simple lines of code

  • Install and load a package

  • Recognise a function

To use this chapter you will need a working version of R and R Studio.

If you can open R Studio and type 2 + 2 in the console you are good to go.

If that is confusing, you might need to jump back to installing R Studio and the section navigating R Studio.

2.1 The Big Secret

Okay. This is the Big Secret. The thing you will not believe. The most important thing you’ll learn about R.

You should copy and paste other peoples’ code.

Yes, I’m serious. You should copy and paste and edit code you find online2. In this book you should always be copying bits of code and pasting it into your console. That is why I publish the code alongside each question and demonstration.

It took me about ten years of working in R to get over my fear of copying and pasting other peoples’ code. Now I even have a repository of all the code I use over and over so I can copy and paste my own code.

Most people will not remember code off the top of their head. This is why we use the help function (try typing ?summary to see what it says) to look at the documentation, and resources like Stack Exchange and Google to help us (more in troubleshooting).

2.1.1 What about AI like ChatGPT?

Large Language Models like ChatGPT and Google Bard can also be applied to coding problems. For example, you could ask ChatGPT how to rearrange a dataset. This can be really helpful, but as with all things AI related, you need to be very cautious and test the answer to make sure it’s right. Often it doesn’t choose the smartest way.

The use of Large Language Models in coding is a subject of debate. Stackoverflow (a big community resource) currently does not allow any ChatGPT answers/questions on its site.

At present, Edinburgh has a policy for the use of ChatGPT and similar tools here. It boils down to: don’t claim it as your own work and be cautious of its reliability, accuracy, and the legality of your question.

Whenever we talk about the use of Large Language Models I think its really important to remember that they boil down to fuzzy jpgs of the web some time ago. That’s a really good article by the way.

2.1.2 Exercise

You might be very skeptical here about how much I want you to be copying code. If you’re not convinced, I suggest you go watch my RStats Crush the amazing David Robinson do one of his Tidy Tuesday screencasts. David Robinson has forgotten more R than I’ll ever learn, and see - he still copies and pastes code.

2.2 Projects

I would like to begin by creating a new project in R Studio.

  • Open R Studio

  • Go to File > New Project

  • Set up a new project either in an existing directory (folder), or in a new one. Ignore repositories for now.

You can call this project anything you like. I recommend something short like ‘start’.

A R Project sits inside a folder. Everything inside that folder is part of that R Project. Let’s say you have a folder called:

mydata

and you create an R Project called analysis.

The analysis.Rproj will live inside the mydata folder, and it will ‘see’ all the data, files, and any other folders, you have in mydata.

Using R Studio you will get a lot better at ‘directory management’, or knowing where you have saved things.

2.3 Environment

In your brand new project you will have a clean environment (refer back to Figure 1.5). Our first exercise is going to explore what the environment really means.

2.3.1 Exercise - Environment

When we first installed R, you were typing mathematical equations into the console. Let’s remind ourselves of this by running an equation now. In the console, type a long and complicated equation, then hit ‘enter’ when you’re done.

500 + 23 / 91
## [1] 500.2527

If this was a calculation we often ran, we might want to save the answer. We can do this by using the assign symbol <-.

For example, if we type this:

x <- 500 + 23 / 91

… you will see that we simply get a new line returned. We don’t get the answer. However, in your environment you will see under the heading ‘Values’, a new thing called x.

Enter the following code:

x
## [1] 500.2527

R remembers what x is and recalled it. This is useful if we want to do more with x after we’ve calculated it.

x /100
## [1] 5.002527

As well as values, we can ask R to remember a string of text as well. Try this:

hello <- "Computers always print 'Hello World!' but we never say hello back"


print(hello)
## [1] "Computers always print 'Hello World!' but we never say hello back"

But R is very fussy. What happens if you run this code:

print(HELLO)

You see, R is case sensitive, meaning we have to be careful to always type things the same. This is one of the reasons why R Studio is really handy, because you can see what you’ve saved in your environment.

Now type this:

hello <- "Hello, R!"

Take a guess what is going to happen before you type the next line. Were you right?

print(hello)
## [1] "Hello, R!"

2.3.2 Exercise - Functions

2.3.2.1 The Basic Function

A function is a handy way to bundle together some lines of code. David Robinson says that if you run a few lines more than twice you should write a function to do it instead.

I’m going to contradict myself now. Functions are really important in R, but I’m not going to ask you to write one anywhere else but in this exercise. Most things in this book I expect you to practice over and over, but functions are useful to understand packages, and that’s why we’re going to talk about them now.

Let’s create a function to make R welcome us. Copy and paste this code into a new script file (you may want to save it as your functions example), and then run it.

mynameis <- function (name) {
  print (paste0("Hello, ", name, ", how are you today?"))
}
mynameis ("Jill")
## [1] "Hello, Jill, how are you today?"

What is this code saying?

Create a new thing called mynameis

Give (<-) mynameis the function purpose. Everything inside the curly brackets {} is the function.

The function should make a new thing called name.

The function should print the result of pasting a string (paste0) of things, the words Hello,, whatever you said name should be, and , how are you today?.

You may have noticed that, just like when we were assigning x a value, we now have something in our environment called mynameis.

Whenever we type mynameis("Jill") into the console, or execute that line in a script file, R knows it should look to the environment and run the code that’s been bundled up into that package.

For example, we could make R welcome us in exactly the same way by typing:

print (paste0("Hello, ", "Jill", ", how are you today?"))
## [1] "Hello, Jill, how are you today?"

But we would need to change the name every time we wanted welcome a new user. We are lazy, so we use the function to reduce the amount of things we need to change to get a different outcome.

Functions make our code more standardised. If we all define the mynameis function at the start of our documentation, we don’t need to worry about accidentally deleting an important character in the code. This is especially important as we like to copy-paste code when we need to.

Most scientists will probably not write their own functions, but you should know how they work.

2.4 Packages

If people write very good and useful functions, they usually want to share them. They can do this with packages.

There are packages for everything, packages for drawing maps, packages for analysing data, packages for suggesting how to analyse your data, packages to give you more data, packages for tidying your data . . . this is one of the reasons we might all have different bits of code. There can be lots of ways to do something.

2.4.1 Exercise - Install and Load a Package

Packages work in two stages. The very first time you need a package you need to install it. Then, every time you start R you need to load the package into your library.

First you install. Next you load into library.

To install a package you need to type install.packages("package_name").

To load it into your library you need to type library(package_name).

When you load a package from your library it goes into a special hidden environment where you can access all the functions that the package has written for you.

For example, we will make great use of the tidyverse package in this book. You need to start by installing it with:

install.packages("tidyverse")

While the package is downloading and installing you may see a little red ‘stop’ sign in the top right hand corner of your console window. This means R is working, and you shouldn’t add any new commands until it has finished and the new line symbol appears again in the console (>).

When it has finished downloading & installing, you need to type:

library(tidyverse)

to load it. You’ll see that nothing has appeared in your Environment, but actually you have a whole host of new functions & data you can play with. For example:

slice(starwars)

Which should give you this:

## # A tibble: 0 × 14
## # ℹ 14 variables: name <chr>, height <int>, mass <dbl>, hair_color <chr>, skin_color <chr>, eye_color <chr>, birth_year <dbl>, sex <chr>,
## #   gender <chr>, homeworld <chr>, species <chr>, films <list>, vehicles <list>, starships <list>

Now we have a new dataset (starwars) and a new thing we can do (slice the top of the data). We will talk much more about tidyverse in the data chapter.

You will often get warnings in R when you type these code blocks. To keep the book tidy, I have suppressed (hidden) the warnings in this book. Don’t worry about them - they’re usually just trying to keep packages compatible with older versions of R

2.5 R Markdown vs R Scripts

In R Studio you can use R Script files and R Markdown files to save your work. Both of these files can be created, edited, saved, and opened in your R Project file. You can send R Script and R Markdown files to other people, so they can edit and share your code too.

If you are reading this book as part of coursework I’ve set you, you will likely have an assignment where you neeed to download and edit an R Markdown file, then send it back to me.

2.5.1 R Scripts

R script files are great for testing out small chunks of code on their own.

Go to File > New File > R Script or press Ctrl + Shift + N to open one.

Everything you type in an R script file is executable code.

Unlike working in the console, you can type multiple lines of code in an R script, and run each one by hitting ctrl + enter.

You can’t type plain text in an R Script file. Text has to be annotated by the comment symbol (#)

# In an R script file - you can write comments
# Which always begin with a hashtag

# Comments can be useful to say things like what date you wrote the script on,
# or what you were trying to do. 

print("You can also use script files to write multiple lines of code")

print("and edit those lines of code")


# before you run them by pressing ctrl + enter at the same time 
# when your cursor is on that line. 

2.5.2 R Markdown

R Markdown allows you to combine code with actual text. This book was written in R Markdown

In R Markdown you can combine normal text with code blocks, and R knows to only treat the text inside the code blocks as code.

An example of R Markdown with text and code

Figure: 2.1: An example of R Markdown with text and code

There is a great R Markdown cheatsheet from R Studio here. I had this printed out and hung above my desk for years.

2.5.3 A video on R Markdown

If you would like to watch me using R Markdown, you’re in luck! Theres a video here.

2.6 Video Introductions

There is a short introduction to R and R Studio in Video format here:


  1. If you’re reading this as part of your coursework you might be panicking about plagiarism here, after all, we spend a lot of time telling you plagiarism is the worst thing you could ever do and that we’ll use software to detect it. Code is a bit different. We are always trying to find the most efficient way of doing something, and so ideally you would all write code that was identical. Sadly, humans naturally differ in the way the think about problems. My job would be a lot easier if everybody thought the same. If I have set you this book as reading, I can swear to you I will never put your code through a plagiarism checker. That would be very stupid.↩︎