Chapter 13 Why use R
Why is this chapter at the end of the book?
Well, partly I wanted to get you guys started with R as quickly as possible. Partly, I think the ‘why’ of R only starts to make sense when you learn about workflows. With this said, it is important to think about Why we should use open source software and open data science.
There’s no learning outcomes in this chapter. Instead, its more of a thesis, or more accurately, a blog post.
13.1 Why does open source matter?
If you’ve come to this book because I’m teaching you R in one of my courses, or as a PhD student of mine, then you have probably heard my thoughts on why science needs to become more transparent. ( If you haven’t, you can watch me talk about it here ).
I take the view that science is a philosophy, it is a way of thinking about the world. We use science to help us answer questions about the natural world. The process of collecting data and analysing data is not free from sources of bias. This is why reporting methodologies is such an important tenet of scientific process. Small variations in methodology can account for persistent effects that become reported as ‘truth’.
Mice, for example, are stressed by male handlers, which may have increased observed stress in many trials, as we rarely report the gender of the mouse handlers. Many medical studies focus only on male participants, resulting in treatments that have lower efficacy for women. Many medical treatments are calibrated with the wrong kind of statistical test, meaning that a true understanding of what being measured is often missing. And of course we have the replication crisis, a discovery that a majority of psychology results cannot be replicated in independent trials.
I believe that psychology is the tip of the iceberg, and that we will see more and more fields endure a replication crisis. There is a brilliant New York Time Magazine article about Amy Cuddy which I encourage all my students to read. It weaves a story of academic ambition, the fundamental issues with reproducibility in science, and the important personal relationships in science. I really recommend you read it.
The point of all this is that science is not an infallible process that arrives at an objective truth. It is a process that we need to discuss, a process we need to review, and a process we need to critique in order to better understand our world. Alongside this, there is a push in science to be more open and transparent about data. There is a difference between:
71 chicks were allocated into six groups, with each group given a different feed supplement. Their weight in grams after six weeks was compared in an ANOVA, and feed was found to significantly affect weight (F5,65 = 15.37, p < 0.001)
And specifiying exactly:
## Df Sum Sq Mean Sq F value Pr(>F)
## feed 5 231129 46226 15.37 5.94e-10 ***
## Residuals 65 195556 3009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There are whole organisations dedicated to providing resources on open data and open practices in science.
I personally try to provide my data and my analyses for my papers. Its not always possible (sometimes the data is too sensitive to be shared, sometimes another member of a team is ideologically opposed, and that’s fair too), and I hope that if I make a mistake, it won’t be replicated throughout the scientific world. Mistakes can be very costly.
It is nervewracking though. Sharing your practice, how you arrived at a conclusion, can feel like you are exposing yourself to be found out for the imposter you really are. I have an awful lot of privilege in my life that allows me to accept this risk. See Amy Cuddy. Science needs to become kinder, and safer for all people, for these ideas to take hold.
13.2 So why R?
R is an open source software, meaning that absolutely anyone can download it, use it, and improve it. It also has a low barrier to entry in that there are lots of free resources to help you learn. A scientist in Scotland can directly collaborate with a scientist in Malawi, sharing code, sharing resources, and improving the scientific world.
R, and the way that we teach R and use R, is not free of biases. There are other programs, other methods, but I use R because I think it is a more transparent and accountable way of doing research.
And ultimately, that’s what I’m striving for.