In a test of the open science movement in psychology, Tom Hardwicke, who has been a postdoctoral scholar at the Meta-Research Innovation Centers at Stanford University and at Charité – Universitätsmedizin in Berlin, Germany, embarked on a project that should have been easy. Working with Mike Frank, David and Lucile Packard Professor of Human Biology and Director of the Symbolic Systems Program in the Psychology Department at Stanford University, he set out to download the data and reproduce the analyses from a number of published papers. It should have been easy because these papers upheld the principles of open science by making their data freely available online, and all Hardwicke and Frank were trying to do was re-create the numbers in those papers: step 1 of reproducibility.

What they found surprised them. They were able to reproduce the same numbers, even for simple statistics, only about 1/3 of the time; the other 2/3 of the time, they couldn't get the numbers to match. What's more, they found substantial discrepancies between their results and the published results for about 1/3 of the papers, and, in many cases, not even the original authors themselves could figure out where the numbers in their papers came from.

While none of these errors affected the major inferential conclusions of the papers, this is a telling—and troubling—example. The process of running statistics and then inserting those statistical findings into a research paper contains many opportunities for errors: typos, cutting and pasting the right number into the wrong place, or rounding incorrectly. Surely there must be a way to cut down on these kinds of sloppy errors that are mucking up our science?

Enter RMarkdown. RMarkdown is a type of document, typically created in RStudio, that integrates written text with chunks of code, allowing researchers to combine the written parts of a manuscript—such as explanations of results—with statistical outputs, tables, and figures. The resulting document can then be rendered into Word, PDF, or HTML formats.

When used effectively, this means that RMarkdown eliminates the need for copying and pasting statistics into results sections or tables. Not only can this critically reduce the number of sloppy errors that occur when transferring dozens of different numbers from statistical computing software into a manuscript written in a Word document, but it can also spare researchers this tedious task. For example, if you suddenly realize that one of your participants failed an attention check and needs to be excluded, you can add the exclusion to your RMarkdown file and re-render the document, and every number and table in the manuscript updates accordingly—rather than re-running the code and then editing the manuscript by hand.
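To make this concrete, here is a minimal sketch of what such a workflow might look like. The data file, variable names, and exclusion criterion are all hypothetical; the point is that the counts and statistics in the prose are computed by inline R code rather than typed in by hand:

````markdown
---
title: "Results"
output: word_document
---

```{r setup, include=FALSE}
# Load the data and apply exclusions (file and column names are hypothetical).
# Adding a new exclusion here updates every number below on the next render.
d <- read.csv("study1_data.csv")
d <- subset(d, passed_attention_check)
```

We retained `r nrow(d)` participants after exclusions. Mean reaction
time was `r round(mean(d$rt), 2)` ms (SD = `r round(sd(d$rt), 2)`).
````

When this file is knit, each `` `r ...` `` expression is replaced by its computed value in the rendered Word, PDF, or HTML output, so the reported statistics can never drift out of sync with the analysis code.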

In some of his writing on this topic, Frank sums up the benefits of this approach nicely: "Often we tend to think of there being a tension between the principles of open science and the researcher's own incentive to work quickly. In contrast, this is a case where I think that there is no tension at all: a better, easier, and faster workflow leads to both a lower risk of errors and more transparency."

Of course, if you're not familiar with using R to conduct statistical analyses or have never written a manuscript or results section in RMarkdown before, there are start-up costs to learning a new system. But one benefit of R in general is that not only is the software itself free and open-source, but there are a ton of online resources for using R, including the holy grail manual R for Data Science (freely available online under a Creative Commons license) and an upcoming SPSP-sponsored video series on R data analysis for social and personality psychologists. And time spent switching workflows to use RMarkdown now is time saved later from copying and pasting statistical results into manuscripts.

RMarkdown won't solve all our reproducibility problems: there's no protection against coding errors, questionable research practices, or HARKing (hypothesizing after the results are known). But it is a solid first step: ensuring that our research papers are free from sloppy copy-and-paste errors is the least we can do. And for social and personality psychologists looking to improve or even overhaul their research practices to align with the open science movement, sometimes even a small step can be a big win.