skip to primary navigation skip to content

Reproducibility without fuss

4 Documentation

4.1 Objectives

  • Be able to use basic markdown syntax
  • Understand how to use R Markdown by integrating text formatting and R code
  • Be able to create dynamic tables in R
  • Know how to compose multipanel figures programmatically in R

Part of these materials were originally developed by Alexia Cardona1.

4.2 Libraries

Libraries Description
tidyverse A collection of R packages designed for data science
rmarkdown Converts markdown and R code into rendered documents
kableExtra Generates dynamic tables
patchwork Composes plots generated by ggplot()

4.3 RMarkdown

R Markdown is integrated in RStudio and enables you to create documents, reports, presentations and dashboards. This in turn helps you to communicate your research better than using scripts alone. It does this because it supports extensive text formatting and easy code integration. That in turns enables you to walk a reader through an analysis much more easily than you would be able to do using comments in a script.

An R Markdown file is made up of 3 basic components:

  • header
  • markdown
  • R code chunks

Here we will assume that the output of our report is an .html file. HTML files are files for web pages. This means that the report that we will generate can be easily deployed on the web.

R Markdown can also be used to generate PDFs, Microsoft Word or Latex documents, among other things.

4.3.1 R Markdown header

The markdown document starts with an optional header in YAML (YAML Ain’t Markup Language) format known as the YAML metadata. Oh yes, programmers have a very particular sense of humour…

In the example below, the title, author and date are specified in the header. Other options can be specified in the header such as table of content which we will look at later on in the course.

---
title: "Reproducibility without fuss"
author:  "Martin van Rongen"
date: 20 Jul 1969
---

To create a new R Markdown file go to File -> New File -> R Markdown. To generate the output file (in this case an html document), press the knit button:

Alternatively, use the corresponding keyboard shortcut (Mac Cmd + Shift + K, Windows Ctrl + Shift + K).

4.4 Markdown

The text following the header in an Rmarkdown file is in Markdown syntax. This is the syntax that gets converted to HTML format once we click on the Knit button or the Build website button (yes, you can deploy whole websites using R Markdown). The philosophy behind Markdown is that it should be easy to write and easy to read.

The full documentation of the Markdown syntax can be found at https://pandoc.org/MANUAL.html. However this might be a bit of information overload when you’re starting. Therefore below is a simplified version of the Markdown syntax.

4.4.1 Headings

Below is the Markdown code you need to use to specify headings at different levels and the rendered output respectively below the code:

# Heading 1

Heading 1

## Heading 2

Heading 2

### Heading 3

Heading 3

#### Heading 4

Heading 4

4.4.2 Inline text formatting

To make text bold use **double asterisks** or __double underscores__.
To make text italic use *asterisks* or _underscores_.
To make text superscript use ^caret^.
To make text subscript use ~tilde~.
To mark text as inline code use `backticks`.
To strikethrough text use ~~double tilde~~.

4.4.3 Line breaks

To create a line break, put more than 2 spaces at the end of a sentence or place \ in a new line followed by a new line .

4.4.5 Lists

4.4.5.1 Ordered lists

To create an ordered list use the following syntax:

1. Item 1
2. Item 2
3. Item 3

This is rendered as:

  1. Item 1
  2. Item 2
  3. Item 3

Use 4 spaces to indent an item if you would like to have sub-lists:

1. Item 1
2. Item 2
3. Item 3
    a. Item 3a
        i.  Item 3ai
        ii. Item 3aii
    b. Item 3b
    c. Item 3c
4. Item 4
  1. Item 1
  2. Item 2
  3. Item 3
    1. Item 3a
      1. Item 3ai
      2. Item 3aii
    2. Item 3b
    3. Item 3c
  4. Item 4

4.4.5.2 Unordered lists

In an unordered bullet list, each item begins with *, + or -. Example:

* Item 1
* Item 2
* Item 3
    * Item 3a
        *  Item 3ai
        * Item 3aii
    * Item 3b
    * Item 3c
* Item 4

Will be rendered as:

  • Item 1
  • Item 2
  • Item 3
    • Item 3a
      • Item 3ai
      • Item 3aii
    • Item 3b
    • Item 3c
  • Item 4

4.4.5.3 Tasks list

Tasks list can be done using the following syntax:

- [ ] an unchecked task list item
- [x] checked item

This will be rendered as:

  • an unchecked task list item
  • checked item

4.4.6 Inserting images

To insert an image use the following syntax: ![Figure caption](path/to/image). Example:

![R Logo](img/Rlogo.png)

Will be rendered as:

R Logo
R Logo

You can adjust the size of the image by adding a width and height in pixels in between curly brackets, like this:

![R Logo](img/Rlogo.png){width=100px height=100px}

will plot like

R Logo
R Logo

4.4.7 Blocks

Blocks or quotes in the notebook can be created by using the > sign as follows:

> "Life can be great, when you try something new."
>
> _Kevin the koala_

This is rendered as:

If you would like to add code blocks, use ``` before and after the code as follows:

```
print("Hello world")
x <- 1 + 2
print(x)
```

This will be rendered as:

print("Hello world")
x <- 1 + 2
print(x)

 

4.4.8 Adding a table of contents

To add a table of contents to your report add the following to the YAML header you need to specify the output (in this case a html_document) and set toc to true. If you want to make sure your table of contents stays visible even whilst scrolling, then you need to float it, using toc_float: true.

output:
  html_document:
    toc: true
    toc_float: true

By default all headings up to level 3 headings are displayed in the table of contents. You can adjust this by specifying toc_depth as following:

output:
  html_document:
    toc: true
    toc_depth: 4

4.4.9 Adding references

Adding references and citations in Markdown is not as easy as reference manager software such as Mendeley. To be able to create citations you will need to create a bibliography file with all the references in it. Here is an example of a bibliography file. The bibliography file has to be placed in the same folder as the one where the .Rmd file is. Next, add the following to the YAML header:

bibliography: references.bib
link-citations: yes

Your YAML header should now look like:

---
title: "Reproducibility without fuss"
author:  Martin van Rongen
date: 19 Dec 1972
bibliography: references.bib
link-citations: yes
---

To cite a reference use the @ together with the ID of the reference. Example:

Citation to my paper @cardona2014 and @cardona2019

Will be rendered as:

Citation to my paper L. A. A. Cardona Alexia AND Pagani (2014) and A. Cardona et al. (2019)

To add the bibliography at the end of the report add a References heading at the end of the report:

# References

See https://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html for more information.

4.5 Exercise: Getting started

Exercise 4.1 To get you started, I’d like you to do the following:

  1. Create an R Project if you haven’t done so already
  2. Start a new R Markdown file and save it in the appropriate subfolder (e.g. notebooks)
  3. Think about one of your own projects and use that to come up with a title.
  4. Create three levels of headings, sub-dividing your project and try out some of the inline text formatting.
  5. Add a link to the https://www.gapminder.org website (or any other website).
  6. Use knit to render an html document and see if it looks like you expected.

4.6 R code chunks

So far we have not used any R code and all the code we used so far is in Markdown. As the name suggests, Rmarkdown files contain markdown and R. Rmarkdown files have a .Rmd extension. Using R in a markdown document makes sense if we integrate our analysis in our report. Some operations that we might need to do are; loading our data set, performing some operations on the data set and displaying results, either in a table or in a plot. We will be doing all this in the following sections.

If you would like to insert R code inside the notebook you will have to insert it inside an R code chunk to be able to execute it. To do this click on the Insert button at the top of the Source panel in RStudio:

Or use the associated shortcut (Mac Cmd + Option + I, Windows Ctrl + Alt + I).

This creates an R code chunk as follows:

Alternatively, you can type the r code chunk (although I recommended using the keyboard shortcut). The R code should be placed in between the triple backticks. Note that on the right hand side of the R code chunk there is a green “play” button that will run the R code chunk if pressed. By default, when the Rmarkdown document is knitted, the R code will be executed and the R code chunk is displayed before the executed code.

To run a chunk you can use the Run button or the shortcut (Mac Cmd + Shift + Enter, Windows Ctrl + Shift + Enter).

4.6.1 Chunk names

It is possible to give names to chunks using ```{r chunk_name}. This has several advantages:

  • You can navigate more easily through your R Notebook using the drop-down code navigator:
  • Graphics produced by the chunks will have meaningful names based on the chunk name, which is a big improvement over Untitled1.png.

  • One chunk name has special meaning: setup. In R Notebooks the chunk named setup will automatically run only once, before any other chunks are run.

4.6.2 Chunk options

You can control the output of chunks with different options in the chunk header (e.g. {r, include = FALSE}).

These chunk options include:

  • eval = FALSE prevents code from being evaluated. No code is run and no output displayed. Useful to display example code or avoiding commenting out many lines.

  • include = FALSE do not display the code and results in the page after it is knitted. The R code however still runs and therefore the variables or results in this code chunk can be used by the other chunks.

  • echo = FALSE does not display the code, but it displays the results in the rendered file.

  • message = FALSE or warning = FALSE prevents messages or warnings from being displayed in the rendered file.

There are almost 60 options by knitr that you can use to customise your code blocks. A full list of options can be found here.

4.7 Tables

Tables can be created manually or dynamically.

4.7.1 Manually creating tables

Although probably only useful for small tables, you can manually create tables, using |and - as follows:

| country_id  | country     |
| ----------- | ----------- |
| afg         | Afghanistan |
| ago         | Angola      |

This is rendered as:

country_id country
afg Afghanistan
ago Angola

Table alignments can be done using the following syntax:

| country_id (left) | country (centre) | world_region (right) |
| :---              | :----:           | ---:                 |
| afg               | Afghanistan      | south_asia           |
| ago               | Angola           | sub_saharan_africa   |

This is rendered as:

country_id (left) country (centre) world_region (right)
afg Afghanistan south_asia
ago Angola sub_saharan_africa

4.7.2 Dynamic tables using rmarkdown

Clearly creating a large table this way is not very efficient or often even feasible when dealing with larger data sets. Furthermore, it goes against the principle of trying to make your research reproducible because the data are not generated through code.

R Markdown has some basic functionality to create tables in HTML. The paged_table() function creates a table that supports paging rows and columns. When we use this on the built-in iris data set it looks like this:

paged_table(iris)

4.7.3 Dynamic tables using kableExtra

Apart from the basic functions available in rmarkdown, we can also use the kableExtra package, which uses the knitr::kable() function to generate tables. It allows you to build complex tables and adjust the table styles easily.

More information can be found here.

Let’s explore that a bit more by generating a table that displays the iris data set.

The table is styled in twitter-fashion using kable_styling(), with striped rows that highlight when you hover over them.

The table is embedded in a scroll box of a fixed dimension using scroll_box.

Here I’m using the pipe symbol (%>%) to pipe the output from one function to the next.

iris %>% 
  kable(caption = "Iris data set") %>% 
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = FALSE) %>% 
  scroll_box(width = "100%", height = "400px")

Somewhat frustratingly, pagination is not possible with kable. So if you have a long table, you are best off using the paged_table() function from the rmarkdown package.

4.8 Composing figures with patchwork

Apart from using a markdown language to render your text output, it can also be very useful to compose figures programmatically. Of course it is possible to generate figures with multiple panels using image editing software, but that does not really fit in with the reproducibility ethos, does it?

So how can we do this in R? There is a package that allows you to do just that for figures generated byggplot(): `patchwork’.

If you need to install it, run the following code in your console:

install.packages("patchwork")

Then load the library:

Patchwork requires you to assign plots to an object and then you can use mathematical operators such as +, |, / to organise your figures.

Let’s create three figures and go through the options. You can just copy/paste the code, because the figures themselves are not important. It’s the use of patchwork we’re focussing on.

Plots:

p1 <- iris %>% 
  ggplot(aes(x = Petal.Width,
             y = Petal.Length)) +
  geom_point()

p2 <- iris %>% 
  ggplot(aes(x = Species,
             y = Sepal.Width)) +
  geom_boxplot()

p3 <- iris %>% 
  ggplot(aes(x = Sepal.Length,
             y = Sepal.Width,
             colour = Species)) +
  geom_jitter()

4.8.1 Adding plots together

The simplest way to add plots together using patchwork is to use the + operator:

p1 + p2 + p3

You can control how the plots are displayed. For example, we can put the third plot (p3 below the others):

(p1 + p2) / p3

Or we can stretch out the first plot over two rows:

p1 | (p2 / p3)

We can then further customise the plot. We can add annotations using plot_annotation:

  • add a title using title =
  • add panel labels using tag_levels = (it takes different arguments, e.g. A for capital letters, a for lower case letters, I for Roman numerals etc.)
(p1 | (p2 / p3)) +
  plot_annotation(title = "Iris data set",
                  tag_levels = "A")

There are more options available, which you can find on the patchwork website.

4.9 Exercise: Writing a short report

Exercise 4.2 For this exercise I’d like you to create a new R Markdown file, and do the following:

  1. Create 3 heading levels, as previously done
  2. Add a table of contents
  3. Using the default mtcars data set, create
    1. a dynamic table
    2. create 2 plots in ggplot (e.g. scatter plot of disp vs hp; box plot of carb vs disp)
    3. create a two-panel figure in patchwork, with a title, caption and panel tag
    4. knit the document and admire your results

4.10 R Markdown limitations

Some limitations that R Markdown has:

  • Figure and table numbering is not supported. There is still a way around this, by using third-party packages e.g., captioner or bookdown (which is a great package to use when you want to group your analysis together in a book format!).
  • If you are looking at specific customisations that are different from the default Markdown format, e.g., headings using different font specifications, then this gets tricky and markdown might not be the best approach.

4.11 Key points

  • Markup languages allow you to perform document formatting in a programmatic way
  • Useful for large documents
  • Increases reproducibility of documentation