3  Data

The data we use throughout the sessions are contained in a single ZIP file. All of them are small CSV (comma-separated values) files. You can download the data below:

Warning

The data we use throughout the course are varied and cover many different topics. Some of the data on medical topics or historical events may be uncomfortable for some people, since they can touch on disease or death.

All the data are chosen for their pedagogical effectiveness.

Some of the data have been synthesised using simulations. This allows us to highlight specific challenges you can encounter in real research data without overcomplicating the analysis. For example, in real research data you will often have to deal with missing or mislabelled data. Here, handling such issues would detract from what we are trying to achieve.

The code used to generate the synthesised data can be found here (Quarto markdown file).
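
Because the data come as small CSV files inside a single ZIP file, it can be handy to inspect them without unpacking the archive first. The following is a minimal sketch in Python using only the standard library. The helper names `list_csv_files` and `read_csv_from_zip`, the file name `data/example.csv` and its contents are hypothetical placeholders for illustration, not the actual course files.

```python
import csv
import io
import zipfile


def list_csv_files(zip_source):
    """Return the names of all CSV files inside a ZIP archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]


def read_csv_from_zip(zip_source, member):
    """Read one CSV file from the archive into a list of dicts (one per row)."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # ZipFile.open yields bytes, so wrap it to decode text for csv.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a tiny in-memory archive so the sketch runs without the course download.
# The file name and its contents are made up for illustration.
demo_zip = io.BytesIO()
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("data/example.csv", "id,value\n1,3.5\n2,4.1\n")

csv_names = list_csv_files(demo_zip)
rows = read_csv_from_zip(demo_zip, csv_names[0])
print(csv_names)
print(rows[0])
```

With the real download, the same functions should accept a path to the ZIP file on disk in place of `demo_zip`. Note that `csv.DictReader` returns every value as a string, so numeric columns need converting before analysis.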