2  Setup

2.1 Installation

Required software

  • Download R and install it using default options. (Note: choose the “base” version for Windows)
  • Download RStudio and install it using default options.

Setting up RStudio

After installing RStudio, change some of its default options (you only need to do this once):

  • From the upper menu go to Tools > Global Options…
  • Untick the option “Restore .RData to workspace on startup.”
  • Change “Save workspace to .RData on exit” option to “Never”
  • Press OK

For this course we’ll be using Visual Studio Code. This provides support for various programming languages (including Python and R). It works on Windows, MacOS and Linux. It’s also open-source and free.

Please refer to the installation instructions and make sure that you verify that Python code will run.

A brief sequence of events:

  1. Install Visual Studio Code
  2. Install the VS Code Python extension
  3. Install a Python interpreter
    • Windows: install from Python.org or use the Microsoft Store
    • MacOS: install the Homebrew package manager, then use this to install Python
    • Linux: comes with Python 3, but needs pip to install additional packages

2.2 Packages

We will be using the following packages throughout this course:

Install the required packages. Run the following code in the console:

install.packages("tidyverse")
install.packages("rstatix")
install.packages("ggResidpanel")

Testing your installation

On the RStudio panel named “Console” type library(tidyverse) and press Enter

A message similar to this should print:

── Attaching packages ─────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.2
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

If instead you get the message:

Error in library(tidyverse) : there is no package called ‘tidyverse’

then your package installation did not work. Please ask the instructors for assistance before the course.

NumPy

The numpy package provides fundamental data science functionality to Python. For more information see: https://numpy.org/doc/stable/#

It can be installed via pip

pip install numpy

or conda

conda install -c conda-forge numpy

pandas

The pandas package provides data structures to Python. For more information see: https://pandas.pydata.org/docs/getting_started/install.html.

It can be installed via pip

pip install pandas

or conda

conda install pandas

pingouin

The pingouin package provides statistical functionality to Python. For more information see: https://pingouin-stats.org.

It can be installed via pip

pip install pingouin

or conda

conda install -c conda-forge pingouin

patchworklib

The patchworklib package provides an easy way for assembling figures. This package is required to run the course-specific dgplots() function. For more information see: https://pypi.org/project/patchworklib/.

It can be installed via pip

pip install patchworklib

plotnine

The plotnine packages provides a grammar of graphics to Python - an equivalent to the ggplot2 package in R. For more information see: https://plotnine.readthedocs.io/en/stable/#.

It can be installed via pip

pip install plotnine

or conda

conda install -c conda-forge plotnine

2.2.1 scikit-posthocs

The scikit-posthocs package provides post-hoc functionality. For more information see: https://scikit-posthocs.readthedocs.io/en/latest/

It can be installed via pip

pip install scikit-posthocs

2.2.2 statsmodels

The statsmodels package provides statistical functionality. For more information see: https://www.statsmodels.org/stable/index.html.

It can be installed via pip

pip install statsmodels

or conda

conda install -c conda-forge statsmodels