Data analysis in R and Python

Author

Hugo Tavares and Martin van Rongen

Published

August 12, 2025

These sessions provide an introduction to coding in R and Python, focussing on data exploration and analysis. Both are leading languages in data science and are widely used for data visualisation, analysis, statistics and machine learning.

Both languages are open-source software and all the software we use during the course is free. We frequently run courses based on these materials, focussing on either R or Python (learning two languages at once is not ideal!).

These courses have been developed and are organised by the Cambridge Centre for Research Informatics Training (University of Cambridge, UK).

Have a look at our timetable to see when the next iteration is scheduled. We run in-person and online versions of these courses.

Courses are open to everyone.

Please see our guidelines for more details on eligibility and potential charges.

Learning objectives
  • Get familiar with the programming language.
  • Be able to use appropriate interface software.
  • Be familiar with different data types & structures and know when to use them.
  • Read in and investigate tabular data and perform basic quality control checks.
  • Create plots and be able to manipulate plot aesthetics.
  • Feel confident manipulating columns (variables in the data) and rows (observations in the data).
  • Perform operations on groups within your data.
  • Reshape data (long <> wide) and know when each format is appropriate.
  • Combine different tables of data, based on a common identifier.
  • Clean up common issues in data (column names, encoding issues, etc.).
  • Fine-tune plot settings to create publication-ready figures.

Target audience

The course is aimed at beginners, so no prior knowledge is required. If you already have some coding experience but want to refresh your knowledge, this course is also for you. Exercises at different levels of difficulty let you pick an appropriate challenge.

Prerequisites

No prerequisites.

Exercises

Exercises in these materials are labelled according to their level of difficulty:

  • Exercises in level 1 are simpler and designed to get you familiar with the concepts and syntax covered in the course.
  • Exercises in level 2 combine different concepts and apply them to a given task.
  • Exercises in level 3 require going beyond the concepts and syntax introduced to solve new problems.

Citation & authors

Please cite these materials if:

  • You adapted or used any of them in your own teaching.
  • These materials were useful for your research work. For example, you can cite us in the methods section of your paper: “We carried out our analyses based on the recommendations in YourReferenceHere”.

You can cite these materials as:

Tavares, H., van Rongen, M. (2025). Data analysis in R and Python. https://cambiotraining.github.io/data-analysis-in-r-and-python/

Or in BibTeX format:

@misc{YourReferenceHere,
  author = {Tavares, Hugo and van Rongen, Martin},
  month = {6},
  title = {Data analysis in R and Python},
  url = {https://cambiotraining.github.io/data-analysis-in-r-and-python/},
  year = {2025}
}

About the authors:

  • Hugo Tavares
    Affiliation: Cambridge Centre for Research Informatics Training
    Roles: writing - original draft; conceptualisation; software
  • Martin van Rongen
    Affiliation: Cambridge Centre for Research Informatics Training
    Roles: writing - original draft; conceptualisation; software

Acknowledgements

Some parts of these materials are loosely based on the original course contents of the “Data Carpentry lesson in Ecology”, as released by Michonneau et al. (2019).