Genome-Wide Association Studies (GWAS)

Published

May 16, 2025

Caution

These materials are still under development

Overview

Genome-Wide Association Studies (GWAS) investigate the genetic basis of complex traits and/or diseases. These materials cover the bioinformatic and statistical methods required to identify associations between genetic variants and traits. You will learn to use essential software for genotype data processing, including quality control crucial for downstream analysis. We discuss how population ancestry may impact association results and how this can be adjusted for in the analysis. We introduce key statistical concepts relevant to GWAS, with applications to both quantitative and binary traits. Finally, we introduce methods to assess potential biases in GWAS results and demonstrate how to generate effective visualisations.

Learning Objectives

Describe key concepts, advantages and limitations of GWAS.
Use PLINK to generate key metrics for quality control of samples and variants.
Recognise the effect of population structure on association tests and explain how to adjust for it.
Summarise the statistical methods used for association analysis and how to interpret their outcomes.
Run a GWAS for quantitative and binary traits and assess the quality of the results.
Visualise and report the findings of the association analysis.

Target Audience

Researchers and students interested in the genetics of complex traits.

Prerequisites

Knowledge of key genetics concepts and terms, such as: gene, locus, allele, linkage, inheritance, homozygous and heterozygous genotypes.
- See NIH’s genetics glossary for reference.
Knowledge of basic statistical concepts, such as: linear regression, null hypothesis testing, p-value, effect size. Knowledge of logistic regression is also desirable.
- See our Core Statistics and Generalised Linear Models materials as a reference.
Basic usage of the Unix command line: listing files (ls), moving between directories (cd) and an understanding of using options/flags with commands (e.g. command --input file.csv --output result.csv).
- See the “Basics” section of our Introduction to Unix command line materials.
Basic usage of R and the tidyverse packages for data exploration and visualisation.

Citation & Authors

Please cite these materials if:

You adapted or used any of them in your own teaching.
These materials were useful for your research work. For example, you can cite us in the methods section of your paper: “We carried out our analyses based on the recommendations in YourReferenceHere”.

You can cite these materials as:

Tavares, H., Laskar, R. (2025). Genome-Wide Association Studies (GWAS). https://cambiotraining.github.io/gwas

Or in BibTeX format:

@misc{YourReferenceHere,
  author = {Tavares, Hugo and Laskar, Ruhina},
  month = {3},
  title = {Genome-Wide Association Studies (GWAS)},
  url = {https://cambiotraining.github.io/gwas},
  year = {2025}
}

About the authors:

Hugo Tavares
Affiliation: Cambridge Centre for Research Informatics Training
Roles: conceptualisation; primary author; data curation; coding; software
Ruhina Laskar
Affiliation: Department of Oncology, University of Cambridge
Roles: conceptualisation; primary author; data curation; coding; software
