Overview

These materials provide a practical guide to analysing viral amplicon sequencing data for genomic surveillance. Although the focus is on SARS-CoV-2, the concepts and pipelines explored here are applicable to other viruses. The content covers the analysis of data from both clinical isolates and wastewater samples. For clinical isolates, we illustrate how to create consensus sequences for upload to databases like GISAID and for downstream applications such as variant annotation and phylogeny. For wastewater samples, we cover estimating variant and mutation frequencies. For both applications we use a standardised bioinformatic pipeline compatible with both Illumina and Nanopore data. The materials also cover assigning sequences to lineages, identifying variants of interest and creating visualisations to communicate findings effectively. Throughout, you will acquire foundational bioinformatic skills, including Unix command line usage and scripting for reproducible analyses.

Learning Objectives

Recognise the uses of genomic surveillance to inform public health actions during a pandemic.
Assemble high-quality SARS-CoV-2 genome sequences starting with raw sequencing data from clinical isolates.
Assign consensus sequences to lineages and identify variants of interest/concern.
Capture high-quality metadata, recognising its impact on downstream analyses.
Construct phylogenetic trees to contextualise new samples in a set of background samples.
Estimate variant frequencies from mixed wastewater samples.
Produce visualisations to communicate your findings and help inform public health action.

Target Audience

These materials are aimed at life scientists and molecular lab technicians interested in the bioinformatic analysis of viral genomic data. In particular, they will benefit those working in SARS-CoV-2 sequencing facilities, such as public health labs.

Prerequisites

We assume no prior experience with bioinformatics or with the tools introduced in this course. An elementary knowledge of molecular and viral biology is assumed (concepts such as DNA, RNA, PCR, primers and SNPs).

Citation

Please cite these materials if:

You adapted or used any of them in your own teaching.
These materials were useful for your research work. For example, you can cite us in the methods section of your paper: “We carried out our analyses based on the recommendations in Tavares et al. (2022)”.

You can reference these materials as:

Tavares H, Salehe B, Kumar A, Castle M & UKHSA New Variant Assessment Platform Team (2022) “cambiotraining/sars-cov-2-genomics: Introduction to Sars-CoV-2 Genomics”, https://cambiotraining.github.io/sars-cov-2-genomics

Or, in BibTeX format:

@Misc{,
  author = {Tavares, Hugo and Salehe, Bajuna and Kumar, Ankit and Castle, Matt and UKHSA New Variant Assessment Platform Team},
  title = {cambiotraining/sars-cov-2-genomics: Introduction to Sars-CoV-2 Genomics},
  month = {March},
  year = {2022},
  url = {https://cambiotraining.github.io/sars-cov-2-genomics},
}

Please make sure to include a link to the materials in the citation (we will add a DOI in due time).

The contributing members from the University of Cambridge Bioinformatics Training Facility team are:

Matt Castle, Bioinformatics Training Manager
Hugo Tavares, Senior Teaching Associate
Bajuna Salehe, Teaching Associate
Ankit Kumar, Teaching Assistant

The UKHSA NVAP Team members who supported these materials are:

Dr Leena Inamdar, NVAP Programme Lead and Global Health Lead
Dr Babak Afrough, Senior Project Manager
Aude Wilhelm, Senior Epidemiology Scientist
Richard Myers, Data Analytics Surveillance Head Bioinformatician
Sam Sims, Bioinformatician
Kate Edington, Bioinformatician
Constantina Laou, Specialist Lab Advisor

Acknowledgements

These materials have been developed as a collaboration between the Bioinformatics Training Facility at the University of Cambridge and the New Variant Assessment Platform (NVAP) programme from the UK Health Security Agency.

Our partners also include COG Train. We also thank the wider community for publicly sharing training resources, including:

The workshop video series from CLIMB BIG DATA.
The Carpentries project, in particular for their Unix Shell lesson, which we adapted for this workshop.