27 Pathogenwatch

Learning Objectives

Describe what Pathogenwatch is and how it can be used to aid in genomic surveillance.
Upload files and create collections from our assemblies.
Become familiar with the interface to Pathogenwatch.
Download and combine key results from Pathogenwatch.

27.1 Pathogenwatch

Pathogenwatch is a web-based platform for common genomic surveillance analysis tasks, including:

Identifying strains for pathogens of concern.
Cluster sequences using phylogenetic analysis.
Identifying the presence of antibiotic resistance genes.

Pathogenwatch is designed to be user-friendly, supporting the analysis of over 100 species, including Staphylococcus aureus, which is our organism of focus. In this chapter, we will cover the basics of loading genomes and creating collections for analysis on this platform. The details of the Pathogenwatch analysis will then be covered in following chapters.

In order to use this platform you will first need to create an account (or sign-in through your existing Google, Facebook or Twitter).

27.2 Uploading FASTA files

Once you have logged in to Pathogenwatch, you can load the FASTA files with the sequences you want to analyse. In our case, we will load the assemblies we have provided in the preprocessed directory which were built using assembleBAC.

Click the Upload link in the top-right corner of the page:

Click in the Upload FASTA(s) button, in the “Single Genome FASTAs” section:

If your internet connection is slow and/or unstable, you can tick “Compress files” and “Upload files individually”.Click the + button on the bottom-right corner to upload the sequences:

This will open a file browser, where you can select the FASTA files from your local machine. Go to the preprocessed/assemblebac/assemblies folder where you have the results from your earlier genome assembly analysis. You can upload several files at once by clicking and selecting several FASTA files while holding the Ctrl key. Click Open on the dialogue window after you have selected all of your FASTA files.

A new page will open showing the progress of the samples being uploaded and processed.

Click in the VIEW GENOMES button, which will take you to a tabular view of your samples:

Pathogenwatch performs the following major analyses useful for genomic surveillance: sequence typing (ST), antimicrobial resistance (AMR) analysis, phylogenetics, as well as reporting general statistics about your samples (such as genome completeness, which we also assessed with checkM2). We will detail several of these analyses in the coming chapters, but here is a brief description of each column:

Name - the names of the uploaded samples.
Organism - the species that was detected for our samples, in this case Staphylococcus aureus.
Type and Typing schema - the sequence type assigned to each sample, based on MLST analysis
Country and Date - the country and date of collection, respectively; only shown if we provided that information as metadata.
Access - whether these samples are private or public; in this case, because they were uploaded by us, they are private (only we can see them).

Metadata

If you have metadata files associated with your sequenced samples, you can upload those files following these instructions. Make sure all metadata files are in CSV format, with five recommended columns named ‘latitude’, ‘longitude’, ‘year’, ‘month’, and ‘day’. You can also use the template provided by Pathogenwatch on the upload page, to help you prepare your metadata files before the analysis.

Having this type of information is highly recommended, as it will allow you to visualise your samples on a map, which is useful if you want to match particular strains to the geographic locations where outbreaks occur.

27.3 Collections

A useful feature of Pathogenwatch is to group our samples into collections. This allows us to manage and analyse samples in batches of our choice. The same sample can exist in different collections. For example you might create a collection with only the genomes you sequenced recently, another collection with all the genomes you ever sequenced in your facility, or even a collection that includes your samples together with public samples available online (if you want to compare them with each other).

To create a collection from your sequences, check the box next to the “Name” header to select all of the uploaded genomes. Then, from the top-right of the table, click Selected Genomes –> Create Collection:

In the next window give a name and description to your collection:

It is highly recommended to provide details for your collection:

Title - give your collection a title that is meaningful to you, for example: “WBG 2023”.
Description - give a brief description of your samples, for example: “Culture-based sequencing of Staphylococcus aureus. Samples were collected from school children in Cambridgeshire in 2018.”
If your data come from a published study, provide a DOI of the study, for example: “10.1099/mgen.0.000993”.

Finally, click Create Now button to create your collection. You will be shown a table and map, with the samples you just added to the collection:

This table contains several columns:

Purple download button - download the assembled genome in FASTA format. This is the same file that you just uploaded, so it’s not very useful in our case. It can be useful if you want to download the public sequences available from within Pathogenwatch.
Green download button - download the gene annotation performed by Pathogenwatch in GFF format. Note that our assembly script already performed gene annotation using Bakta, so this feature is also not so useful for us. But again, if you were using public sequences from Pathogenwatch, you could download their GFF files.
NAME - your sample name.
ST and PROFILE - these columns refer to the “sequence type” (ST) assigned to each of our samples.- ST and PROFILE - these columns refer to the “sequence type” (ST) assigned to each of our samples.
INC TYPES - identification of plasmids relevant for Staphylococcal species (“inc” stands for “incompatibility”, refering to plasmid incompatibility groups). Inc plasmids often carry antibiotic resistance and virulence genes, making them of particular relevance for public health (e.g. Foley et al 2021).

Along with the typing, we can also look at the drug susceptibility profiles of the samples by clicking on the Typing button on the left-hand side of the screen and changing it to Antibiotics:

You will see a table containing the drugs that Pathogenwatch is able to identify resistance to using genetic variants identified in the genomes we uploaded. Resistance to a drug is shown by a red circle and we can see that the majority of our genomes are resistant to penicillin:

We will add some of this information to our phylogenetic tree in the next section.

Exercise 1 - Downloading data from Pathogenwatch

For the next step, visualising our phylogeny, you will need to download the results of the lineage typing and antibiotic susceptibility from Pathogenwatch:

Download the Typing table
Download the AMR profile.
Two CSV files will be downloaded. Rename the appropriate files to the following as this will help with the next exercise:
- wbg-2023-typing.csv
- wbg-2023-amr-profile.csv

Now, move these two files into the S_aureus analysis directory.

Answer

Click on the download icon in the top right-hand corner and select Typing table:

Click on the download icon again and select AMR profile:

Two files were downloaded (the longer names will likely be slightly different) to our Downloads directory:
- pathogenwatch-saureus-pi6kp4oqdawi-wbg-2023-typing.csv
- pathogenwatch-saureus-pi6kp4oqdawi-wbg-2023-amr-profile
We renamed the files on the command line (you could do this in the File Explorer too):

mv pathogenwatch-saureus-pi6kp4oqdawi-wbg-2023-typing.csv wbg-2023-typing.csv
mv pathogenwatch-saureus-pi6kp4oqdawi-wbg-2023-amr-profile wbg-2023-amr-profile.csv

We moved the files from the Downloads (or Desktop, depending on your browser settings) directory to our S_aureus directory. The following command assumes we were in the S_aureus directory to start with:

# if your browser downloads to "Downloads" folder:
mv ~/Downloads/wbg-2023* .
# if your browser downloads to the "Desktop" folder:
mv ~/Desktop/wbg-2023* .

Exercise 2 - Preparing data for Microreact

Now that we have analysed our genomes with Pathogenwatch and downloaded the typing and AMR profiles, we need to merge this metadata with the existing information we have for the 30 S. aureus genomes. We could do this with Excel. Alternatively, we provide a Python script called merge_staph_data.py in the scripts directory.

Make sure you are in the base software environment (where we have the Pandas library for Python).
Run merge_staph_data.py to create the final metadata file we need for Microreact. Look at the help documentation of this script to find out how to specify inputs and outputs to this script.

Hint

Depending on how they are written, most Python scripts will print the available options if you use the help flag (--help or -h):

python scripts/merge_staph_data.py -h

Answer

We activated the base software environment with mamba activate base

We ran the merge_staph_data.py script to create a TSV file called staph_metadata.tsv in your analysis directory:

python scripts/merge_staph_data.py -s sample_info.csv -t wbg-2023-typing.csv -a wbg-2023-amr-profile.csv

27.4 Summary

Key Points

Pathogenwatch is a web-based platform designed for genomic surveillance of bacterial pathogens. It assists in the analysis and interpretation of genomic data to monitor disease outbreaks and track pathogen evolution.
You can upload genome assemblies in FASTA format and accompanying metadata as CSV files.
Assemblies can be organized into collections, making it simpler to manage and analyze multiple samples together.
Pathogenwatch’s interface offers an intuitive user experience designed for users with varying levels of expertise, providing results such as biotype/serogroup, strain classification, antimicrobial resistance (AMR) and phylogenetic placement.
Downloading and combining the output files from Pathogenwatch as well as any metadata can be of use to annotate phylogenetic trees.