14  Visium HD: Zebrafish Head

TipLearning Objectives
  • Load a Visium data set into Seurat
  • Preprocess the data (normalisation, scaling, quality control)
  • Perform clustering and visualise results
  • Identify marker genes for clusters
  • Run BANKSY analysis
  • visualise BANKSY results
  • Identify spatially variable features
  • Visualise spatially variable features on spatial maps

14.1 Loading Visium Data into Seurat

To load Visium data into Seurat, you can use the Load10X_Spatial function. This function reads the Visium data from a specified directory and creates a Seurat object. In this example we are loading a Visium HD 3’ dataset. Seurat supports loading 3’ datasets through a developmental feature. Please note that this is still experimental and may not work for all datasets. We need to load additional libraries to handle this type of data. We are going to use a dataset from 10x Genomics, which can be found here. This is a Visium HD 3’ dataset of a zebrafish head section. It comes with pre-segmented data, which we will use in this case to avoid using binned data and having to deconvolute it. Please make sure the relevant directories have been unzipped. Make sure that both the binned_outputs and spatial directories are uncompressed as well as the segmented_outputs directory for pre-segmented high-resolution data. To keep our dataset small enough to run on a standard laptop, we will only load the segmented polygons data.

# Load libraries
library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)
library(Matrix)
library(SeuratWrappers)
library(Banksy)

# Load the Visium HD 3' dataset using pre-segmented polygons
v3d <- Load10X_Spatial(
  data.dir = "data/zebrafish_head_visium_hd/",
  slice = "slice1",
  bin.size = "polygons"
)

14.2 Filtering and Quality Control

After loading the data, we can perform some basic quality control (QC) to filter out low-quality spots. We will calculate the percentage of mitochondrial genes and filter out spots with high mitochondrial content, as well as spots with very low total counts. because we are working with polygons, we will use the nCount_Spatial.Polygons metadata field instead of nCount_Spatial and we are adjusting the filtering thresholds accordingly as well.

# Calculate mitochondrial percentage and compare QC metrics before and after filtering
v3d[["percent.mt"]] <- PercentageFeatureSet(v3d, pattern = "^mt-")

# Visualise QC metrics
VlnPlot(v3d, features = c("nCount_Spatial.Polygons", "percent.mt"), ncol = 2)

# Remove low-quality cells and check the difference in metrics
v3d <- subset(v3d, subset = nCount_Spatial.Polygons > 5 & percent.mt < 10)

# Visualise QC metrics again to check improvement
VlnPlot(v3d, features = c("nCount_Spatial.Polygons", "percent.mt"), ncol = 2)

14.3 Normalisation and Scaling

Next, we will normalise and scale the data using the SCTransform function. This function performs normalisation and variance stabilisation. We are also reducing the number of cells used for normalisation to speed up the process and reduce memory requirements and adding a parameter to conserve memory.

# Run SCTransform normalisation (this may take a few minutes)
v3d <- SCTransform(
  v3d,
  assay = "Spatial.Polygons",
  new.assay.name = "Polygon",
  conserve.memory = TRUE,
  variable.features.n = 1000,
  ncells = 2000
)

# Check that the default assay is now Polygon
DefaultAssay(v3d)

14.4 Dimensionality Reduction and Clustering

After normalisation, we can perform dimensionality reduction using PCA and then cluster the spots using the Leiden algorithm. We will visualise the clusters using UMAP.

# Run dimensionality reduction and clustering on the Polygon assay
v3d <- RunPCA(v3d, assay = "Polygon", verbose = FALSE)
v3d <- RunUMAP(v3d, dims = 1:30)
v3d <- FindNeighbors(v3d, dims = 1:30)
v3d <- FindClusters(
  v3d,
  algorithm = 4,
  resolution = 0.3,
  cluster.name = "leidenClusters_03"
)

# Visualise clusters in UMAP and spatial coordinates
DimPlot(v3d, reduction = "umap", group.by = "leidenClusters_03", label = TRUE) +
  ggtitle("Leiden Clusters (res=0.3)")
SpatialDimPlot(v3d, group.by = "leidenClusters_03", label = TRUE) +
  ggtitle("Leiden Clusters (res=0.3)")

14.5 Subsetting the Object

Since this is a rather large dataset, we will subset it to only include roughly the eye of the zebrafish head for the rest of the analysis. We can do this by subsetting using an interactive plot. Please try selecting the innermost part of the eye - including the lens and one of the surrounding circular tissue layers. Unfortunately a larger selection might make it impossible for you to run the detection of spatially variable features later on in a reasonable timeframe.

# Select eye-region cells interactively from the spatial plot
selected_cells <- InteractiveSpatialPlot(v3d)

# Subset the object to keep only the eye region
eye <- subset(v3d, cells = selected_cells)
SpatialFeaturePlot(
  eye,
  images = "slice1.polygons",
  features = "nCount_Spatial.Polygons",
  plot_segmentations = TRUE,
  crop = TRUE
)

This data now has to be preprocessed again after subsetting.

# Re-run preprocessing and clustering after subsetting
eye <- SCTransform(
  eye,
  assay = "Spatial.Polygons",
  new.assay.name = "Polygon",
  conserve.memory = TRUE,
  variable.features.n = 1000,
  ncells = 2000
)

# Check that the default assay is now Polygon
DefaultAssay(eye)

# Run dimensionality reduction and clustering again on the subsetted data
eye <- RunPCA(eye, assay = "Polygon", verbose = FALSE)
eye <- RunUMAP(eye, dims = 1:30)
eye <- FindNeighbors(eye, dims = 1:30)
eye <- FindClusters(
  eye,
  algorithm = 4,
  resolution = 0.5,
  cluster.name = "leidenClusters_05"
)

# Visualise clusters on the subsetted data
DimPlot(
  eye,
  reduction = "umap",
  label = TRUE,
  label.size = 5,
  group.by = "leidenClusters_05"
)
SpatialDimPlot(
  eye,
  images = "slice1.polygons",
  plot_segmentations = TRUE,
  crop = TRUE,
  group.by = "leidenClusters_05",
  label = TRUE,
  label.size = 5
)

14.6 Identifying Marker Genes

We can identify marker genes for each cluster using the FindAllMarkers function. This function identifies genes that are differentially expressed in each cluster compared to all other clusters.

# Find marker genes for each Leiden cluster
eye_markers <- FindAllMarkers(
  eye,
  only.pos = TRUE,
  min.pct = 0.25,
  logfc.threshold = 0.25,
  pvalue.cutoff = 0.05
)

Additionally we want to see if a BANKSY analysis can help us identify spatial patterns in the data.

# Run BANKSY and cluster cells using the BANKSY representation
eye_banksy <- RunBanksy(
  eye,
  lambda = 0.8,
  verbose = TRUE,
  assay = "Polygon",
  slot = "counts",
  features = "variable",
  k_geom = 50
)
eye_banksy <- RunPCA(
  eye_banksy,
  assay = "BANKSY",
  reduction.name = "pca.banksy",
  npcs = 30,
  features = rownames(eye_banksy)
)
eye_banksy <- RunUMAP(eye_banksy, dims = 1:30, reduction = "pca.banksy")
eye_banksy <- FindNeighbors(
  eye_banksy,
  dims = 1:30,
  assay = "BANKSY",
  reduction = "pca.banksy"
)
eye_banksy <- FindClusters(
  eye_banksy,
  algorithm = 4,
  resolution = 0.3,
  cluster.name = "banksy_cluster"
)

# Visualise BANKSY clusters in UMAP and spatial coordinates
DimPlot(
  eye_banksy,
  reduction = "umap",
  label = TRUE,
  label.size = 5,
  group.by = "banksy_cluster",
  raster = FALSE
)
SpatialDimPlot(
  eye_banksy,
  images = "slice1.polygons",
  plot_segmentations = TRUE,
  group.by = "banksy_cluster",
  label = TRUE,
  label.size = 5
)

# Identify marker genes for BANKSY clusters
eye_banksy_markers <- FindAllMarkers(
  eye_banksy,
  only.pos = TRUE,
  min.pct = 0.25,
  logfc.threshold = 0.25,
  assay = "BANKSY",
  pvalue.cutoff = 0.05
)

14.7 Identifying Spatially Variable Features

We can also identify spatially variable features using the FindSpatiallyVariableFeatures function. This function identifies genes that show spatially variable expression patterns. We are going to reduce out input to the variables features only and only request the top 20 spatially variable features to be found. We will use the Moran’s I method for this analysis. This will take a while to compute, so we provide a pre-computed object that you can load for further exploration. You can see the code for running this analysis below, but we recommend that you load the pre-computed results for further exploration you may want to do on these data.

Code for finding spatially variable features - takes a long time to run!
# Identify spatially variable features using the Moran's I method
eye <- FindSpatiallyVariableFeatures(
  eye,
  assay = "Polygon",
  selection.method = "moransi",
  features = VariableFeatures(eye),
  nfeatures = 20
)
# Load a pre-computed object with spatially variable features
eye <- readRDS("precomputed/zebrafish_eye_svgs.rds")

You can now explore the identified marker genes and spatially variable features to gain insights into the spatial organisation of cell types and gene expression patterns in the zebrafish eye tissue.

# Visualise the top spatially variable features
top_spatial_features <- head(SpatiallyVariableFeatures(eye), 6)
SpatialFeaturePlot(
  eye,
  images = "slice1.polygons",
  features = top_spatial_features,
  plot_segmentations = TRUE,
  crop = TRUE
)

14.8 Conclusion

In this example, we have demonstrated how to load, preprocess, and analyse Visium spatial transcriptomics data using Seurat. We have performed subsetting, normalisation, dimensionality reduction, clustering, and identified marker genes and spatially variable features. Additionally, we have explored the use of BANKSY for identifying spatial patterns in the data. This workflow can be adapted and extended for other Visium datasets as well.

14.9 Summary

TipKey Points
  • Visium data can be loaded into Seurat using the Load10X_Spatial function.
  • Quality control is essential to filter out low-quality spots based on metrics like mitochondrial gene percentage.
  • Normalisation and scaling can be performed using the SCTransform function.
  • Dimensionality reduction (PCA, UMAP) and clustering (Leiden algorithm) help identify distinct cell populations.
  • Marker genes for clusters can be identified using the FindAllMarkers function.
  • BANKSY analysis can reveal spatial patterns in gene expression.
  • Spatially variable features can be identified using the FindSpatiallyVariableFeatures function.