Spatially Derived Clusters

Author

Will Gammerdinger, Noor Sohail, Zhu Zhuo, James Billingsley, Shannan Ho Sui

Published

July 22, 2025

Approximate time: XY minutes

Learning Objectives

LO 1
LO 2
LO 3

BANKSY

BANKSY is another method for performing clustering. Unlike Seurat, BANKSY takes into account not only an individual bin’s expression pattern but also the mean and the gradient of gene expression levels in a bin’s broader neighborhood. This makes it valuable for identifying and defining spatial regions of interest.

To accomplish this, BANKSY calculates two different scores:

Weighted mean expression of genes in a spatial neighborhood
“Azimuthal gabor filter”, which is the “gradient of gene expression in each cell’s neighborhood”

In doing so, BANKSY is able to be flexible and not limit celltypes to only be located nearby one another. Even more than that, there is lambda value that scales the weight of each of these matrices to give the user flexibility in how strongly the spatial weights should impact clustering.

These new matrices are then concatenated to the counts matrix before running the remainder of a standard scRNA-seq workflow (dimensionality reduction, clustering, integration, etc.). Therefore, the final output of this algorithm is a modulated counts matrix that includes weighted scores that correspond to spatial domains.

library(Seurat)
# remotes::install_github("prabhakarlab/Banksy")
library(Banksy)
library(SeuratWrappers)
library(tidyverse)

seurat_processed <- readRDS("data/seurat_processed.RDS")
seurat_processed

An object of class Seurat 
36170 features across 67660 samples within 2 assays 
Active assay: Spatial.016um (18085 features, 2000 variable features)
 2 layers present: counts, data
 1 other assay present: sketch
 4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
 2 spatial fields of view present: P5CRC.016um P5NAT.016um

Running BANKSY

We use the RunBanksy function to create a new “BANKSY” assay based on a default of the 4,000 most highly variable features, which can be used for dimensionality reduction and clustering. Some parameters of importance for RunBanksy() are:

Parameter	Description
`k_geom`	Number of bins to consider for the local neighborhood. Larger values will yield larger domains.
`lambda`	Influence of the neighborhood. Larger values yield more spatially coherent domains. The authors recommend using 0.8 to identify broader spatial domains.
`group`	Allows us to run BANKSY on multiple samples, enabling the function to understand that overlapping spatial locations can come from different slides.
`dimx`, `dimy`	The x and y coordinates of a spot on the tissue slide.
`split.scale`	Within-group scaling parameter, accounting for minor differences between datasets.

To set ourselves up for success, we are going to add the spatial locations for each spot into the metadata. This will allow us to specify dimx and dimy more easily.

seurat_processed$cell <- rownames(seurat_processed@meta.data)
# CRC and NAT sample coordinates
coords_crc <- GetTissueCoordinates(seurat_processed, image = "P5CRC.016um")
coords_nat <- GetTissueCoordinates(seurat_processed, image = "P5NAT.016um")
coords <- rbind(coords_crc, coords_nat)

seurat_processed@meta.data <- left_join(seurat_processed@meta.data, coords, by = "cell")
rownames(seurat_processed@meta.data) <- seurat_processed$cell

Which we can now use to calculate the new BANKSY matrix. This step can take a while to compute (TODO time estimate)

# Run Banksy from SeuratWrappers
seurat_banksy <- RunBanksy(seurat_processed, lambda = 0.6, verbose = T,
                           assay = 'Spatial.016um', slot = 'data', k_geom = 15,
                           dimx = 'x', dimy = 'y', 
                           group = "orig.ident", split.scale = T)

With a new Active assay: BANKSY

seurat_banksy

An object of class Seurat 
40170 features across 67660 samples within 3 assays 
Active assay: BANKSY (4000 features, 0 variable features)
 2 layers present: data, scale.data
 2 other assays present: Spatial.016um, sketch
 4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
 2 spatial fields of view present: P5CRC.016um P5NAT.016um

New BANKSY Matrix

At this point, we have our new counts matrix that includes our neighborhood weighted scores. We can see this more clearly if we investigate the Features() that exist in our BANKSY assay. Where teh first few features are our original counts matrix with genes.

Features(seurat_banksy) %>% head()

[1] "SPP1"  "IGHG1" "IGHM"  "IGHG3" "CXCL8" "INSL5"

Where we also have “new features” that are our lambda scaled neighborhood weighted values, with an appended *.m0 to make the distinction clear.

Features(seurat_banksy) %>% tail()

[1] "PRKAR1B.m0" "PCID2.m0"   "RAD54L.m0"  "COL8A1.m0"  "PDSS1.m0"  
[6] "EIPR1.m0"

Calculations on BANKSY Matrix

With these weights, we can now run through a simplified clustering workflow - using many of the same functions we used in the previous lesson on the scRNA workflow. The only distinction here is that these values are going to be calculated on our new BANKSY count matrix.

# Increases the size of the default vector
options(future.globals.maxSize = 200000000000)

# PCA
seurat_banksy <- RunPCA(seurat_banksy, 
                        assay = "BANKSY", 
                        reduction.name = "pca.banksy", 
                        features = rownames(seurat_banksy), 
                        npcs = 30)
# Find k-Nearest Neighbors
seurat_banksy <- FindNeighbors(seurat_banksy, 
                               reduction = "pca.banksy", 
                               dims = 1:30)
# Louvain clustering
seurat_banksy <- FindClusters(seurat_banksy, 
                              cluster.name = "banksy_cluster",
                              resolution = 0.8)

At this point, we no longer need access to the BANKSY assay as all the relevant information has now been stored as:

banksy_clusters in @meta.data
pca.banksy in @reductions

So to conserve memory, we are going to delete the BANKSY assay from our seurat object.

# Change default asasy from BANKSY
DefaultAssay(seurat_banksy) <- "Spatial.016um"

# Delete BANKSY assay by setting it to NULL
seurat_banksy[["BANKSY"]] <- NULL

BANKSY Clusters

SpatialDimPlot(seurat_banksy, 
               group.by = "banksy_cluster", 
               pt.size.factor = 13, 
               image.alpha = 0)

Exercise 2

Run BANKSY with a lambda of 0.2, how do the results differ from a value of 0.8?

Comparing Clustering Methods

At this point, we have two different results for our clusters - BANKSY and RNA-based clusters. Let us take a moment to compare and contrast what the differences are between the two.

Spatial Overlay

SpatialDimPlot(seurat_banksy, 
               group.by = c("banksy_cluster",
                            "seurat_cluster.projected"), 
               pt.size.factor = 13, 
               image.alpha = 0)

UMAP

DimPlot(seurat_banksy, 
        group.by = c("banksy_cluster",
                     "seurat_cluster.projected"))

Sub-topic 3

# Barplot of proportion of cells in each cluster by sample
ggplot(seurat_banksy@meta.data) +
    geom_bar(aes(x=seurat_cluster.projected, fill=orig.ident), 
             position=position_fill())  +
    theme_classic()

ggplot(seurat_banksy@meta.data) +
    geom_bar(aes(x=banksy_cluster, fill=orig.ident), 
             position=position_fill())  +
    theme_classic()

Exercise 3

Exercise 3

Spatially Variable Genes

https://academic.oup.com/bioinformatics/article/41/4/btaf131/8096371

Spatially variable genes (SVGs) are similar in concept to the highly variable genes that we calculated earlier - but with the added spatial location information. Essentially, identifying genes whose expression patterns change depending on where a cell is located on the tissue.

In doing so, we are able to cluster cells together in a way that represents the different spatial domains

Use Case of SVGs

https://www.nature.com/articles/s41467-025-56080-w

Methods for detecting the three SVG categories serve different purposes (Fig. 2a). First, the detection of overall SVGs screens informative genes for downstream analyses, including the identification of spatial domains and functional gene modules. Second, detecting cell-type-specific SVGs aims to reveal spatial variation within a cell type and help identify distinct cell subpopulations or states within cell types. Third, spatial-domain-marker SVG detection is used to find marker genes to annotate and interpret spatial domains already detected. These markers help understand the molecular mechanisms underlying spatial domains and assist in annotating tissue layers in other datasets.

Methods

SpatialDE (python)
MERINGUE
BinSpect

Sub-topic 3

Exercise 3

Exercise 1

--- title: "Spatially Derived Clusters" authors: "Will Gammerdinger, Noor Sohail, Zhu Zhuo, James Billingsley, Shannan Ho Sui" date: "Tuesday, July 22, 2025" editor_options: markdown: wrap: 72 --- Approximate time: XY minutes # Learning Objectives - LO 1 - LO 2 - LO 3 # BANKSY [BANKSY](https://www.nature.com/articles/s41588-024-01664-3) is another method for performing clustering. Unlike Seurat, BANKSY takes into account not only an individual bin’s expression pattern but also the mean and the gradient of gene expression levels in a bin’s broader neighborhood. This makes it valuable for identifying and defining spatial regions of interest. To accomplish this, BANKSY calculates two different scores: - Weighted mean expression of genes in a spatial neighborhood - "Azimuthal gabor filter", which is the "gradient of gene expression in each cell’s neighborhood" In doing so, BANKSY is able to be flexible and not limit celltypes to _only_ be located nearby one another. Even more than that, there is `lambda` value that scales the weight of each of these matrices to give the user flexibility in how strongly the spatial weights should impact clustering. These new matrices are then concatenated to the counts matrix before running the remainder of a standard scRNA-seq workflow (dimensionality reduction, clustering, integration, etc.). Therefore, the final output of this algorithm is a modulated counts matrix that includes weighted scores that correspond to spatial domains. ```{r} library(Seurat) # remotes::install_github("prabhakarlab/Banksy") library(Banksy) library(SeuratWrappers) library(tidyverse) seurat_processed <- readRDS("data/seurat_processed.RDS") seurat_processed ``` ## Running BANKSY We use the `RunBanksy` function to create a new "BANKSY" assay based on a default of the 4,000 most highly variable features, which can be used for dimensionality reduction and clustering. Some parameters of importance for `RunBanksy()` are: | Parameter | Description | |----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------| | `k_geom` | Number of bins to consider for the local neighborhood. Larger values will yield larger domains. | | `lambda` | Influence of the neighborhood. Larger values yield more spatially coherent domains. The authors recommend using 0.8 to identify broader spatial domains. | | `group` | Allows us to run BANKSY on multiple samples, enabling the function to understand that overlapping spatial locations can come from different slides. | | `dimx`, `dimy` | The x and y coordinates of a spot on the tissue slide. | | `split.scale` | Within-group scaling parameter, accounting for minor differences between datasets. | To set ourselves up for success, we are going to add the spatial locations for each spot into the metadata. This will allow us to specify `dimx` and `dimy` more easily. ```{r} seurat_processed$cell <- rownames(seurat_processed@meta.data) # CRC and NAT sample coordinates coords_crc <- GetTissueCoordinates(seurat_processed, image = "P5CRC.016um") coords_nat <- GetTissueCoordinates(seurat_processed, image = "P5NAT.016um") coords <- rbind(coords_crc, coords_nat) seurat_processed@meta.data <- left_join(seurat_processed@meta.data, coords, by = "cell") rownames(seurat_processed@meta.data) <- seurat_processed$cell ``` Which we can now use to calculate the new BANKSY matrix. This step can take a while to compute (TODO time estimate) ```{r} #| eval: false # Run Banksy from SeuratWrappers seurat_banksy <- RunBanksy(seurat_processed, lambda = 0.6, verbose = T, assay = 'Spatial.016um', slot = 'data', k_geom = 15, dimx = 'x', dimy = 'y', group = "orig.ident", split.scale = T) ``` ```{r} #| echo: false # saveRDS(seurat_banksy, "data/seurat_banksy_1.RDS") seurat_banksy <- readRDS("data/seurat_banksy_1.RDS") ``` With a new `Active assay: BANKSY` ```{r} seurat_banksy ``` ## New BANKSY Matrix At this point, we have our new counts matrix that includes our neighborhood weighted scores. We can see this more clearly if we investigate the `Features()` that exist in our BANKSY assay. Where teh first few features are our original counts matrix with genes. ```{r} Features(seurat_banksy) %>% head() ``` Where we also have "new features" that are our `lambda` scaled neighborhood weighted values, with an appended `*.m0` to make the distinction clear. ```{r} Features(seurat_banksy) %>% tail() ``` ## Calculations on BANKSY Matrix With these weights, we can now run through a simplified clustering workflow - using many of the same functions we used in the previous lesson on the scRNA workflow. The only distinction here is that these values are going to be calculated on our new BANKSY count matrix. ```{r} #| eval: false # Increases the size of the default vector options(future.globals.maxSize = 200000000000) # PCA seurat_banksy <- RunPCA(seurat_banksy, assay = "BANKSY", reduction.name = "pca.banksy", features = rownames(seurat_banksy), npcs = 30) # Find k-Nearest Neighbors seurat_banksy <- FindNeighbors(seurat_banksy, reduction = "pca.banksy", dims = 1:30) # Louvain clustering seurat_banksy <- FindClusters(seurat_banksy, cluster.name = "banksy_cluster", resolution = 0.8) ``` At this point, we no longer need access to the BANKSY assay as all the relevant information has now been stored as: - `banksy_clusters` in `@meta.data` - `pca.banksy` in `@reductions` So to conserve memory, we are going to delete the BANKSY assay from our seurat object. ```{r} #| eval: false # Change default asasy from BANKSY DefaultAssay(seurat_banksy) <- "Spatial.016um" # Delete BANKSY assay by setting it to NULL seurat_banksy[["BANKSY"]] <- NULL ``` ```{r} #| echo: false # saveRDS(seurat_banksy, "data/seurat_banksy_2.RDS") seurat_banksy <- readRDS("data/seurat_banksy_2.RDS") ``` ## BANKSY Clusters ```{r} #| fig-width: 10 SpatialDimPlot(seurat_banksy, group.by = "banksy_cluster", pt.size.factor = 13, image.alpha = 0) ``` ::: callout-tip # [**Exercise 2**](08_spatially_derived_clusters_Answer-key.qmd#exercise-2) Run BANKSY with a `lambda` of 0.2, how do the results differ from a value of 0.8? ::: # Comparing Clustering Methods At this point, we have two different results for our clusters - BANKSY and RNA-based clusters. Let us take a moment to compare and contrast what the differences are between the two. ## Spatial Overlay ```{r} #| fig-width: 10 #| fig-height: 10 SpatialDimPlot(seurat_banksy, group.by = c("banksy_cluster", "seurat_cluster.projected"), pt.size.factor = 13, image.alpha = 0) ``` ## UMAP ```{r} #| fig-width: 10 DimPlot(seurat_banksy, group.by = c("banksy_cluster", "seurat_cluster.projected")) ``` ## Sub-topic 3 ```{r} # Barplot of proportion of cells in each cluster by sample ggplot(seurat_banksy@meta.data) + geom_bar(aes(x=seurat_cluster.projected, fill=orig.ident), position=position_fill()) + theme_classic() ``` ```{r} ggplot(seurat_banksy@meta.data) + geom_bar(aes(x=banksy_cluster, fill=orig.ident), position=position_fill()) + theme_classic() ``` ::: callout-tip # [**Exercise 3**](08_spatially_derived_clusters_Answer-key.qmd#exercise-3) Exercise 3 ::: # Spatially Variable Genes https://academic.oup.com/bioinformatics/article/41/4/btaf131/8096371 Spatially variable genes (SVGs) are similar in concept to the highly variable genes that we calculated earlier - but with the added spatial location information. Essentially, identifying genes whose expression patterns change depending on where a cell is located on the tissue. In doing so, we are able to cluster cells together in a way that represents the different spatial domains ## Use Case of SVGs https://www.nature.com/articles/s41467-025-56080-w > Methods for detecting the three SVG categories serve different purposes (Fig. 2a). First, the detection of overall SVGs screens informative genes for downstream analyses, including the identification of spatial domains and functional gene modules. Second, detecting cell-type-specific SVGs aims to reveal spatial variation within a cell type and help identify distinct cell subpopulations or states within cell types. Third, spatial-domain-marker SVG detection is used to find marker genes to annotate and interpret spatial domains already detected. These markers help understand the molecular mechanisms underlying spatial domains and assist in annotating tissue layers in other datasets. ## Methods - SpatialDE (python) - MERINGUE - BinSpect ## Sub-topic 3 ::: callout-tip # [**Exercise 3**](08_spatially_derived_clusters_Answer-key.qmd#exercise-3) Exercise 1 :::