Will Gammerdinger, Noor Sohail, Zhu Zhuo, James Billingsley, Shannan Ho Sui
Published
July 22, 2025
Approximate time: XY minutes
Learning Objectives
LO 1
LO 2
LO 3
BANKSY
BANKSY is another method for performing clustering. Unlike Seurat, BANKSY takes into account not only an individual bin’s expression pattern but also the mean and the gradient of gene expression levels in a bin’s broader neighborhood. This makes it valuable for identifying and defining spatial regions of interest.
To accomplish this, BANKSY calculates two different scores:
Weighted mean expression of genes in a spatial neighborhood
“Azimuthal gabor filter”, which is the “gradient of gene expression in each cell’s neighborhood”
In doing so, BANKSY is able to be flexible and not limit celltypes to only be located nearby one another. Even more than that, there is lambda value that scales the weight of each of these matrices to give the user flexibility in how strongly the spatial weights should impact clustering.
These new matrices are then concatenated to the counts matrix before running the remainder of a standard scRNA-seq workflow (dimensionality reduction, clustering, integration, etc.). Therefore, the final output of this algorithm is a modulated counts matrix that includes weighted scores that correspond to spatial domains.
An object of class Seurat
36170 features across 67660 samples within 2 assays
Active assay: Spatial.016um (18085 features, 2000 variable features)
2 layers present: counts, data
1 other assay present: sketch
4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
2 spatial fields of view present: P5CRC.016um P5NAT.016um
Running BANKSY
We use the RunBanksy function to create a new “BANKSY” assay based on a default of the 4,000 most highly variable features, which can be used for dimensionality reduction and clustering. Some parameters of importance for RunBanksy() are:
Parameter
Description
k_geom
Number of bins to consider for the local neighborhood. Larger values will yield larger domains.
lambda
Influence of the neighborhood. Larger values yield more spatially coherent domains. The authors recommend using 0.8 to identify broader spatial domains.
group
Allows us to run BANKSY on multiple samples, enabling the function to understand that overlapping spatial locations can come from different slides.
dimx, dimy
The x and y coordinates of a spot on the tissue slide.
split.scale
Within-group scaling parameter, accounting for minor differences between datasets.
To set ourselves up for success, we are going to add the spatial locations for each spot into the metadata. This will allow us to specify dimx and dimy more easily.
Which we can now use to calculate the new BANKSY matrix. This step can take a while to compute (TODO time estimate)
# Run Banksy from SeuratWrappersseurat_banksy <-RunBanksy(seurat_processed, lambda =0.6, verbose = T,assay ='Spatial.016um', slot ='data', k_geom =15,dimx ='x', dimy ='y', group ="orig.ident", split.scale = T)
With a new Active assay: BANKSY
seurat_banksy
An object of class Seurat
40170 features across 67660 samples within 3 assays
Active assay: BANKSY (4000 features, 0 variable features)
2 layers present: data, scale.data
2 other assays present: Spatial.016um, sketch
4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
2 spatial fields of view present: P5CRC.016um P5NAT.016um
New BANKSY Matrix
At this point, we have our new counts matrix that includes our neighborhood weighted scores. We can see this more clearly if we investigate the Features() that exist in our BANKSY assay. Where teh first few features are our original counts matrix with genes.
Features(seurat_banksy) %>%head()
[1] "SPP1" "IGHG1" "IGHM" "IGHG3" "CXCL8" "INSL5"
Where we also have “new features” that are our lambda scaled neighborhood weighted values, with an appended *.m0 to make the distinction clear.
With these weights, we can now run through a simplified clustering workflow - using many of the same functions we used in the previous lesson on the scRNA workflow. The only distinction here is that these values are going to be calculated on our new BANKSY count matrix.
# Increases the size of the default vectoroptions(future.globals.maxSize =200000000000)# PCAseurat_banksy <-RunPCA(seurat_banksy, assay ="BANKSY", reduction.name ="pca.banksy", features =rownames(seurat_banksy), npcs =30)# Find k-Nearest Neighborsseurat_banksy <-FindNeighbors(seurat_banksy, reduction ="pca.banksy", dims =1:30)# Louvain clusteringseurat_banksy <-FindClusters(seurat_banksy, cluster.name ="banksy_cluster",resolution =0.8)
At this point, we no longer need access to the BANKSY assay as all the relevant information has now been stored as:
banksy_clusters in @meta.data
pca.banksy in @reductions
So to conserve memory, we are going to delete the BANKSY assay from our seurat object.
# Change default asasy from BANKSYDefaultAssay(seurat_banksy) <-"Spatial.016um"# Delete BANKSY assay by setting it to NULLseurat_banksy[["BANKSY"]] <-NULL
Run BANKSY with a lambda of 0.2, how do the results differ from a value of 0.8?
Comparing Clustering Methods
At this point, we have two different results for our clusters - BANKSY and RNA-based clusters. Let us take a moment to compare and contrast what the differences are between the two.
# Barplot of proportion of cells in each cluster by sampleggplot(seurat_banksy@meta.data) +geom_bar(aes(x=seurat_cluster.projected, fill=orig.ident), position=position_fill()) +theme_classic()
Spatially variable genes (SVGs) are similar in concept to the highly variable genes that we calculated earlier - but with the added spatial location information. Essentially, identifying genes whose expression patterns change depending on where a cell is located on the tissue.
In doing so, we are able to cluster cells together in a way that represents the different spatial domains
Methods for detecting the three SVG categories serve different purposes (Fig. 2a). First, the detection of overall SVGs screens informative genes for downstream analyses, including the identification of spatial domains and functional gene modules. Second, detecting cell-type-specific SVGs aims to reveal spatial variation within a cell type and help identify distinct cell subpopulations or states within cell types. Third, spatial-domain-marker SVG detection is used to find marker genes to annotate and interpret spatial domains already detected. These markers help understand the molecular mechanisms underlying spatial domains and assist in annotating tissue layers in other datasets.
---title: "Spatially Derived Clusters"authors: "Will Gammerdinger, Noor Sohail, Zhu Zhuo, James Billingsley, Shannan Ho Sui"date: "Tuesday, July 22, 2025"editor_options: markdown: wrap: 72---Approximate time: XY minutes# Learning Objectives- LO 1- LO 2- LO 3# BANKSY[BANKSY](https://www.nature.com/articles/s41588-024-01664-3) is another method for performing clustering. Unlike Seurat, BANKSY takes into account not only an individual bin’s expression pattern but also the mean and the gradient of gene expression levels in a bin’s broader neighborhood. This makes it valuable for identifying and defining spatial regions of interest.To accomplish this, BANKSY calculates two different scores:- Weighted mean expression of genes in a spatial neighborhood- "Azimuthal gabor filter", which is the "gradient of gene expression in each cell’s neighborhood"In doing so, BANKSY is able to be flexible and not limit celltypes to _only_ be located nearby one another. Even more than that, there is `lambda` value that scales the weight of each of these matrices to give the user flexibility in how strongly the spatial weights should impact clustering.These new matrices are then concatenated to the counts matrix before running the remainder of a standard scRNA-seq workflow (dimensionality reduction, clustering, integration, etc.). Therefore, the final output of this algorithm is a modulated counts matrix that includes weighted scores that correspond to spatial domains.```{r}library(Seurat)# remotes::install_github("prabhakarlab/Banksy")library(Banksy)library(SeuratWrappers)library(tidyverse)seurat_processed <-readRDS("data/seurat_processed.RDS")seurat_processed```## Running BANKSYWe use the `RunBanksy` function to create a new "BANKSY" assay based on a default of the 4,000 most highly variable features, which can be used for dimensionality reduction and clustering. Some parameters of importance for `RunBanksy()` are:| Parameter | Description ||----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|| `k_geom` | Number of bins to consider for the local neighborhood. Larger values will yield larger domains. || `lambda` | Influence of the neighborhood. Larger values yield more spatially coherent domains. The authors recommend using 0.8 to identify broader spatial domains. || `group` | Allows us to run BANKSY on multiple samples, enabling the function to understand that overlapping spatial locations can come from different slides. || `dimx`, `dimy` | The x and y coordinates of a spot on the tissue slide. || `split.scale` | Within-group scaling parameter, accounting for minor differences between datasets. |To set ourselves up for success, we are going to add the spatial locations for each spot into the metadata. This will allow us to specify `dimx` and `dimy` more easily.```{r}seurat_processed$cell <-rownames(seurat_processed@meta.data)# CRC and NAT sample coordinatescoords_crc <-GetTissueCoordinates(seurat_processed, image ="P5CRC.016um")coords_nat <-GetTissueCoordinates(seurat_processed, image ="P5NAT.016um")coords <-rbind(coords_crc, coords_nat)seurat_processed@meta.data <-left_join(seurat_processed@meta.data, coords, by ="cell")rownames(seurat_processed@meta.data) <- seurat_processed$cell```Which we can now use to calculate the new BANKSY matrix. This step can take a while to compute (TODO time estimate)```{r}#| eval: false# Run Banksy from SeuratWrappersseurat_banksy <-RunBanksy(seurat_processed, lambda =0.6, verbose = T,assay ='Spatial.016um', slot ='data', k_geom =15,dimx ='x', dimy ='y', group ="orig.ident", split.scale = T)``````{r}#| echo: false# saveRDS(seurat_banksy, "data/seurat_banksy_1.RDS")seurat_banksy <-readRDS("data/seurat_banksy_1.RDS")```With a new `Active assay: BANKSY````{r}seurat_banksy```## New BANKSY MatrixAt this point, we have our new counts matrix that includes our neighborhood weighted scores. We can see this more clearly if we investigate the `Features()` that exist in our BANKSY assay. Where teh first few features are our original counts matrix with genes.```{r}Features(seurat_banksy) %>%head()```Where we also have "new features" that are our `lambda` scaled neighborhood weighted values, with an appended `*.m0` to make the distinction clear.```{r}Features(seurat_banksy) %>%tail()```## Calculations on BANKSY MatrixWith these weights, we can now run through a simplified clustering workflow - using many of the same functions we used in the previous lesson on the scRNA workflow. The only distinction here is that these values are going to be calculated on our new BANKSY count matrix.```{r}#| eval: false# Increases the size of the default vectoroptions(future.globals.maxSize =200000000000)# PCAseurat_banksy <-RunPCA(seurat_banksy, assay ="BANKSY", reduction.name ="pca.banksy", features =rownames(seurat_banksy), npcs =30)# Find k-Nearest Neighborsseurat_banksy <-FindNeighbors(seurat_banksy, reduction ="pca.banksy", dims =1:30)# Louvain clusteringseurat_banksy <-FindClusters(seurat_banksy, cluster.name ="banksy_cluster",resolution =0.8)```At this point, we no longer need access to the BANKSY assay as all the relevant information has now been stored as:- `banksy_clusters` in `@meta.data`- `pca.banksy` in `@reductions`So to conserve memory, we are going to delete the BANKSY assay from our seurat object.```{r}#| eval: false# Change default asasy from BANKSYDefaultAssay(seurat_banksy) <-"Spatial.016um"# Delete BANKSY assay by setting it to NULLseurat_banksy[["BANKSY"]] <-NULL``````{r}#| echo: false# saveRDS(seurat_banksy, "data/seurat_banksy_2.RDS")seurat_banksy <-readRDS("data/seurat_banksy_2.RDS")```## BANKSY Clusters```{r}#| fig-width: 10SpatialDimPlot(seurat_banksy, group.by ="banksy_cluster", pt.size.factor =13, image.alpha =0)```::: callout-tip# [**Exercise 2**](08_spatially_derived_clusters_Answer-key.qmd#exercise-2)Run BANKSY with a `lambda` of 0.2, how do the results differ from a value of 0.8?:::# Comparing Clustering MethodsAt this point, we have two different results for our clusters - BANKSY and RNA-based clusters. Let us take a moment to compare and contrast what the differences are between the two.## Spatial Overlay```{r}#| fig-width: 10#| fig-height: 10SpatialDimPlot(seurat_banksy, group.by =c("banksy_cluster","seurat_cluster.projected"), pt.size.factor =13, image.alpha =0)```## UMAP```{r}#| fig-width: 10DimPlot(seurat_banksy, group.by =c("banksy_cluster","seurat_cluster.projected"))```## Sub-topic 3```{r}# Barplot of proportion of cells in each cluster by sampleggplot(seurat_banksy@meta.data) +geom_bar(aes(x=seurat_cluster.projected, fill=orig.ident), position=position_fill()) +theme_classic()``````{r}ggplot(seurat_banksy@meta.data) +geom_bar(aes(x=banksy_cluster, fill=orig.ident), position=position_fill()) +theme_classic()```::: callout-tip# [**Exercise 3**](08_spatially_derived_clusters_Answer-key.qmd#exercise-3)Exercise 3:::# Spatially Variable Geneshttps://academic.oup.com/bioinformatics/article/41/4/btaf131/8096371Spatially variable genes (SVGs) are similar in concept to the highly variable genes that we calculated earlier - but with the added spatial location information. Essentially, identifying genes whose expression patterns change depending on where a cell is located on the tissue.In doing so, we are able to cluster cells together in a way that represents the different spatial domains## Use Case of SVGshttps://www.nature.com/articles/s41467-025-56080-w> Methods for detecting the three SVG categories serve different purposes (Fig. 2a). First, the detection of overall SVGs screens informative genes for downstream analyses, including the identification of spatial domains and functional gene modules. Second, detecting cell-type-specific SVGs aims to reveal spatial variation within a cell type and help identify distinct cell subpopulations or states within cell types. Third, spatial-domain-marker SVG detection is used to find marker genes to annotate and interpret spatial domains already detected. These markers help understand the molecular mechanisms underlying spatial domains and assist in annotating tissue layers in other datasets.## Methods- SpatialDE (python)- MERINGUE- BinSpect## Sub-topic 3::: callout-tip# [**Exercise 3**](08_spatially_derived_clusters_Answer-key.qmd#exercise-3)Exercise 1:::