Spatially Derived Clusters

category_1
category_2
category_3
category_4

Write a description of the lesson here.

Author

Noor Sohail

Published

July 22, 2025

Keywords

keyword_1, keyword_2, keyword_3, keyword_4, keyword_5, keyword_6

Approximate time: XX minutes

Learning objectives

In this lesson, we will:

  • Learning Objective 1
  • Learning Objective 2
  • Learning Objective 3

Overview of lesson

When doing XYZ…

BANKSY

BANKSY is another method for performing clustering. Unlike Seurat, BANKSY takes into account not only an individual bin’s expression pattern but also the mean and the gradient of gene expression levels in a bin’s broader neighborhood. This makes it valuable for identifying and defining spatial regions of interest.

To accomplish this, BANKSY calculates two different scores:

  • Weighted mean expression of genes in a spatial neighborhood
  • “Azimuthal gabor filter”, which is the “gradient of gene expression in each cell’s neighborhood”

In doing so, BANKSY is able to be flexible and not limit celltypes to only be located nearby one another. Even more than that, there is lambda value that scales the weight of each of these matrices to give the user flexibility in how strongly the spatial weights should impact clustering.

These new matrices are then concatenated to the counts matrix before running the remainder of a standard scRNA-seq workflow (dimensionality reduction, clustering, integration, etc.). Therefore, the final output of this algorithm is a modulated counts matrix that includes weighted scores that correspond to spatial domains.

library(Seurat)
library(Banksy)
library(SeuratWrappers)
library(tidyverse)
library(future)
library(harmony)
set.seed(12345)

# Set parallelization (multithreading)
# Speed up calculations
plan("multicore", workers = parallel::detectCores() - 1)

# Increases the size of the default vector
# 8GB
options(future.globals.maxSize = 8000 * 1024^2)

# Load dataset
seurat_processed <- readRDS("data/seurat_processed.RDS")

Running BANKSY

We use the RunBanksy function to create a new “BANKSY” assay based on a default of the 4,000 most highly variable features, which can be used for dimensionality reduction and clustering. Some parameters of importance for RunBanksy() are:

Parameter Description
k_geom Number of bins to consider for the local neighborhood. Larger values will yield larger domains.
lambda Influence of the neighborhood. Larger values yield more spatially coherent domains. The authors recommend using 0.8 to identify broader spatial domains.
group Allows us to run BANKSY on multiple samples, enabling the function to understand that overlapping spatial locations can come from different slides.
dimx, dimy The x and y coordinates of a spot on the tissue slide.
split.scale Within-group scaling parameter, accounting for minor differences between datasets.

To set ourselves up for success, we are going to add the spatial locations for each spot into the metadata. This will allow us to specify dimx and dimy more easily.

# Store the cell barcode as a column
# Sometimes when you merge dataframes, you lose rownames
seurat_processed$cell <- rownames(seurat_processed@meta.data)

# CRC and NAT sample coordinates
coords_crc <- GetTissueCoordinates(seurat_processed, 
                                   image = "P5CRC.008um")
coords_nat <- GetTissueCoordinates(seurat_processed, 
                                   image = "P5NAT.008um")
coords <- rbind(coords_crc, coords_nat)

You may be asking, why did we have to run GetTissueCoordinates() twice? This is because this function will only return the coordinates for one image at a time.

coords %>% head()
                                   x        y                        cell
P5CRC_s_008um_00078_00444-1 57263.31 61738.62 P5CRC_s_008um_00078_00444-1
P5CRC_s_008um_00128_00278-1 52418.69 60257.61 P5CRC_s_008um_00128_00278-1
P5CRC_s_008um_00052_00559-1 60620.61 62512.22 P5CRC_s_008um_00052_00559-1
P5CRC_s_008um_00121_00413-1 56362.64 60478.41 P5CRC_s_008um_00121_00413-1
P5CRC_s_008um_00202_00633-1 62800.98 58138.03 P5CRC_s_008um_00202_00633-1
P5CRC_s_008um_00035_00460-1 57725.67 62997.04 P5CRC_s_008um_00035_00460-1

Next we are going to add these coordinate sinto our metadata, this is so that we can directly call this column from the RunBanksy() function.

seurat_processed@meta.data <- left_join(seurat_processed@meta.data, coords, by = "cell")
seurat_processed@meta.data %>% head()
  orig.ident nCount_Spatial.008um nFeature_Spatial.008um log10GenesPerUMI
1      P5CRC                   65                     57        0.9685377
2      P5CRC                 1300                    906        0.9496410
3      P5CRC                  128                    121        0.9884090
4      P5CRC                  538                    326        0.9203288
5      P5CRC                  365                    232        0.9231919
6      P5CRC                  122                    100        0.9586074
   mitoRatio leverage.score seurat_cluster.sketched seurat_cluster.projected
1 0.23076923     0.11825052                    <NA>                        3
2 0.17307692     0.81262254                    <NA>                        1
3 0.02343750     1.39678933                    <NA>                       11
4 0.00929368     0.61539562                    <NA>                        9
5 0.06575342     0.44573347                    <NA>                        2
6 0.16393443     0.09451219                    <NA>                        3
  seurat_cluster.projected.score                        cell        x        y
1                      0.9783292 P5CRC_s_008um_00078_00444-1 57263.31 61738.62
2                      1.0000000 P5CRC_s_008um_00128_00278-1 52418.69 60257.61
3                      0.6450775 P5CRC_s_008um_00052_00559-1 60620.61 62512.22
4                      0.6799116 P5CRC_s_008um_00121_00413-1 56362.64 60478.41
5                      1.0000000 P5CRC_s_008um_00202_00633-1 62800.98 58138.03
6                      1.0000000 P5CRC_s_008um_00035_00460-1 57725.67 62997.04

But wait! Our rownames are missing! It is vital that the rownames of our Seurat object align with what is stored in Cell(). So we are going to add the rownames back into our metadata.

Afterwards, we can then calculate the new BANKSY matrix. This step can take a while to compute

Note

TODO time estimate ~1 minute for me

# Add cell barcodes back to rownames
rownames(seurat_processed@meta.data) <- seurat_processed$cell

# Run Banksy from SeuratWrappers
seurat_banksy <- RunBanksy(seurat_processed, lambda = 0.8, verbose = T,
                           assay = 'Spatial.008um', slot = 'data', k_geom = 15,
                           dimx = 'x', dimy = 'y', 
                           group = "orig.ident", split.scale = T)
Warning

Totally possible that participants run out RAM at this step. We should provide them the intermediate object at this point.

seurat_banksy is 10.4 GB on my system right now…

Basically if participants don’t have enough RAM, they will need to follow along and then load the banksy object at the end of the lesson for next part

With a new Active assay: BANKSY

seurat_banksy
An object of class Seurat 
40170 features across 135798 samples within 3 assays 
Active assay: BANKSY (4000 features, 0 variable features)
 2 layers present: data, scale.data
 2 other assays present: Spatial.008um, sketch
 4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
 2 spatial fields of view present: P5CRC.008um P5NAT.008um

New BANKSY Matrix

At this point, we have our new counts matrix that includes our neighborhood weighted scores. We can see this more clearly if we investigate the Features() that exist in our BANKSY assay. Where teh first few features are our original counts matrix with genes.

Features(seurat_banksy) %>% head()
[1] "IGHG1" "IGHM"  "SST"   "IGLC1" "INSL5" "CHGA" 

Where we also have “new features” that are our lambda scaled neighborhood weighted values, with an appended *.m0 to make the distinction clear.

Features(seurat_banksy) %>% tail()
[1] "ADAM23.m0" "PNMT.m0"   "RSPO2.m0"  "HTRA3.m0"  "SLC3A1.m0" "TLR8.m0"  

Calculations on BANKSY matrix

With these weights, we can now run through a simplified clustering workflow - using many of the same functions we used in the previous lesson on the scRNA workflow. The only distinction here is that these values are going to be calculated on our new BANKSY count matrix.

  1. What steps/functions did we use to calculate clusters previously?

Dimensionality reduction

With this new matrix, we are once again going to calculate our PCA space, this time storing it as pca.banksy to keep it straight from the previous PCA coordinates that were calculated.

# PCA
seurat_banksy <- RunPCA(seurat_banksy, 
                        assay = "BANKSY", 
                        reduction.name = "pca.banksy", 
                        features = rownames(seurat_banksy), 
                        npcs = 30)

We can take a look at the distribution of the samples and the gene IGKC (B cell marker gene) to see if this newly calculated PCA space requires integration.

# BANKSY PCA colored by sample
p_orig <- DimPlot(seurat_banksy,
                  group.by = "orig.ident",
                  reduction = "pca.banksy") 

# BANKSY PCA colored by IGKC expression
p_igkc <- FeaturePlot(seurat_banksy,
                        feature = "IGKC",
                        reduction = "pca.banksy")

p_orig + p_igkc

We can see that there appears to be 2 unique populations of B cells in our PCA space, where the orig.ident or sample is driving the differences in

Integration

We are going to use the Harmony algorithm in order to batch correct our dataset. This is going to create a new reduction we will call harmony.banksy.

This may take several iterations and may take a few minutes to run.

Warning
TODO how long does this step take
1 minute for me
# Harmony batch correction
seurat_banksy <- RunHarmony(seurat_banksy,
                            group.by.vars = "orig.ident",
                            reduction = "pca.banksy",
                            reduction.save = "harmony.banksy",
                            assay.use = "BANKSY")

We can once again visualize what this new, corrected dimensionality reduction looks like:

# Harmony BANKSY PCA colored by sample
p_orig <- DimPlot(seurat_banksy,
                  group.by = "orig.ident",
                  reduction = "harmony.banksy") 

# Harmony BANKSY PCA colored by IGKC expression
p_igkc <- FeaturePlot(seurat_banksy,
                        feature = "IGKC",
                        reduction = "harmony.banksy")

p_orig + p_igkc

Clustering

We are once again going to use a resolution = 0.65 so that we can make comparisons of our spatially-weighted clusters (BANKSY) versus transcript based clusters ().

# Find k-Nearest Neighbors
seurat_banksy <- FindNeighbors(seurat_banksy, 
                               reduction = "pca.banksy", 
                               dims = 1:30)

# Leiden clustering
seurat_banksy <- FindClusters(seurat_banksy, 
                              algorithm = 4,
                              cluster.name = "banksy_cluster",
                              resolution = 0.65,
                              random.seed = 12345)

At this point, we no longer need access to the BANKSY assay as all the relevant information has now been stored as:

  • banksy_clusters in @meta.data
  • pca.banksy in @reductions

So to conserve memory, we are going to delete the BANKSY assay from our seurat object.

# Change default asasy from BANKSY
DefaultAssay(seurat_banksy) <- "Spatial.008um"

# Delete BANKSY assay by setting it to NULL
seurat_banksy[["BANKSY"]] <- NULL
  1. Run BANKSY with a lambda of 0.2, how do the results differ from a value of 0.8?

Comparing clustering methods

At this point, we have two different results for our clusters - BANKSY and RNA-based clusters. Let us take a moment to compare and contrast what the differences are between the two.

Spatial Overlay

SpatialDimPlot(seurat_banksy, 
               group.by = c("banksy_cluster",
                            "seurat_cluster.projected"), 
               pt.size.factor = 15, 
               image.alpha = 0)

UMAP

DimPlot(seurat_banksy, 
        group.by = c("banksy_cluster",
                     "seurat_cluster.projected"),
        reduction = "full.umap.sketch")

  1. What does the UMAP look like when use the BANKSY derived PCA to calculate coordinates?

Save!

saveRDS("data/seurat_banksy.RDS")

Next Lesson >>

Back to Schedule

Reuse

CC-BY-4.0