library(Seurat)
library(Banksy)
library(SeuratWrappers)
library(tidyverse)
library(future)
library(harmony)
set.seed(12345)
# Set parallelization (multithreading)
# Speed up calculations
plan("multicore", workers = parallel::detectCores() - 1)
# Increases the size of the default vector
# 8GB
options(future.globals.maxSize = 8000 * 1024^2)
# Load dataset
seurat_processed <- readRDS("data/seurat_processed.RDS")Spatially Derived Clusters
Write a description of the lesson here.
keyword_1, keyword_2, keyword_3, keyword_4, keyword_5, keyword_6
Approximate time: XX minutes
Learning objectives
In this lesson, we will:
- Learning Objective 1
- Learning Objective 2
- Learning Objective 3
Overview of lesson
When doing XYZ…
BANKSY
BANKSY is another method for performing clustering. Unlike Seurat, BANKSY takes into account not only an individual bin’s expression pattern but also the mean and the gradient of gene expression levels in a bin’s broader neighborhood. This makes it valuable for identifying and defining spatial regions of interest.
To accomplish this, BANKSY calculates two different scores:
- Weighted mean expression of genes in a spatial neighborhood
- “Azimuthal gabor filter”, which is the “gradient of gene expression in each cell’s neighborhood”
In doing so, BANKSY is able to be flexible and not limit celltypes to only be located nearby one another. Even more than that, there is lambda value that scales the weight of each of these matrices to give the user flexibility in how strongly the spatial weights should impact clustering.
These new matrices are then concatenated to the counts matrix before running the remainder of a standard scRNA-seq workflow (dimensionality reduction, clustering, integration, etc.). Therefore, the final output of this algorithm is a modulated counts matrix that includes weighted scores that correspond to spatial domains.
Running BANKSY
We use the RunBanksy function to create a new “BANKSY” assay based on a default of the 4,000 most highly variable features, which can be used for dimensionality reduction and clustering. Some parameters of importance for RunBanksy() are:
| Parameter | Description |
|---|---|
k_geom |
Number of bins to consider for the local neighborhood. Larger values will yield larger domains. |
lambda |
Influence of the neighborhood. Larger values yield more spatially coherent domains. The authors recommend using 0.8 to identify broader spatial domains. |
group |
Allows us to run BANKSY on multiple samples, enabling the function to understand that overlapping spatial locations can come from different slides. |
dimx, dimy |
The x and y coordinates of a spot on the tissue slide. |
split.scale |
Within-group scaling parameter, accounting for minor differences between datasets. |
To set ourselves up for success, we are going to add the spatial locations for each spot into the metadata. This will allow us to specify dimx and dimy more easily.
# Store the cell barcode as a column
# Sometimes when you merge dataframes, you lose rownames
seurat_processed$cell <- rownames(seurat_processed@meta.data)
# CRC and NAT sample coordinates
coords_crc <- GetTissueCoordinates(seurat_processed,
image = "P5CRC.008um")
coords_nat <- GetTissueCoordinates(seurat_processed,
image = "P5NAT.008um")
coords <- rbind(coords_crc, coords_nat)You may be asking, why did we have to run GetTissueCoordinates() twice? This is because this function will only return the coordinates for one image at a time.
coords %>% head() x y cell
P5CRC_s_008um_00078_00444-1 57263.31 61738.62 P5CRC_s_008um_00078_00444-1
P5CRC_s_008um_00128_00278-1 52418.69 60257.61 P5CRC_s_008um_00128_00278-1
P5CRC_s_008um_00052_00559-1 60620.61 62512.22 P5CRC_s_008um_00052_00559-1
P5CRC_s_008um_00121_00413-1 56362.64 60478.41 P5CRC_s_008um_00121_00413-1
P5CRC_s_008um_00202_00633-1 62800.98 58138.03 P5CRC_s_008um_00202_00633-1
P5CRC_s_008um_00035_00460-1 57725.67 62997.04 P5CRC_s_008um_00035_00460-1
Next we are going to add these coordinate sinto our metadata, this is so that we can directly call this column from the RunBanksy() function.
seurat_processed@meta.data <- left_join(seurat_processed@meta.data, coords, by = "cell")
seurat_processed@meta.data %>% head() orig.ident nCount_Spatial.008um nFeature_Spatial.008um log10GenesPerUMI
1 P5CRC 65 57 0.9685377
2 P5CRC 1300 906 0.9496410
3 P5CRC 128 121 0.9884090
4 P5CRC 538 326 0.9203288
5 P5CRC 365 232 0.9231919
6 P5CRC 122 100 0.9586074
mitoRatio leverage.score seurat_cluster.sketched seurat_cluster.projected
1 0.23076923 0.11825052 <NA> 3
2 0.17307692 0.81262254 <NA> 1
3 0.02343750 1.39678933 <NA> 11
4 0.00929368 0.61539562 <NA> 9
5 0.06575342 0.44573347 <NA> 2
6 0.16393443 0.09451219 <NA> 3
seurat_cluster.projected.score cell x y
1 0.9783292 P5CRC_s_008um_00078_00444-1 57263.31 61738.62
2 1.0000000 P5CRC_s_008um_00128_00278-1 52418.69 60257.61
3 0.6450775 P5CRC_s_008um_00052_00559-1 60620.61 62512.22
4 0.6799116 P5CRC_s_008um_00121_00413-1 56362.64 60478.41
5 1.0000000 P5CRC_s_008um_00202_00633-1 62800.98 58138.03
6 1.0000000 P5CRC_s_008um_00035_00460-1 57725.67 62997.04
But wait! Our rownames are missing! It is vital that the rownames of our Seurat object align with what is stored in Cell(). So we are going to add the rownames back into our metadata.
Afterwards, we can then calculate the new BANKSY matrix. This step can take a while to compute
TODO time estimate ~1 minute for me
# Add cell barcodes back to rownames
rownames(seurat_processed@meta.data) <- seurat_processed$cell
# Run Banksy from SeuratWrappers
seurat_banksy <- RunBanksy(seurat_processed, lambda = 0.8, verbose = T,
assay = 'Spatial.008um', slot = 'data', k_geom = 15,
dimx = 'x', dimy = 'y',
group = "orig.ident", split.scale = T)Totally possible that participants run out RAM at this step. We should provide them the intermediate object at this point.
seurat_banksy is 10.4 GB on my system right now…
Basically if participants don’t have enough RAM, they will need to follow along and then load the banksy object at the end of the lesson for next part
With a new Active assay: BANKSY
seurat_banksyAn object of class Seurat
40170 features across 135798 samples within 3 assays
Active assay: BANKSY (4000 features, 0 variable features)
2 layers present: data, scale.data
2 other assays present: Spatial.008um, sketch
4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
2 spatial fields of view present: P5CRC.008um P5NAT.008um
New BANKSY Matrix
At this point, we have our new counts matrix that includes our neighborhood weighted scores. We can see this more clearly if we investigate the Features() that exist in our BANKSY assay. Where teh first few features are our original counts matrix with genes.
Features(seurat_banksy) %>% head()[1] "IGHG1" "IGHM" "SST" "IGLC1" "INSL5" "CHGA"
Where we also have “new features” that are our lambda scaled neighborhood weighted values, with an appended *.m0 to make the distinction clear.
Features(seurat_banksy) %>% tail()[1] "ADAM23.m0" "PNMT.m0" "RSPO2.m0" "HTRA3.m0" "SLC3A1.m0" "TLR8.m0"
Calculations on BANKSY matrix
With these weights, we can now run through a simplified clustering workflow - using many of the same functions we used in the previous lesson on the scRNA workflow. The only distinction here is that these values are going to be calculated on our new BANKSY count matrix.
- What steps/functions did we use to calculate clusters previously?
Dimensionality reduction
With this new matrix, we are once again going to calculate our PCA space, this time storing it as pca.banksy to keep it straight from the previous PCA coordinates that were calculated.
# PCA
seurat_banksy <- RunPCA(seurat_banksy,
assay = "BANKSY",
reduction.name = "pca.banksy",
features = rownames(seurat_banksy),
npcs = 30)We can take a look at the distribution of the samples and the gene IGKC (B cell marker gene) to see if this newly calculated PCA space requires integration.
# BANKSY PCA colored by sample
p_orig <- DimPlot(seurat_banksy,
group.by = "orig.ident",
reduction = "pca.banksy")
# BANKSY PCA colored by IGKC expression
p_igkc <- FeaturePlot(seurat_banksy,
feature = "IGKC",
reduction = "pca.banksy")
p_orig + p_igkcWe can see that there appears to be 2 unique populations of B cells in our PCA space, where the orig.ident or sample is driving the differences in
Integration
We are going to use the Harmony algorithm in order to batch correct our dataset. This is going to create a new reduction we will call harmony.banksy.
This may take several iterations and may take a few minutes to run.
- TODO how long does this step take
- 1 minute for me
# Harmony batch correction
seurat_banksy <- RunHarmony(seurat_banksy,
group.by.vars = "orig.ident",
reduction = "pca.banksy",
reduction.save = "harmony.banksy",
assay.use = "BANKSY")We can once again visualize what this new, corrected dimensionality reduction looks like:
Clustering
We are once again going to use a resolution = 0.65 so that we can make comparisons of our spatially-weighted clusters (BANKSY) versus transcript based clusters ().
# Find k-Nearest Neighbors
seurat_banksy <- FindNeighbors(seurat_banksy,
reduction = "pca.banksy",
dims = 1:30)
# Leiden clustering
seurat_banksy <- FindClusters(seurat_banksy,
algorithm = 4,
cluster.name = "banksy_cluster",
resolution = 0.65,
random.seed = 12345)At this point, we no longer need access to the BANKSY assay as all the relevant information has now been stored as:
banksy_clustersin@meta.datapca.banksyin@reductions
So to conserve memory, we are going to delete the BANKSY assay from our seurat object.
# Change default asasy from BANKSY
DefaultAssay(seurat_banksy) <- "Spatial.008um"
# Delete BANKSY assay by setting it to NULL
seurat_banksy[["BANKSY"]] <- NULL- Run BANKSY with a
lambdaof 0.2, how do the results differ from a value of 0.8?
Comparing clustering methods
At this point, we have two different results for our clusters - BANKSY and RNA-based clusters. Let us take a moment to compare and contrast what the differences are between the two.
Spatial Overlay
UMAP
DimPlot(seurat_banksy,
group.by = c("banksy_cluster",
"seurat_cluster.projected"),
reduction = "full.umap.sketch")- What does the UMAP look like when use the BANKSY derived PCA to calculate coordinates?
Save!
saveRDS("data/seurat_banksy.RDS")