# Moran's I
# Visium HD spatial transcriptomics workshop
# Author: Harvard Chan Bioinformatics Core
# Created: December 2025
# Load libararies
library(tidyverse)
library(Seurat)
set.seed(12345)
# Increases the size of the default vector (8GB)
options(future.globals.maxSize = 8000 * 1024^2)
# Set parallelization (multithreading) to speed up calculations
library(future)
plan(multisession, workers = parallel::detectCores() - 1)
# Load dataset
seurat_rctd <- readRDS("data/intermediate_seurat/seurat_RCTD.rds")Spatially Variable Genes
Identify spatially variable genes whose expression changes across tissue locations. This lesson shows how to detect, visualize, and prioritize SVGs as markers of spatial domains and gradients.
Variable genes, Spatial patterns, Spatial autocorrelation, Spatial gradients
Approximate time: XX minutes
Learning objectives
In this lesson, we will:
- Define the Moran’s I calculation
- Run Moran’s I to identify spatially variable genes
- Visualize spatialy variable genes
Overview of lesson
At this step, instead of comparing defined cell types, you might ask which genes vary as a function of spatial location. In this lesson you apply methods for detecting spatially variable genes (SVGs), then map and visualize their expression patterns across the tissue. These genes often serve as interpretable anchors for defining domains, building signatures, and linking expression to morphology.
Spatially variable genes (SVGs)
Spatially variable genes (SVGs) are similar in concept to highly variable genes (HVGs). These were identifies by evaluating the variability of gene expression across the entire dataset. However, now we have the additional layer of the spatial location of each bin. Therefore we are able to identify genes that have different levels of expression depending on which area of the tissue they are found on.
Image source: Yan et al. (2025)
These methods can be used to identify genes that have strong strong spatial associations for downstream analyses - similar to how we used HVGs to calculate PCA. These SVGs can also be used to evaluate spatial niches within cell-types to assist in find subtypes with the tissue.
Ultimately, these genes provide another peek in the spatial context of the tissue.
Moran’s I
Many of the SVG algorithms are built upon the concept of Moran’s I which is defined as “a measure of spatial autocorrelation”. The math behind the method is defined like so:
\[ I = \frac{N}{S_0} \frac{ \sum_{i=1}^N \sum_{j=1}^N w_{ij}\,(x_i - \bar{x})(x_j - \bar{x}) }{ \sum_{i=1}^N (x_i - \bar{x})^2 } \]
where:
- \(N\) is the number of spatial units, indexed by \(i\) and \(j\);
- \(x\) is the variable of interest;
- \(\bar{x}\) is the mean of \(x\);
- \(w_{ij}\) are the elements of the spatial-weights matrix, with \(w_{ii} = 0\);
- \(S_0\) is the sum of all spatial weights, \(S_0 = \sum_{i=1}^N \sum_{j=1}^N w_{ij}\).
For a random spatial distribution, Moran’s I will be near zero. A large positive value indicates the data is clustered. A large negative value indicates that the data is dispersed, meaning units that are close spatially have dissimilar variable values.
The range of values for Moran’s I E(I) in spatial transcriptomics is usually [-1, 1].
Image source: Tsui et al. (2022)
To see why this is the case, consider the expression \((x_i - \bar{x}) (x_j - \bar{x})\) in the above formula. If \(x_i\) and \(x_j\) are similar and far from the mean, then this product will be large and positive. If they are dissimilar (in particular, if one is well above the mean and one is far below) then this product will be large and negative. Note that Moran’s I depends strongly on the calculation of \(w_{ij}\), which represents the spatial relations of the bins. A large value \(w_{ij}\) indicates that bins \(x_i\) and \(x_j\) are close spatially.
Running Moran’s I
Moran’s I will work best if we focus on one sample at a time, as the spatial coordinates will be consistent within a single slide. Furthermore, because this step can take a long time to run, we are going to further subset the data to a singular celltype. In this way, we are attempting to evaluate genes that are specific to spatially defined subtype of a celltype.
In this case, let us zoom in on the tumor cells in the CRC dataset:
# Subset to tumor P5CRC bins
crc_sub <- subset(seurat_rctd,
subset = (orig.ident == "P5CRC") &
(first_type == "Tumor"))
crc_subAn object of class Seurat
36170 features across 12855 samples within 2 assays
Active assay: Spatial.008um (18085 features, 2000 variable features)
2 layers present: counts, data
1 other assay present: sketch
6 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch, pca.banksy, harmony.banksy
1 spatial field of view present: P5CRC.008um
For the best results we are going to make use of the scale.data assay (this is typically calculated right before the PCA step). We only ever calculated this on the sketch assay, so here we will run the ScaleData() function on our entire subsetted dataset.
# Scale log-normalized counts
crc_sub <- ScaleData(crc_sub)
crc_subAn object of class Seurat
36170 features across 12855 samples within 2 assays
Active assay: Spatial.008um (18085 features, 2000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: sketch
6 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch, pca.banksy, harmony.banksy
1 spatial field of view present: P5CRC.008um
Now we can actually calculate Moran’s I with FindSpatiallyVariableFeatures() and set the selection.method = "moransi". We are additionally going to specify that we want to evaluate Moran’s I on the top 50 highly variable genes. A gene that is spatially variable is likely to also be highly variable because ultimately we are assessing changes in gene expression across different populations. We take the top 50 to reduce the number of calculations that take place (so we can reasonably run this on a laptop).
This step may take a few minutes to run.
# Run Moran's I on the top 50 highly variable genes
crc_sub <- FindSpatiallyVariableFeatures(crc_sub,
selection.method = "moransi",
verbose = TRUE,
features = VariableFeatures(crc_sub)[1:50],
image = "P5CRC.008um")Once the calculation completes, we can evaluate the Moran’s I scores and associated p-values for each of the genes. We will have to do a bit of subsetting as we only calculated these values for 50 genes.
# Moran's I scores for each gene
SVFInfo(crc_sub, method = "moransi") %>%
subset(!is.na(MoransI_observed)) %>%
arrange(desc(MoransI_observed)) %>%
head()FindSpatiallyVariableFeatures() with associated Moran’s I scores and p-values.
| MoransI_observed | MoransI_p.value | |
|---|---|---|
| REG3A | 0.4772619 | 0.0009756 |
| OLFM4 | 0.4213175 | 0.0009756 |
| PI3 | 0.3579835 | 0.0009756 |
| LCN2 | 0.2793870 | 0.0009756 |
| MUC17 | 0.2544224 | 0.0009756 |
| CHGB | 0.2052781 | 0.0009756 |
Then, we can use the SpatiallyVariableFeatures() to get the list of SVGs that were just calculated. For visualization purposes, we are going to grab the top 10 genes.
# Get top 10 spatially variable features
top10_svg <- SpatiallyVariableFeatures(crc_sub,
method = "moransi")
top10_svg <- top10_svg[1:10]
top10_svg [1] "REG3A" "OLFM4" "PI3" "LCN2" "MUC17" "CHGB" "IGHG1" "CHGA" "IGKC"
[10] "MUC6"
And visualize the expression of these genes on the spatial slide:
# Spatial plot of top SVGs
SpatialFeaturePlot(crc_sub,
features = top10_svg,
pt.size.factor = 15,
image.alpha = 0.1,
ncol = 5)Recall in the BANKSY lesson we noted that the BANKSY algorithm identified different, spatially constrained subtypes in this cancerous region. Now we can clearly see which of those genes were defining the phenotype of these tumor subtypes.
# Dotplot of tumor clusters for top SVGs
DotPlot(crc_sub,
top10_svg,
group.by = "banksy_cluster",
idents = c(6, 13, 14, 15, 19))Now we can see that there is a phenotype that is distinct between each of these groups of cancerous regions. For example, cluster 15 and 19 appear to be have different expression patterns compared to the other groups of tumor bins.
Using these spatially variable genes, we can identify subtypes of broader cell types.
Other SVG methods
As the field of spatial transcriptomics is ever evolving, there are other algorithms that use other priors and calcluations to identify spatially variable genes. Some methods include, but are not limited to:
Note that several of these are written in Python.