Deconvolution

category_1
category_2
category_3
category_4

Write a description of the lesson here.

Author

Noor Sohail

Published

July 22, 2025

Keywords

keyword_1, keyword_2, keyword_3, keyword_4, keyword_5, keyword_6

Approximate time: XX minutes

Learning objectives

In this lesson, we will:

  • Learning Objective 1
  • Learning Objective 2
  • Learning Objective 3

Overview of lesson

When doing XYZ…

What is Deconvolution?

The single-digit micron resolution is a big technological improvement over Visium’s original ∼55μm spots. Because a single spot can include several cell types, the dataset is not straightforward to annotate. As a result, many methods model each spot as a mixture, assigning proportion values for each cell type within that spot. Each spot is represented as a piechart showing proportions of the cell type composition.

While Visium HD reduces bin size to approach single-cell resolution, it is important to note that sequencing-based spatial transcriptomics captures RNA from a defined spatial area rather than from isolated cells. As a result, transcripts from multiple cells overlapping a bin can be pooled and sequenced together. This means that even small bins may contain mixed cellular signals, which has important implications for downstream analyses such as differential gene expression.

Robust Cell Type Deconvolution

library(Seurat)
library(tidyverse)
library(spacexr)

# Set parallelization (multithreading)
# Speed up calculations
plan("multicore", workers = parallel::detectCores() - 1)

# Increases the size of the default vector
# 8GB
options(future.globals.maxSize = 8000 * 1024^2)

# Load dataset
seurat_banksy <- readRDS("data/seurat_banksy.RDS")

To perform accurate annotation of cell types, we must also take into consideration that our 16µm x 16µm bins may contain multiple cells. The method Robust Cell Type Deconvolution (RCTD) has been shown to accurately annotate spatial data from a variety of technologies while taking into consideration that a single bin may exhibit multiple cell type profiles.

We are going to annotate the slides one a time. There are extensions of the RCTD package in order to take into account replicates and multiple samples which we will not use at this time.

RCTD takes a cell-type-annotated scRNA-seq dataset as a reference and a spatial dataset as a query. For our reference, we will use a paired scRNA-seq FLEX dataset that was generated from the same patients.

Reference Dataset

The two most important pieces of information necessary to create a RCTD Reference() object are counts and celltype annotations. We can take a quick look at the different celltypes that we are going to annotate our query object with. For this reference, these celltypes values are stored in the columns Level1 and Level2 - with varying levels of granularity for the annotation.

The reference dataset was generated by downloading two files with bash:

  • Count matrix (.h5 file)
  • Metadata csv
curl -O https://cf.10xgenomics.com/samples/cell-exp/8.0.0/HumanColonCancer_Flex_Multiplex/HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5
wget "https://github.com/10XGenomics/HumanColonCancer_VisiumHD/raw/refs/heads/main/MetaData/SingleCell_MetaData.csv.gz"

Then in R, both files were loaded in and put together to create a seurat object. This is the same object that we will be loading in the next step of this lesson:

counts <- Read10X_h5("HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5")
meta <- read.csv("SingleCell_MetaData.csv")
rownames(meta) <- meta$Barcode

seurat <- CreateSeuratObject(counts = counts,
                             meta.data = meta,
                             project = "CRC FLEX")

# Grab UMAP coordinates and put them in a reduction
umap_coords <- as.matrix(seurat@meta.data[, c("UMAP1", "UMAP2")])

# Create a DimReduc and store it in the object as "umap"
seurat[["umap"]] <- CreateDimReducObject(embeddings = umap_coords,
                                         key = "UMAP_",
                                         assay = DefaultAssay(seurat))


# Remove cells that did not pass QC
seurat <- subset(seurat, subset = (QCFilter == "Keep"))

Idents(seurat) <- "Level1"
seurat_down <- subset(seurat, 
                      downsample = 500)

So first, let us load in the reference seurat object that was downloaded with the spatial dataset.

seurat_ref <- readRDS("data/crc_flex_ref_downsample.RDS")
# seurat_ref <- subset(seurat_ref,
#                      Level1 != "Tumor")
seurat_ref
An object of class Seurat 
18082 features across 4500 samples within 1 assay 
Active assay: RNA (18082 features, 0 variable features)
 1 layer present: counts
 1 dimensional reduction calculated: umap

Let us start by taking a look at the UMAP of our reference dataset.

# UMAP of reference dataset
DimPlot(seurat_ref, 
              group.by = "Level1", 
              label = TRUE) +
  ggtitle("CRC Reference")

Running RCTD

Reference

Now that we better understand the reference, we are going to make a RCTD object out of it. This time we use the Reference() function and additionally supply cluster (celltype) information to transfer to our query. We supply the following three parameters:

  • counts = raw counts matrix
  • cell_types = celltype annotation
  • nUMI = total number of counts per cell

We will be using the more specific, Level2 annotations:

# Raw counts of reference dataset
counts <- seurat_ref[["RNA"]]$counts

# Celltype annotation of reference dataset
cluster <- seurat_ref$Level1
cluster <- as.factor(cluster)

# Total counts of each cell
nUMI <- seurat_ref$nCount_RNA

# Create the RCTD reference object
reference <- Reference(counts = counts, 
                       cell_types = cluster, 
                       nUMI = nUMI)

When we look at the structure of this reference object, we can see that we have our nUMI, counts, and cell_types in this object.

str(reference)
Formal class 'Reference' [package "spacexr"] with 3 slots
  ..@ cell_types: Factor w/ 9 levels "B cells","Endothelial",..: 9 9 8 8 2 9 5 2 8 4 ...
  .. ..- attr(*, "names")= chr [1:4500] "AAAGGTGAGTCACCTAACTTTAGG-1" "AACTGAATCCTGGAAGACTTTAGG-1" "AAGGCACGTTCTTGAAACTTTAGG-1" "AAGGCTGTCGGTACGAACTTTAGG-1" ...
  ..@ counts    :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:4460158] 112 136 174 194 226 227 240 277 292 303 ...
  .. .. ..@ p       : int [1:4501] 0 748 3659 4860 5210 5996 6923 7425 9815 10501 ...
  .. .. ..@ Dim     : int [1:2] 18082 4500
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : chr [1:18082] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" ...
  .. .. .. ..$ : chr [1:4500] "AAAGGTGAGTCACCTAACTTTAGG-1" "AACTGAATCCTGGAAGACTTTAGG-1" "AAGGCACGTTCTTGAAACTTTAGG-1" "AAGGCTGTCGGTACGAACTTTAGG-1" ...
  .. .. ..@ x       : num [1:4460158] 2 1 1 1 1 1 1 1 2 1 ...
  .. .. ..@ factors : list()
  ..@ nUMI      : Named num [1:4500] 936 4725 1542 407 1078 ...
  .. ..- attr(*, "names")= chr [1:4500] "AAAGGTGAGTCACCTAACTTTAGG-1" "AACTGAATCCTGGAAGACTTTAGG-1" "AAGGCACGTTCTTGAAACTTTAGG-1" "AAGGCTGTCGGTACGAACTTTAGG-1" ...

Create query

To begin, we are going to split our seurat object into a single sample.

crc <- subset(seurat_banksy, 
              subset = (orig.ident == "P5CRC"))
crc
An object of class Seurat 
36170 features across 57463 samples within 2 assays 
Active assay: Spatial.008um (18085 features, 2000 variable features)
 2 layers present: counts, data
 1 other assay present: sketch
 6 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch, pca.banksy, harmony.banksy
 1 spatial field of view present: P5CRC.008um

Deconvolution is very computationally expensive! So we can once again make use the sketch workflow to downsample our dataset in a careful manner.

# Create sketch of CRC subset
DefaultAssay(crc) <- 'Spatial.008um'

# HVGs are required input for SketchData
crc <- FindVariableFeatures(object = crc,
                            assay = "Spatial.008um")

# Calculate leverage score
crc <- SketchData(
  object = crc,
  ncells = 5000,
  method = "LeverageScore",
  sketched.assay = "sketch_crc",
  var.name = "leverage.score_crc")

# We need the PCA to later project back onto the entire dataset
crc <- FindVariableFeatures(crc)
crc <- ScaleData(crc)
crc <- RunPCA(crc, reduction.name = "pca.sketch_crc")

RCTD requires a unique data structure (similar to Seurat) as input to run. So we create our query object with SpatialRNA() by suppling the spatial coordinates and raw counts for the bins.

DefaultAssay(crc) <- "sketch_crc"

# Grab barcodes for sketched bins
counts_sketch <- crc[["sketch_crc"]]$counts
cells_sketch <- colnames(crc[["sketch_crc"]])

# Spatial coordinates for each bin
coords <- GetTissueCoordinates(crc)[cells_sketch, 1:2]

# Create the RCTD query object
query <- SpatialRNA(coords = coords, 
                    counts = counts_sketch, 
                    nUMI = colSums(counts_sketch))
str(query)
Formal class 'SpatialRNA' [package "spacexr"] with 3 slots
  ..@ coords:'data.frame':  5000 obs. of  2 variables:
  .. ..$ x: num [1:5000] 60621 54093 53817 56712 56687 ...
  .. ..$ y: num [1:5000] 62512 58219 61315 60684 59866 ...
  ..@ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:2337095] 13 90 174 261 377 677 904 1012 1027 1133 ...
  .. .. ..@ p       : int [1:5001] 0 121 496 1523 2012 2224 2692 3347 3903 4218 ...
  .. .. ..@ Dim     : int [1:2] 18085 5000
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : chr [1:18085] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" ...
  .. .. .. ..$ : chr [1:5000] "P5CRC_s_008um_00052_00559-1" "P5CRC_s_008um_00198_00335-1" "P5CRC_s_008um_00092_00326-1" "P5CRC_s_008um_00114_00425-1" ...
  .. .. ..@ x       : num [1:2337095] 1 1 1 1 1 1 1 1 1 2 ...
  .. .. ..@ factors : list()
  ..@ nUMI  : Named num [1:5000] 128 503 1576 750 238 ...
  .. ..- attr(*, "names")= chr [1:5000] "P5CRC_s_008um_00052_00559-1" "P5CRC_s_008um_00198_00335-1" "P5CRC_s_008um_00092_00326-1" "P5CRC_s_008um_00114_00425-1" ...

Run RCTD

The more celltypes you have the longer this will take.

Warning

We need to provide a time estimate for this step

This took me ~3 minutes

# Run RCTD
RCTD <- create.RCTD(spatialRNA = query, 
                    reference = reference, 
                    max_cores = parallel::detectCores() - 1)
RCTD <- run.RCTD(RCTD, doublet_mode = "doublet") # This step takes a while

RCTD Results

The resultant dataframe from RCTD will contain the following columns according to the documentation:

  • spot_class, a factor variable representing RCTD’s classification in doublet mode:
    • “singlet” (1 cell type on pixel)
    • “doublet_certain” (2 cell types on pixel)
    • “doublet_uncertain” (2 cell types on pixel, but only confident of 1)
    • “reject” (no prediction given for pixel)
  • first_type column gives the first cell type predicted on the bead (for all spot_class conditions except “reject”).
  • second_type column gives the second cell type predicted on the bead for doublet spot_class conditions (not a confident prediction for “doublet_uncertain”).

The results of deconvolution are stored in the results$results_df slot of the RCTD object:

df_rctd_results <- RCTD@results$results_df
df_rctd_results %>% head()
                            spot_class            first_type
P5CRC_s_008um_00052_00559-1    singlet               Myeloid
P5CRC_s_008um_00198_00335-1    singlet                 Tumor
P5CRC_s_008um_00092_00326-1    singlet                 Tumor
P5CRC_s_008um_00114_00425-1    singlet Intestinal Epithelial
P5CRC_s_008um_00142_00424-1    singlet           Endothelial
P5CRC_s_008um_00131_00541-1    singlet Intestinal Epithelial
                                      second_type first_class second_class
P5CRC_s_008um_00052_00559-1           Endothelial       FALSE        FALSE
P5CRC_s_008um_00198_00335-1 Intestinal Epithelial       FALSE        FALSE
P5CRC_s_008um_00092_00326-1 Intestinal Epithelial       FALSE        FALSE
P5CRC_s_008um_00114_00425-1               B cells       FALSE        FALSE
P5CRC_s_008um_00142_00424-1            Fibroblast       FALSE        FALSE
P5CRC_s_008um_00131_00541-1                 Tumor       FALSE        FALSE
                            min_score singlet_score conv_all conv_doublet
P5CRC_s_008um_00052_00559-1  247.6690      248.6940     TRUE         TRUE
P5CRC_s_008um_00198_00335-1  412.7892      414.9878     TRUE         TRUE
P5CRC_s_008um_00092_00326-1  873.5261      874.7588     TRUE         TRUE
P5CRC_s_008um_00114_00425-1  595.2669      595.0713     TRUE         TRUE
P5CRC_s_008um_00142_00424-1  347.1421      350.3347     TRUE         TRUE
P5CRC_s_008um_00131_00541-1  550.9241      557.3652     TRUE         TRUE

We can also visualize the identifications established for each first_type:

ggplot(df_rctd_results,
       aes(x = first_type, fill = first_type)) +
  geom_bar() +
  theme_bw() + ggtitle("RCTD First Type: P5CRC Sample") +
  NoLegend() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Project RCTD results on all bins

Let us take a look at how many rows (bins) we have these annotations for:

nrow(df_rctd_results)
[1] 4809

But didn’t our sketch assay include 5,000 bins? RCTD runs its own QC and when it cannot identify what the annotation for a specific bin is, then it is removed from the final results.

So we are going to do some data wrangling to ensure that we keep track of all information. Including those ~800 bins without first_type labels.

# Create column cell to merge on
# Also make sure columns are not factors
df_rctd_results <- df_rctd_results %>%
  mutate(cell = rownames(df_rctd_results)) %>%
  mutate(first_type = as.character(first_type),
         second_type = as.character(second_type),
         spot_class = as.character(spot_class))

# Update column names to have _sketch
# This is so that we can create a projected column
colnames(df_rctd_results) <- paste0(colnames(df_rctd_results), 
                                    "_sketch")
df_rctd_results %>% head()
                            spot_class_sketch     first_type_sketch
P5CRC_s_008um_00052_00559-1           singlet               Myeloid
P5CRC_s_008um_00198_00335-1           singlet                 Tumor
P5CRC_s_008um_00092_00326-1           singlet                 Tumor
P5CRC_s_008um_00114_00425-1           singlet Intestinal Epithelial
P5CRC_s_008um_00142_00424-1           singlet           Endothelial
P5CRC_s_008um_00131_00541-1           singlet Intestinal Epithelial
                               second_type_sketch first_class_sketch
P5CRC_s_008um_00052_00559-1           Endothelial              FALSE
P5CRC_s_008um_00198_00335-1 Intestinal Epithelial              FALSE
P5CRC_s_008um_00092_00326-1 Intestinal Epithelial              FALSE
P5CRC_s_008um_00114_00425-1               B cells              FALSE
P5CRC_s_008um_00142_00424-1            Fibroblast              FALSE
P5CRC_s_008um_00131_00541-1                 Tumor              FALSE
                            second_class_sketch min_score_sketch
P5CRC_s_008um_00052_00559-1               FALSE         247.6690
P5CRC_s_008um_00198_00335-1               FALSE         412.7892
P5CRC_s_008um_00092_00326-1               FALSE         873.5261
P5CRC_s_008um_00114_00425-1               FALSE         595.2669
P5CRC_s_008um_00142_00424-1               FALSE         347.1421
P5CRC_s_008um_00131_00541-1               FALSE         550.9241
                            singlet_score_sketch conv_all_sketch
P5CRC_s_008um_00052_00559-1             248.6940            TRUE
P5CRC_s_008um_00198_00335-1             414.9878            TRUE
P5CRC_s_008um_00092_00326-1             874.7588            TRUE
P5CRC_s_008um_00114_00425-1             595.0713            TRUE
P5CRC_s_008um_00142_00424-1             350.3347            TRUE
P5CRC_s_008um_00131_00541-1             557.3652            TRUE
                            conv_doublet_sketch                 cell_sketch
P5CRC_s_008um_00052_00559-1                TRUE P5CRC_s_008um_00052_00559-1
P5CRC_s_008um_00198_00335-1                TRUE P5CRC_s_008um_00198_00335-1
P5CRC_s_008um_00092_00326-1                TRUE P5CRC_s_008um_00092_00326-1
P5CRC_s_008um_00114_00425-1                TRUE P5CRC_s_008um_00114_00425-1
P5CRC_s_008um_00142_00424-1                TRUE P5CRC_s_008um_00142_00424-1
P5CRC_s_008um_00131_00541-1                TRUE P5CRC_s_008um_00131_00541-1

Now let us merge this information into our Seurat object. THis is necessary step so that we can merge i

crc$cell <- rownames(crc@meta.data)
meta <- crc@meta.data
meta <- left_join(meta, 
                  df_rctd_results,
                  by = c("cell" = "cell_sketch"))
rownames(meta) <- meta$cell
crc@meta.data <- meta

sketch_cells <- colnames(crc[["sketch_crc"]])
# Set value ot unassigned to sketch bins that do not have values
# This is because ProjectData will throw errors if there is not a value
# For every sketch bin
cols <- c("spot_class_sketch", 
          "first_type_sketch", 
          "second_type_sketch")

for (col in cols) {
  # Find rows that are in sketch_cells and are NA
  na_rows <- sketch_cells[is.na(crc@meta.data[sketch_cells, col])]
  crc@meta.data[na_rows, col] <- "unassigned"
}
table(crc$first_type_sketch)

              B cells           Endothelial            Fibroblast 
                  387                   326                   413 
Intestinal Epithelial               Myeloid              Neuronal 
                 1462                   420                    91 
        Smooth Muscle               T cells                 Tumor 
                  127                   247                  1336 
           unassigned 
                  191 

Project Back to Full Dataset

# Project back first_type to new column called full_first_type
crc <- ProjectData(
  object = crc,
  assay = "Spatial.008um",
  sketched.assay = "sketch_crc",
  full.reduction = "pca_crc",
  sketched.reduction = "pca.sketch_crc",
  dims = 1:30,
  refdata = list(spot_class = "spot_class_sketch",
    first_type = "first_type_sketch",
                 second_type = "second_type_sketch"))

Visualize New Annotation

SpatialDimPlot(crc,
               group.by = "first_type",
               pt.size.factor = 15,
               image.alpha = 0.5)

df <- FetchData(crc,
                c("x", "y", "first_type"))

ggplot(df,
       aes(x = x, y = y,
           color = first_type)) +
  geom_point() +
  theme_bw() +
  facet_wrap(~first_type)


Next Lesson >>

Back to Schedule

Reuse

CC-BY-4.0