library(Seurat)
library(tidyverse)
library(spacexr)
# Set parallelization (multithreading)
# Speed up calculations
plan("multicore", workers = parallel::detectCores() - 1)
# Increases the size of the default vector
# 8GB
options(future.globals.maxSize = 8000 * 1024^2)
# Load dataset
seurat_banksy <- readRDS("data/seurat_banksy.RDS")Deconvolution
Write a description of the lesson here.
keyword_1, keyword_2, keyword_3, keyword_4, keyword_5, keyword_6
Approximate time: XX minutes
Learning objectives
In this lesson, we will:
- Learning Objective 1
- Learning Objective 2
- Learning Objective 3
Overview of lesson
When doing XYZ…
What is Deconvolution?
The single-digit micron resolution is a big technological improvement over Visium’s original ∼55μm spots. Because a single spot can include several cell types, the dataset is not straightforward to annotate. As a result, many methods model each spot as a mixture, assigning proportion values for each cell type within that spot. Each spot is represented as a piechart showing proportions of the cell type composition.
While Visium HD reduces bin size to approach single-cell resolution, it is important to note that sequencing-based spatial transcriptomics captures RNA from a defined spatial area rather than from isolated cells. As a result, transcripts from multiple cells overlapping a bin can be pooled and sequenced together. This means that even small bins may contain mixed cellular signals, which has important implications for downstream analyses such as differential gene expression.
Robust Cell Type Deconvolution
To perform accurate annotation of cell types, we must also take into consideration that our 16µm x 16µm bins may contain multiple cells. The method Robust Cell Type Deconvolution (RCTD) has been shown to accurately annotate spatial data from a variety of technologies while taking into consideration that a single bin may exhibit multiple cell type profiles.
We are going to annotate the slides one a time. There are extensions of the RCTD package in order to take into account replicates and multiple samples which we will not use at this time.
RCTD takes a cell-type-annotated scRNA-seq dataset as a reference and a spatial dataset as a query. For our reference, we will use a paired scRNA-seq FLEX dataset that was generated from the same patients.
Reference Dataset
The two most important pieces of information necessary to create a RCTD Reference() object are counts and celltype annotations. We can take a quick look at the different celltypes that we are going to annotate our query object with. For this reference, these celltypes values are stored in the columns Level1 and Level2 - with varying levels of granularity for the annotation.
The reference dataset was generated by downloading two files with bash:
- Count matrix (
.h5file) - Metadata csv
curl -O https://cf.10xgenomics.com/samples/cell-exp/8.0.0/HumanColonCancer_Flex_Multiplex/HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5
wget "https://github.com/10XGenomics/HumanColonCancer_VisiumHD/raw/refs/heads/main/MetaData/SingleCell_MetaData.csv.gz"Then in R, both files were loaded in and put together to create a seurat object. This is the same object that we will be loading in the next step of this lesson:
counts <- Read10X_h5("HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5")
meta <- read.csv("SingleCell_MetaData.csv")
rownames(meta) <- meta$Barcode
seurat <- CreateSeuratObject(counts = counts,
meta.data = meta,
project = "CRC FLEX")
# Grab UMAP coordinates and put them in a reduction
umap_coords <- as.matrix(seurat@meta.data[, c("UMAP1", "UMAP2")])
# Create a DimReduc and store it in the object as "umap"
seurat[["umap"]] <- CreateDimReducObject(embeddings = umap_coords,
key = "UMAP_",
assay = DefaultAssay(seurat))
# Remove cells that did not pass QC
seurat <- subset(seurat, subset = (QCFilter == "Keep"))
Idents(seurat) <- "Level1"
seurat_down <- subset(seurat,
downsample = 500)So first, let us load in the reference seurat object that was downloaded with the spatial dataset.
seurat_ref <- readRDS("data/crc_flex_ref_downsample.RDS")
# seurat_ref <- subset(seurat_ref,
# Level1 != "Tumor")
seurat_refAn object of class Seurat
18082 features across 4500 samples within 1 assay
Active assay: RNA (18082 features, 0 variable features)
1 layer present: counts
1 dimensional reduction calculated: umap
Let us start by taking a look at the UMAP of our reference dataset.
Running RCTD
Reference
Now that we better understand the reference, we are going to make a RCTD object out of it. This time we use the Reference() function and additionally supply cluster (celltype) information to transfer to our query. We supply the following three parameters:
counts= raw counts matrixcell_types= celltype annotationnUMI= total number of counts per cell
We will be using the more specific, Level2 annotations:
# Raw counts of reference dataset
counts <- seurat_ref[["RNA"]]$counts
# Celltype annotation of reference dataset
cluster <- seurat_ref$Level1
cluster <- as.factor(cluster)
# Total counts of each cell
nUMI <- seurat_ref$nCount_RNA
# Create the RCTD reference object
reference <- Reference(counts = counts,
cell_types = cluster,
nUMI = nUMI)When we look at the structure of this reference object, we can see that we have our nUMI, counts, and cell_types in this object.
str(reference)Formal class 'Reference' [package "spacexr"] with 3 slots
..@ cell_types: Factor w/ 9 levels "B cells","Endothelial",..: 9 9 8 8 2 9 5 2 8 4 ...
.. ..- attr(*, "names")= chr [1:4500] "AAAGGTGAGTCACCTAACTTTAGG-1" "AACTGAATCCTGGAAGACTTTAGG-1" "AAGGCACGTTCTTGAAACTTTAGG-1" "AAGGCTGTCGGTACGAACTTTAGG-1" ...
..@ counts :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. ..@ i : int [1:4460158] 112 136 174 194 226 227 240 277 292 303 ...
.. .. ..@ p : int [1:4501] 0 748 3659 4860 5210 5996 6923 7425 9815 10501 ...
.. .. ..@ Dim : int [1:2] 18082 4500
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : chr [1:18082] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" ...
.. .. .. ..$ : chr [1:4500] "AAAGGTGAGTCACCTAACTTTAGG-1" "AACTGAATCCTGGAAGACTTTAGG-1" "AAGGCACGTTCTTGAAACTTTAGG-1" "AAGGCTGTCGGTACGAACTTTAGG-1" ...
.. .. ..@ x : num [1:4460158] 2 1 1 1 1 1 1 1 2 1 ...
.. .. ..@ factors : list()
..@ nUMI : Named num [1:4500] 936 4725 1542 407 1078 ...
.. ..- attr(*, "names")= chr [1:4500] "AAAGGTGAGTCACCTAACTTTAGG-1" "AACTGAATCCTGGAAGACTTTAGG-1" "AAGGCACGTTCTTGAAACTTTAGG-1" "AAGGCTGTCGGTACGAACTTTAGG-1" ...
Create query
To begin, we are going to split our seurat object into a single sample.
crc <- subset(seurat_banksy,
subset = (orig.ident == "P5CRC"))
crcAn object of class Seurat
36170 features across 57463 samples within 2 assays
Active assay: Spatial.008um (18085 features, 2000 variable features)
2 layers present: counts, data
1 other assay present: sketch
6 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch, pca.banksy, harmony.banksy
1 spatial field of view present: P5CRC.008um
Deconvolution is very computationally expensive! So we can once again make use the sketch workflow to downsample our dataset in a careful manner.
# Create sketch of CRC subset
DefaultAssay(crc) <- 'Spatial.008um'
# HVGs are required input for SketchData
crc <- FindVariableFeatures(object = crc,
assay = "Spatial.008um")
# Calculate leverage score
crc <- SketchData(
object = crc,
ncells = 5000,
method = "LeverageScore",
sketched.assay = "sketch_crc",
var.name = "leverage.score_crc")
# We need the PCA to later project back onto the entire dataset
crc <- FindVariableFeatures(crc)
crc <- ScaleData(crc)
crc <- RunPCA(crc, reduction.name = "pca.sketch_crc")RCTD requires a unique data structure (similar to Seurat) as input to run. So we create our query object with SpatialRNA() by suppling the spatial coordinates and raw counts for the bins.
DefaultAssay(crc) <- "sketch_crc"
# Grab barcodes for sketched bins
counts_sketch <- crc[["sketch_crc"]]$counts
cells_sketch <- colnames(crc[["sketch_crc"]])
# Spatial coordinates for each bin
coords <- GetTissueCoordinates(crc)[cells_sketch, 1:2]
# Create the RCTD query object
query <- SpatialRNA(coords = coords,
counts = counts_sketch,
nUMI = colSums(counts_sketch))str(query)Formal class 'SpatialRNA' [package "spacexr"] with 3 slots
..@ coords:'data.frame': 5000 obs. of 2 variables:
.. ..$ x: num [1:5000] 60621 54093 53817 56712 56687 ...
.. ..$ y: num [1:5000] 62512 58219 61315 60684 59866 ...
..@ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. .. ..@ i : int [1:2337095] 13 90 174 261 377 677 904 1012 1027 1133 ...
.. .. ..@ p : int [1:5001] 0 121 496 1523 2012 2224 2692 3347 3903 4218 ...
.. .. ..@ Dim : int [1:2] 18085 5000
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : chr [1:18085] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" ...
.. .. .. ..$ : chr [1:5000] "P5CRC_s_008um_00052_00559-1" "P5CRC_s_008um_00198_00335-1" "P5CRC_s_008um_00092_00326-1" "P5CRC_s_008um_00114_00425-1" ...
.. .. ..@ x : num [1:2337095] 1 1 1 1 1 1 1 1 1 2 ...
.. .. ..@ factors : list()
..@ nUMI : Named num [1:5000] 128 503 1576 750 238 ...
.. ..- attr(*, "names")= chr [1:5000] "P5CRC_s_008um_00052_00559-1" "P5CRC_s_008um_00198_00335-1" "P5CRC_s_008um_00092_00326-1" "P5CRC_s_008um_00114_00425-1" ...
Run RCTD
The more celltypes you have the longer this will take.
We need to provide a time estimate for this step
This took me ~3 minutes
# Run RCTD
RCTD <- create.RCTD(spatialRNA = query,
reference = reference,
max_cores = parallel::detectCores() - 1)
RCTD <- run.RCTD(RCTD, doublet_mode = "doublet") # This step takes a whileRCTD Results
The resultant dataframe from RCTD will contain the following columns according to the documentation:
spot_class, a factor variable representing RCTD’s classification in doublet mode:- “singlet” (1 cell type on pixel)
- “doublet_certain” (2 cell types on pixel)
- “doublet_uncertain” (2 cell types on pixel, but only confident of 1)
- “reject” (no prediction given for pixel)
first_typecolumn gives the first cell type predicted on the bead (for all spot_class conditions except “reject”).second_typecolumn gives the second cell type predicted on the bead for doublet spot_class conditions (not a confident prediction for “doublet_uncertain”).
The results of deconvolution are stored in the results$results_df slot of the RCTD object:
df_rctd_results <- RCTD@results$results_df
df_rctd_results %>% head() spot_class first_type
P5CRC_s_008um_00052_00559-1 singlet Myeloid
P5CRC_s_008um_00198_00335-1 singlet Tumor
P5CRC_s_008um_00092_00326-1 singlet Tumor
P5CRC_s_008um_00114_00425-1 singlet Intestinal Epithelial
P5CRC_s_008um_00142_00424-1 singlet Endothelial
P5CRC_s_008um_00131_00541-1 singlet Intestinal Epithelial
second_type first_class second_class
P5CRC_s_008um_00052_00559-1 Endothelial FALSE FALSE
P5CRC_s_008um_00198_00335-1 Intestinal Epithelial FALSE FALSE
P5CRC_s_008um_00092_00326-1 Intestinal Epithelial FALSE FALSE
P5CRC_s_008um_00114_00425-1 B cells FALSE FALSE
P5CRC_s_008um_00142_00424-1 Fibroblast FALSE FALSE
P5CRC_s_008um_00131_00541-1 Tumor FALSE FALSE
min_score singlet_score conv_all conv_doublet
P5CRC_s_008um_00052_00559-1 247.6690 248.6940 TRUE TRUE
P5CRC_s_008um_00198_00335-1 412.7892 414.9878 TRUE TRUE
P5CRC_s_008um_00092_00326-1 873.5261 874.7588 TRUE TRUE
P5CRC_s_008um_00114_00425-1 595.2669 595.0713 TRUE TRUE
P5CRC_s_008um_00142_00424-1 347.1421 350.3347 TRUE TRUE
P5CRC_s_008um_00131_00541-1 550.9241 557.3652 TRUE TRUE
We can also visualize the identifications established for each first_type:
Project RCTD results on all bins
Let us take a look at how many rows (bins) we have these annotations for:
nrow(df_rctd_results)[1] 4809
But didn’t our sketch assay include 5,000 bins? RCTD runs its own QC and when it cannot identify what the annotation for a specific bin is, then it is removed from the final results.
So we are going to do some data wrangling to ensure that we keep track of all information. Including those ~800 bins without first_type labels.
# Create column cell to merge on
# Also make sure columns are not factors
df_rctd_results <- df_rctd_results %>%
mutate(cell = rownames(df_rctd_results)) %>%
mutate(first_type = as.character(first_type),
second_type = as.character(second_type),
spot_class = as.character(spot_class))
# Update column names to have _sketch
# This is so that we can create a projected column
colnames(df_rctd_results) <- paste0(colnames(df_rctd_results),
"_sketch")
df_rctd_results %>% head() spot_class_sketch first_type_sketch
P5CRC_s_008um_00052_00559-1 singlet Myeloid
P5CRC_s_008um_00198_00335-1 singlet Tumor
P5CRC_s_008um_00092_00326-1 singlet Tumor
P5CRC_s_008um_00114_00425-1 singlet Intestinal Epithelial
P5CRC_s_008um_00142_00424-1 singlet Endothelial
P5CRC_s_008um_00131_00541-1 singlet Intestinal Epithelial
second_type_sketch first_class_sketch
P5CRC_s_008um_00052_00559-1 Endothelial FALSE
P5CRC_s_008um_00198_00335-1 Intestinal Epithelial FALSE
P5CRC_s_008um_00092_00326-1 Intestinal Epithelial FALSE
P5CRC_s_008um_00114_00425-1 B cells FALSE
P5CRC_s_008um_00142_00424-1 Fibroblast FALSE
P5CRC_s_008um_00131_00541-1 Tumor FALSE
second_class_sketch min_score_sketch
P5CRC_s_008um_00052_00559-1 FALSE 247.6690
P5CRC_s_008um_00198_00335-1 FALSE 412.7892
P5CRC_s_008um_00092_00326-1 FALSE 873.5261
P5CRC_s_008um_00114_00425-1 FALSE 595.2669
P5CRC_s_008um_00142_00424-1 FALSE 347.1421
P5CRC_s_008um_00131_00541-1 FALSE 550.9241
singlet_score_sketch conv_all_sketch
P5CRC_s_008um_00052_00559-1 248.6940 TRUE
P5CRC_s_008um_00198_00335-1 414.9878 TRUE
P5CRC_s_008um_00092_00326-1 874.7588 TRUE
P5CRC_s_008um_00114_00425-1 595.0713 TRUE
P5CRC_s_008um_00142_00424-1 350.3347 TRUE
P5CRC_s_008um_00131_00541-1 557.3652 TRUE
conv_doublet_sketch cell_sketch
P5CRC_s_008um_00052_00559-1 TRUE P5CRC_s_008um_00052_00559-1
P5CRC_s_008um_00198_00335-1 TRUE P5CRC_s_008um_00198_00335-1
P5CRC_s_008um_00092_00326-1 TRUE P5CRC_s_008um_00092_00326-1
P5CRC_s_008um_00114_00425-1 TRUE P5CRC_s_008um_00114_00425-1
P5CRC_s_008um_00142_00424-1 TRUE P5CRC_s_008um_00142_00424-1
P5CRC_s_008um_00131_00541-1 TRUE P5CRC_s_008um_00131_00541-1
Now let us merge this information into our Seurat object. THis is necessary step so that we can merge i
crc$cell <- rownames(crc@meta.data)
meta <- crc@meta.data
meta <- left_join(meta,
df_rctd_results,
by = c("cell" = "cell_sketch"))
rownames(meta) <- meta$cell
crc@meta.data <- meta
sketch_cells <- colnames(crc[["sketch_crc"]])# Set value ot unassigned to sketch bins that do not have values
# This is because ProjectData will throw errors if there is not a value
# For every sketch bin
cols <- c("spot_class_sketch",
"first_type_sketch",
"second_type_sketch")
for (col in cols) {
# Find rows that are in sketch_cells and are NA
na_rows <- sketch_cells[is.na(crc@meta.data[sketch_cells, col])]
crc@meta.data[na_rows, col] <- "unassigned"
}table(crc$first_type_sketch)
B cells Endothelial Fibroblast
387 326 413
Intestinal Epithelial Myeloid Neuronal
1462 420 91
Smooth Muscle T cells Tumor
127 247 1336
unassigned
191
Project Back to Full Dataset
# Project back first_type to new column called full_first_type
crc <- ProjectData(
object = crc,
assay = "Spatial.008um",
sketched.assay = "sketch_crc",
full.reduction = "pca_crc",
sketched.reduction = "pca.sketch_crc",
dims = 1:30,
refdata = list(spot_class = "spot_class_sketch",
first_type = "first_type_sketch",
second_type = "second_type_sketch"))Visualize New Annotation
df <- FetchData(crc,
c("x", "y", "first_type"))
ggplot(df,
aes(x = x, y = y,
color = first_type)) +
geom_point() +
theme_bw() +
facet_wrap(~first_type)