# Load libraries
library(curl)
library(R.utils)
library(Seurat)
# Create the output directory if it does not already exist
dir.create("data", showWarnings = FALSE)
# Download count matrix
curl_download(
  url = "https://cf.10xgenomics.com/samples/cell-exp/8.0.0/HumanColonCancer_Flex_Multiplex/HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5",
  destfile = "data/HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5"
)
# Download metadata
curl_download(
  url = "https://github.com/10XGenomics/HumanColonCancer_VisiumHD/raw/refs/heads/main/MetaData/SingleCell_MetaData.csv.gz",
  destfile = "data/SingleCell_MetaData.csv.gz"
)
# Uncompress the metadata file
gunzip("data/SingleCell_MetaData.csv.gz",
       remove = FALSE)
Generating the Reference Dataset for RCTD
In this lesson, we will build a high-quality single-cell RNA-seq reference object for RCTD deconvolution.
Reference, CRC, Public dataset, Seurat
Approximate time: 10 minutes
Overview of lesson
Deconvolution requires a trustworthy scRNA-seq reference dataset to calculate the average expression of each cell type expected in your query dataset. For the CRC dataset that we have been working with throughout this workshop, we will use the paired 10X FLEX dataset. Since filtering, normalization, clustering, and manual cell type annotation have already been done and the results stored in the metadata, we will follow the authors' original workflow.
Investing effort into creating a clean reference dataset will improve the accuracy and interpretability of deconvolution results in the main deconvolution lesson.
Download the dataset
The reference dataset was generated by downloading two files (see the curl_download() calls in the code at the start of this lesson):
- Count matrix (.h5 file)
- Metadata CSV
Create Seurat object
Then in R, we load both the metadata and counts matrix to generate a Seurat object.
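CreateSeuratObject() pairs metadata rows with cells by row name, so it can be worth confirming that the barcodes in the two files agree. The following is a minimal base R sketch on toy data. The barcodes, cell type labels, and the names toy_counts and toy_meta are made up for illustration. In the lesson, the equivalent comparison would be between colnames(counts) and meta$Barcode once both are loaded.

```r
# Toy stand-ins for the real objects (hypothetical values, for illustration only)
toy_counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c("GeneA", "GeneB"),
                                     c("AAAC-1", "AAAG-1", "AAAT-1")))
toy_meta <- data.frame(Barcode = c("AAAG-1", "AAAC-1", "TTTT-1"),
                       Level1  = c("T cells", "B cells", "Tumor"))

# Barcodes present in both the count matrix and the metadata
shared <- intersect(colnames(toy_counts), toy_meta$Barcode)

# Barcodes in the count matrix that have no metadata row
missing_meta <- setdiff(colnames(toy_counts), toy_meta$Barcode)

length(shared)
missing_meta
```

Cells listed in missing_meta would end up with NA values in their metadata columns, which matters for the later steps that rely on the UMAP, QCFilter, and Level1 columns.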
# Load counts matrix
counts <- Read10X_h5("data/HumanColonCancer_Flex_Multiplex_count_filtered_feature_bc_matrix.h5")
# Load metadata and set rownames
meta <- read.csv("data/SingleCell_MetaData.csv")
rownames(meta) <- meta$Barcode
# Create Seurat object
seurat <- CreateSeuratObject(counts = counts,
                             meta.data = meta,
                             project = "CRC FLEX")
The UMAP coordinates are also included in the metadata file. Here, we add them to the Seurat object as a dimensionality reduction so that we can call DimPlot() in later steps.
# Grab UMAP coordinates from metadata and put them in a reduction
umap_coords <- as.matrix(seurat@meta.data[, c("UMAP1", "UMAP2")])
# Create a DimReduc and store it in the object as "umap"
seurat[["umap"]] <- CreateDimReducObject(embeddings = umap_coords,
                                         key = "UMAP_",
                                         assay = DefaultAssay(seurat))
The downloaded dataset is the raw output, meaning that no filtering has been done up to this point. We will apply the same filtering that the original authors intended by using the QCFilter column in the metadata. This leaves only high-quality cells in the dataset.
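Before subsetting, it can help to see how many cells fall under each QCFilter value. Below is a small base R sketch on a toy metadata table. The barcodes and the "Remove" label are hypothetical, since only the "Keep" value is confirmed by this lesson.

```r
# Toy metadata table (hypothetical barcodes and QC labels)
toy_meta <- data.frame(
  Barcode  = paste0("cell", 1:6),
  QCFilter = c("Keep", "Keep", "Remove", "Keep", "Remove", "Keep")
)

# Count cells per QC label
qc_counts <- table(toy_meta$QCFilter)
qc_counts

# Barcodes that would survive the filter
kept_barcodes <- toy_meta$Barcode[toy_meta$QCFilter == "Keep"]
kept_barcodes
```

On the real object, table(seurat$QCFilter) gives the same kind of summary and shows how many cells the filter will remove.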
# Remove cells that did not pass QC
seurat <- subset(seurat,
                 subset = (QCFilter == "Keep"))
Even after filtering, this is a very large dataset, so the last step is to downsample the reference. The metadata column Level1 contains the cell type annotations for the dataset, so we set it as the Idents() of the object. That way, when we run subset() and specify downsample = 500, we keep at most 500 randomly chosen cells per Level1 identity.
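To make the effect of downsampling concrete, here is a rough base R sketch of the same idea on a toy label vector: at most n cells are kept per identity, and identities with fewer than n cells are kept whole. The labels, the cell numbers, and the helper downsample_per_group() are hypothetical, and this is not Seurat's internal implementation.

```r
# Hypothetical helper: keep at most `n` randomly chosen items per group
downsample_per_group <- function(labels, n, seed = 1) {
  set.seed(seed)  # fixed seed so the sketch is reproducible
  idx_by_group <- split(seq_along(labels), labels)
  kept <- lapply(idx_by_group, function(idx) {
    # Sample only when the group is larger than the cap
    if (length(idx) > n) idx[sample.int(length(idx), n)] else idx
  })
  sort(unlist(kept, use.names = FALSE))
}

# Toy identities: one abundant and one rare cell type (made-up numbers)
toy_idents <- c(rep("Tumor", 1200), rep("Neuronal", 40))
kept_idx <- downsample_per_group(toy_idents, n = 500)

# Cells retained per identity
kept_table <- table(toy_idents[kept_idx])
kept_table
```

The abundant group is capped at 500 cells while the rare group keeps all 40. This caps the size of the reference without discarding rare cell types.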
# Set Idents to Level1 celltype annotation
Idents(seurat) <- "Level1"
# Randomly downsample the dataset so that there are
# at most 500 cells per Level1 identity
seurat_down <- subset(seurat,
                      downsample = 500)
# Save downsampled Seurat object
saveRDS(seurat_down, "crc_flex_ref_downsample.RDS")
This seurat_down object is the same one that we will use as the reference dataset in the Deconvolution lesson.