# Creating Seurat object from SpaceRanger output
# Visium HD spatial transcriptomics workshop
# Author: Harvard Chan Bioinformatics Core
# Created: December 2025
library(tidyverse)
library(Seurat)Loading Spatial Data
Approximate time: XY minutes
Learning Objectives
- Establish a hypothesis or biological question to assess using the provided dataset
- Quantify how bin size affects the count matrix
- Utilize functions to grab important information from each Seurat data structure slots
- Create a Seurat object from SpaceRanger outputs
The Dataset
TODO
Metadata
Human colorectal cancer sample
Matched control and cancerous colon sample from 58 year old female in stage IV-A.
With the associated paper
We expect to see the following celltypes in our dataset:
- B cells
- T cells
- Tumor
- Myeloid
- Fibroblast
- Endothelial
- Intestinal Epithelial
- Smooth Muscle
- Neuronal
A mouse brain (fresh frozen) was obtained from Charles River Laboratories.
Strain: C57BL/6 Sex: Male Age: 8 weeks Sample preparation
A 10 µm section was taken with a cryostat (Epredia CryoStar NX70). Tissue block preparation, sectioning, H&E staining, and imaging followed the Visium HD Fresh Frozen Tissue Preparation Handbook (CG000763).
The data was subset to a smaller cross-section of the brain in order to make it more manageable on local laptops. The instructions for how to subset a 10X Visium HD slides can be found here.
- Given the information that we know from the metadata, what might be some questions that we want to answer using our data?
Downloading the R Project and Data
We have assembled an R Project for you to download that includes the data along with a basic file structure for maintain good data management. It can be quite easy when starting a new analysis to want to dive right in and start analyzing your data. However, good data management practices occur at each step of data’s lifecycle. It is a good habit to begin by creating a basic directory structure to hold your data. Let’s start by download this R Project and data from here. Left-click on the link and select Save Link As.. or Download Linked File As.., then select a place on your computer where you would like to place this R Project.
We can open the R Project up and see that the provided file structure should look like:
If your R Project looks like below, then you are ready to start!
TODO: Insert picture of what the R Project will look when it is ready to start
Loading Data into a Seurat Object
There are several different workflow that can used for analyzing spatial transcriptomics data. While each has their own nuances, they all follow the same fundamental theory and processes:
- Python’s Squidpy
- R’s Spatial Experiment
- R’s Seurat.
As this in an R-based workshop, we will be using Seurat.
Data Files
To load our data, we are going to make use of the function Load10X_Spatial(). The input needed for this function comes from the output generated by SpaceRanger, including both the feature matrix and low-resolution tissue image. Once supplied, a Seurat object that contains our counts matrix and image is created.
? to look up function arguments
If you are ever unsure what parameters you can supply to a function, you make use of the ? call in R. This will bring up the manual page for the function while will provide more details on what each variable is used for.
?Load10X_SpatialThere are 3 main arguments that we will be utilizing for this step:
Load10X_Spatial() arguments
| Argument | Description |
|---|---|
| data.dir | Directory containing the H5 file specified by filename and the image data in a subdirectory called spatial |
| bin.size | Specifies the bin sizes to read in (defaults to c(16, 8)) |
| slice | Name for the stored image of the tissue slice |
Before loading in the data, let us take a look at the files we will supplying to the function. In particular, let us take a look at sample data/P5CRC_cropped which is the path would supply as data.dir in the Load10X_Spatial() function.
data/P5CRC_cropped/
└── binned_outputs
├── square_008um
└── square_016um
In the Visium HD assay, SpaceRanger bins the image into 2µm x 2µm, 8µm x 8µm, and 16µm x 16µm bins. The output from each of these resolution is found in the binned_outputs folder.
This is not an easy question to answer!
The typical recommendation for a Visium HD analysis is to use the 8µm x 8µm resolution of the dataset. However, it is important to consider the dataset you are working with. For example, neuron cells are know to be a larger celltype in size. Therefore it would be good to consider a large bin size in order to fully represent each cell within a bin.
There are numerous algorithms that have been created to identify the boundaries of cells to ensure that counts are being associated with single-cells. This process is typically referred to as segmentation, which we will cover in a later lesson.
Creating the Seurat Object
Now that we understand the input needed for the Load10X_Spatial() function, let’s use it to create a Seurat object called crc. We will specify that we want to load in the 8µm and 16µm bin sizes at the same time.
Typically you would only load in one bin size, but for the purposes of better understanding bins and our Seurat data structure, we will load both here.
crc <- Load10X_Spatial(data.dir = "data/P5CRC_cropped/",
bin.size = c(8, 16),
slice = "P5CRC")We can print out some basic information of our Seurat object to examine its data structure. We will do this frequently throughout the workshop to understand how our the Seurat data structure changes with each step of the workflow.
crcAn object of class Seurat
36170 features across 259572 samples within 2 assays
Active assay: Spatial.008um (18085 features, 0 variable features)
1 layer present: counts
1 other assay present: Spatial.016um
2 spatial fields of view present: P5CRC.008um P5CRC.016um
Anatomy of a Seurat Object
As we can see from our Seurat callout, there are a lot of different slots inside our object. Here, we will go through each of the major components of a Seurat object and how you would access key pieces of information.
Assays
Assays are where we can store different counts matrices - we are not forced to keep the same features and variable genes across assays. Each assay will contain it’s own Layers that can be distinct from other assays in the object. This is useful in several different cases:
- Multi-modal assays, where you can keep the expression matrices for RNA, ATAC, or protein in a single Seurat object
- Storing counts matrices from a variety of different normalization techniques
- Batch integration methods will sometimes generate a transformed counts matrix
Here we can print the different assays that exist within our crc object:
Assays(crc)[1] "Spatial.008um" "Spatial.016um"
We have 2 distinct assays for the different bin sizes, which makes sense because we have different count matrices for our cells based upon the bin size that was selected.
The DefaultAssay() function shows us which assay information will be used in other Seurat function calls, unless explicitly specified otherwise.
DefaultAssay(crc)[1] "Spatial.008um"
We can also change what our default assay is. Let’s set it to the 016um bins:
DefaultAssay(crc) <- "Spatial.016um"
crcAn object of class Seurat
36170 features across 259572 samples within 2 assays
Active assay: Spatial.016um (18085 features, 0 variable features)
1 layer present: counts
1 other assay present: Spatial.008um
2 spatial fields of view present: P5CRC.008um P5CRC.016um
Now we see that the callout says: Active assay: Spatial.016um
Features and Cells
Our count matrices function as any other matrix does, with rows and columns.
In Seurat, the rows correspond to Features. In the case of spatial transcriptomics, our features are genes. In other experiments, features could refer to chromatin peaks or proteins. The important thing to keep in mind is what technology you are using. Since this is a Visium HD dataset, we are quantifying RNA expression (genes).
We can see what the first few genes/features in our count matrix are:
Features(crc) %>% head()[1] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" "PERM1" "HES4"
As well as see the number of genes that are found in each of our assays:
nrow(crc[["Spatial.008um"]])[1] 18085
nrow(crc[["Spatial.016um"]])[1] 18085
The columns correspond to Cells (or samples as it appears in the callout). We can see what the first few cells in our count matrix are:
Cells(crc) %>% head()[1] "s_016um_00050_00315-1" "s_016um_00204_00145-1" "s_016um_00237_00273-1"
[4] "s_016um_00224_00126-1" "s_016um_00191_00159-1" "s_016um_00064_00214-1"
As well as see the number of cells that are found in each of our assays:
ncol(crc[["Spatial.008um"]])[1] 207393
ncol(crc[["Spatial.016um"]])[1] 52179
- What differences do you see between the
8umand16umbins?
Layers
Layers are our count matrices.
Layers(crc)[1] "counts"
By default, Seurat uses the following naming convention for the counts matrices within an Assay:
Layer() count matrix
| Layer | Description |
|---|---|
| counts | Raw counts |
| data | Normalized counts |
| scale.data | Scaled‑normalized counts |
You may notice that our Seurat object only contains counts right now. This is because we have not run any normalization steps yet (we will discuss how to do so in future lessons).
Using the LayerData() function we can access the entire counts matrix. Furthermore, we can specify the assay if we would prefer to not use the DefaultAssay.
LayerData(crc,
assay = "Spatial.016um",
layer = "count")[1:5, 1:5]| s_016um_00050_00315-1 | s_016um_00204_00145-1 | s_016um_00237_00273-1 | s_016um_00224_00126-1 | s_016um_00191_00159-1 | |
|---|---|---|---|---|---|
| SAMD11 | 0 | 0 | 0 | 0 | 0 |
| NOC2L | 0 | 0 | 0 | 0 | 0 |
| KLHL17 | 0 | 0 | 0 | 0 | 0 |
| PLEKHN1 | 0 | 0 | 0 | 0 | 0 |
| PERM1 | 0 | 0 | 0 | 0 | 0 |
By printing the first 5 features and cells in our object (for easier visualization). We can see that we are working with whole numbers which reinforces the idea that this is the raw data, with no transformations having been applied.
Spatial Fields
We do not just have expression data associated with our sample, we also have the spatial slide that comes with its own set of values and information.
For example we can grab the x,y coordinates of each bin using the GetTissueCoordinates() function.
GetTissueCoordinates(crc) %>% View()Or visualize what our slide looks like with SpatialDimPlot():
SpatialDimPlot(crc)SpatialDimPlot()
Metadata
Seurat automatically creates some metadata for each of the cells when the object is created. This information is stored in the @meta.data slot within the Seurat object. The rownames are automatically set to be the cell names.
crc@meta.data %>% View()@meta.data
What does each column represent?
@meta.data
| Column | Description |
|---|---|
| orig.ident | Sample identity if known; defaults to “s” |
| nCount_RNA | Number of UMIs per cell |
| nFeature_RNA | Number of genes detected per cell |
While it may seem intimidating at first, the important thing to remember is that this is a dataframe. Therefore can modify and work with this dataframe just like we would any other in R! For example, we can set our orig.ident column to be our sample name rather than “s”.
crc@meta.data$orig.ident <- "P5CRC"crc@meta.data %>% View()@meta.data after updating orig.ident
Additionally, we do not have use the @meta.data each time we want to access a single column. We can use the $ follow by the column name as a shorthand.
crc$nCount_Spatial.008um %>% head()s_008um_00078_00444-1 s_008um_00128_00278-1 s_008um_00052_00559-1
65 1300 128
s_008um_00121_00413-1 s_008um_00447_00253-1 s_008um_00455_00433-1
538 201 164
Idents
The cell identities are stored as Idents(), which contain the default way to label cells. For example, if we wanted to label each cell by which sample they came from we could run:
Idents(crc) <- "orig.ident"
Idents(crc) %>% head()s_008um_00078_00444-1 s_008um_00128_00278-1 s_008um_00052_00559-1
P5CRC P5CRC P5CRC
s_008um_00121_00413-1 s_008um_00447_00253-1 s_008um_00455_00433-1
P5CRC P5CRC P5CRC
Levels: P5CRC
Where we see that the identities of the cells are the value stored in the @meta.data column orig.ident.
Loading Multiple Samples into Seurat
Before we begin, we are going to remove the crc object we have been playing with in order to clear up extra space on our computers. Spatial transcriptomics take up a lot of memory! So we are being careful to remove any variables that might overload the RAM.
rm(crc)Now that we have a better understanding of how to create a Seurat object from SpaceRanger outpus, let’s go through how we can load multiple samples at once. In this case we want to represent both samples P5CRC and P5NAT in a singular Seurat object.
Using a for Loop
In practice, you will likely have several samples that you will need to read in data for, and that can get tedious and error-prone if you do it one at a time. So, to make the data import into R more efficient we can use a for loop, which will iterate over a series of commands for each of the inputs given and create Seurat objects for each of our samples.
for loop syntax
In R, the for loop has the following structure/syntax:
## DO NOT RUN
for (variable in input){
command1
command2
command3
}Today we will use it to iterate over the two sample folders and execute commands for each sample as we did above for a single sample:
- Create the Seurat objects from the SpaceRanger data (
Load10X_Spatial()) and specifysliceso that images are labeled with their sample name - Set
orig.identto be our sample
Once those steps run, we will store the newly generated Seurat object to a list called list_seurat so that we can eventually merge both samples together.
We will be using a bin size of 16um for the remainder of this workshop!
samples <- c("P5CRC", "P5NAT")
list_seurat <- list()
for (sample in samples) {
# Path to data directory
data_dir <- paste0("data/", sample, "_cropped")
# Create seurat object and set orig ident to be sample
seurat <- Load10X_Spatial(data.dir = data_dir,
bin.size = c(16),
slice = sample)
seurat$orig.ident <- sample
# Store seurat object in our list
list_seurat[[sample]] <- seurat
}To confirm that we succesfully loaded both samples in, we can take a look at the contents of list_seurat. We should see that there are two Seurat objects in our list that correspond to each sample.
list_seurat$P5CRC
An object of class Seurat
18085 features across 52179 samples within 1 assay
Active assay: Spatial.016um (18085 features, 0 variable features)
1 layer present: counts
1 spatial field of view present: P5CRC.016um
$P5NAT
An object of class Seurat
18085 features across 25397 samples within 1 assay
Active assay: Spatial.016um (18085 features, 0 variable features)
1 layer present: counts
1 spatial field of view present: P5NAT.016um
This is exactly what we had hoped to see! So now we can move on to the next step which is merging the sample together into a singular Seurat object.
Merge Datasets Together
We merge samples together because it make it easier to run the QC steps for both sample groups together. It also enables us to easily compare the data quality for all cells at one time
To create this combined Seurat object, we use the merge() function. Because the same cell IDs can be used for different samples, we add a sample-specific prefix to each of our cell IDs using the add.cell.id argument to ensure the cell names are unique.
seurat_merged <- merge(x = list_seurat[["P5CRC"]],
y = list_seurat[["P5NAT"]],
add.cell.id = c("P5CRC", "P5NAT"))
seurat_mergedAn object of class Seurat
18085 features across 77576 samples within 1 assay
Active assay: Spatial.016um (18085 features, 0 variable features)
2 layers present: counts.1, counts.2
2 spatial fields of view present: P5CRC.016um P5NAT.016um
From the callout we can see that we now have: 2 layers present: counts.1, counts.2
Seurat has functionality to merge many samples together. You can do this quite easily by adding all sample objects to the y argument in a vector format. An example is provided below, assuming you use the list structured we used previously:
## DO NOT RUN
merged_seurat <- merge(x = seurat_list[[1]],
y = seurat_list[[2:length(seurat_list)]],
add.cell.id = names(seurat_list))However, we do not want to have two distinct raw counts matrices (counts.1 and counts.2). We want the counts for each sample to be concatenated together. This would be a single, large count matrix that represent each cell in the dataset. Seurat’s function JoinLayers() allows us to do just this.
seurat_merged <- JoinLayers(seurat_merged)- What function can we use to double check that we have a singular
countslayer?
A common point of confusion is the distinction between integration and merging. In the field, integration is considered to be modifying either your counts or latent space in a way to correct for a batch variable. Whereas what we are doing now is merging or concatenating multiple samples together. This process of merging does not transform the values in the count matrices.
Evaluating merged_seurat
Let’s also double check that we have the correct number of cells. First, let us see what the number of cells was for each sample:
ncol(list_seurat[["P5CRC"]]) + ncol(list_seurat[["P5NAT"]])[1] 77576
Which is the same as the number of samples in our Seurat callout!
seurat_mergedAn object of class Seurat
18085 features across 77576 samples within 1 assay
Active assay: Spatial.016um (18085 features, 0 variable features)
1 layer present: counts
2 spatial fields of view present: P5CRC.016um P5NAT.016um
If we look at the metadata of the merged object we should be able to see the prefixes in the rownames (Cells) as well as the updated orig.ident we set in the for loop earlier.
# Check that the merged object has the appropriate sample-specific prefixes
seurat_merged@meta.data %>% head()@meta.data
seurat_merged@meta.data %>% tail()@meta.data
Lastly we want to make sure that the Idents of our cells is a useful piece of information, for example like orig.ident, which contains our sample IDs.
Idents(seurat_merged) <- "orig.ident"Save!
Now is a great spot to save our seurat_merged object as the next step is going to be filtering.
# Save integrated Seurat object
saveRDS(seurat_merged, "data/seurat_merged.RDS")A good rule of thumb is to save your intermediate objects before doing any filtration or after running a computational step that takes a long time to run.