Loading Spatial Data

Author

Will Gammerdinger, Noor Sohail, Zhu Zhuo, James Billingsley, Shannan Ho Sui

Published

July 22, 2025

Approximate time: XY minutes

Learning Objectives

Establish a hypothesis or biological question to assess using the provided dataset
Quantify how bin size affects the count matrix
Utilize functions to grab important information from each Seurat data structure slots
Create a Seurat object from SpaceRanger outputs

The Dataset

TODO

Human colorectal cancer sample

Matched control and cancerous colon sample from 58 year old female in stage IV-A.

With the associated paper

We expect to see the following celltypes in our dataset:

B cells
T cells
Tumor
Myeloid
Fibroblast
Endothelial
Intestinal Epithelial
Smooth Muscle
Neuronal

A mouse brain (fresh frozen) was obtained from Charles River Laboratories.

Strain: C57BL/6 Sex: Male Age: 8 weeks Sample preparation

A 10 µm section was taken with a cryostat (Epredia CryoStar NX70). Tissue block preparation, sectioning, H&E staining, and imaging followed the Visium HD Fresh Frozen Tissue Preparation Handbook (CG000763).

The data was subset to a smaller cross-section of the brain in order to make it more manageable on local laptops. The instructions for how to subset a 10X Visium HD slides can be found here.

Exercise 1

Given the information that we know from the metadata, what might be some questions that we want to answer using our data?

Downloading the R Project and Data

We have assembled an R Project for you to download that includes the data along with a basic file structure for maintain good data management. It can be quite easy when starting a new analysis to want to dive right in and start analyzing your data. However, good data management practices occur at each step of data’s lifecycle. It is a good habit to begin by creating a basic directory structure to hold your data. Let’s start by download this R Project and data from here. Left-click on the link and select Save Link As.. or Download Linked File As.., then select a place on your computer where you would like to place this R Project.

We can open the R Project up and see that the provided file structure should look like:

If your R Project looks like below, then you are ready to start!

TODO: Insert picture of what the R Project will look when it is ready to start

# Creating Seurat object from SpaceRanger output
# Visium HD spatial transcriptomics workshop
# Author: Harvard Chan Bioinformatics Core
# Created: December 2025

library(tidyverse)
library(Seurat)

Loading Data into a Seurat Object

There are several different workflow that can used for analyzing spatial transcriptomics data. While each has their own nuances, they all follow the same fundamental theory and processes:

Python’s Squidpy
R’s Spatial Experiment
R’s Seurat.

As this in an R-based workshop, we will be using Seurat.

Data Files

To load our data, we are going to make use of the function Load10X_Spatial(). The input needed for this function comes from the output generated by SpaceRanger, including both the feature matrix and low-resolution tissue image. Once supplied, a Seurat object that contains our counts matrix and image is created.

Using ? to look up function arguments

If you are ever unsure what parameters you can supply to a function, you make use of the ? call in R. This will bring up the manual page for the function while will provide more details on what each variable is used for.

?Load10X_Spatial

There are 3 main arguments that we will be utilizing for this step:

Table 1: Load10X_Spatial() arguments

Argument	Description
data.dir	Directory containing the H5 file specified by filename and the image data in a subdirectory called spatial
bin.size	Specifies the bin sizes to read in (defaults to c(16, 8))
slice	Name for the stored image of the tissue slice

Before loading in the data, let us take a look at the files we will supplying to the function. In particular, let us take a look at sample data/P5CRC_cropped which is the path would supply as data.dir in the Load10X_Spatial() function.

data/P5CRC_cropped/
└── binned_outputs
    ├── square_008um
    └── square_016um

In the Visium HD assay, SpaceRanger bins the image into 2µm x 2µm, 8µm x 8µm, and 16µm x 16µm bins. The output from each of these resolution is found in the binned_outputs folder.

Choosing your bin size

This is not an easy question to answer!

The typical recommendation for a Visium HD analysis is to use the 8µm x 8µm resolution of the dataset. However, it is important to consider the dataset you are working with. For example, neuron cells are know to be a larger celltype in size. Therefore it would be good to consider a large bin size in order to fully represent each cell within a bin.

There are numerous algorithms that have been created to identify the boundaries of cells to ensure that counts are being associated with single-cells. This process is typically referred to as segmentation, which we will cover in a later lesson.

Creating the Seurat Object

Now that we understand the input needed for the Load10X_Spatial() function, let’s use it to create a Seurat object called crc. We will specify that we want to load in the 8µm and 16µm bin sizes at the same time.

Typically you would only load in one bin size, but for the purposes of better understanding bins and our Seurat data structure, we will load both here.

crc <- Load10X_Spatial(data.dir = "data/P5CRC_cropped/",
                       bin.size = c(8, 16),
                       slice = "P5CRC")

We can print out some basic information of our Seurat object to examine its data structure. We will do this frequently throughout the workshop to understand how our the Seurat data structure changes with each step of the workflow.

crc

An object of class Seurat 
36170 features across 259572 samples within 2 assays 
Active assay: Spatial.008um (18085 features, 0 variable features)
 1 layer present: counts
 1 other assay present: Spatial.016um
 2 spatial fields of view present: P5CRC.008um P5CRC.016um

Anatomy of a Seurat Object

As we can see from our Seurat callout, there are a lot of different slots inside our object. Here, we will go through each of the major components of a Seurat object and how you would access key pieces of information.

Assays

Assays are where we can store different counts matrices - we are not forced to keep the same features and variable genes across assays. Each assay will contain it’s own Layers that can be distinct from other assays in the object. This is useful in several different cases:

Multi-modal assays, where you can keep the expression matrices for RNA, ATAC, or protein in a single Seurat object
Storing counts matrices from a variety of different normalization techniques
Batch integration methods will sometimes generate a transformed counts matrix

Here we can print the different assays that exist within our crc object:

Assays(crc)

[1] "Spatial.008um" "Spatial.016um"

We have 2 distinct assays for the different bin sizes, which makes sense because we have different count matrices for our cells based upon the bin size that was selected.

The DefaultAssay() function shows us which assay information will be used in other Seurat function calls, unless explicitly specified otherwise.

DefaultAssay(crc)

[1] "Spatial.008um"

We can also change what our default assay is. Let’s set it to the 016um bins:

DefaultAssay(crc) <- "Spatial.016um"
crc

An object of class Seurat 
36170 features across 259572 samples within 2 assays 
Active assay: Spatial.016um (18085 features, 0 variable features)
 1 layer present: counts
 1 other assay present: Spatial.008um
 2 spatial fields of view present: P5CRC.008um P5CRC.016um

Now we see that the callout says: Active assay: Spatial.016um

Features and Cells

Our count matrices function as any other matrix does, with rows and columns.

In Seurat, the rows correspond to Features. In the case of spatial transcriptomics, our features are genes. In other experiments, features could refer to chromatin peaks or proteins. The important thing to keep in mind is what technology you are using. Since this is a Visium HD dataset, we are quantifying RNA expression (genes).

We can see what the first few genes/features in our count matrix are:

Features(crc) %>% head()

[1] "SAMD11"  "NOC2L"   "KLHL17"  "PLEKHN1" "PERM1"   "HES4"

As well as see the number of genes that are found in each of our assays:

nrow(crc[["Spatial.008um"]])

[1] 18085

nrow(crc[["Spatial.016um"]])

[1] 18085

The columns correspond to Cells (or samples as it appears in the callout). We can see what the first few cells in our count matrix are:

Cells(crc) %>% head()

[1] "s_016um_00050_00315-1" "s_016um_00204_00145-1" "s_016um_00237_00273-1"
[4] "s_016um_00224_00126-1" "s_016um_00191_00159-1" "s_016um_00064_00214-1"

As well as see the number of cells that are found in each of our assays:

ncol(crc[["Spatial.008um"]])

[1] 207393

ncol(crc[["Spatial.016um"]])

[1] 52179

Exercise 2

What differences do you see between the 8um and 16um bins?

Layers

Layers are our count matrices.

Layers(crc)

[1] "counts"

By default, Seurat uses the following naming convention for the counts matrices within an Assay:

Table 2: Description of each Layer() count matrix

Layer	Description
counts	Raw counts
data	Normalized counts
scale.data	Scaled‑normalized counts

You may notice that our Seurat object only contains counts right now. This is because we have not run any normalization steps yet (we will discuss how to do so in future lessons).

Using the LayerData() function we can access the entire counts matrix. Furthermore, we can specify the assay if we would prefer to not use the DefaultAssay.

LayerData(crc, 
          assay = "Spatial.016um", 
          layer = "count")[1:5, 1:5]

Table 3: First 5 rows and columns of raw counts matrix

	s_016um_00050_00315-1	s_016um_00204_00145-1	s_016um_00237_00273-1	s_016um_00224_00126-1	s_016um_00191_00159-1
SAMD11	0	0	0	0	0
NOC2L	0	0	0	0	0
KLHL17	0	0	0	0	0
PLEKHN1	0	0	0	0	0
PERM1	0	0	0	0	0

By printing the first 5 features and cells in our object (for easier visualization). We can see that we are working with whole numbers which reinforces the idea that this is the raw data, with no transformations having been applied.

Spatial Fields

We do not just have expression data associated with our sample, we also have the spatial slide that comes with its own set of values and information.

For example we can grab the x,y coordinates of each bin using the GetTissueCoordinates() function.

GetTissueCoordinates(crc) %>% View()

Table 4: Coordinates on spatial slide for each cell

Or visualize what our slide looks like with SpatialDimPlot():

SpatialDimPlot(crc)

Figure 1: Spatial visualization of spots with `SpatialDimPlot()`

Metadata

Seurat automatically creates some metadata for each of the cells when the object is created. This information is stored in the @meta.data slot within the Seurat object. The rownames are automatically set to be the cell names.

crc@meta.data %>% View()

Table 5: Seurat default @meta.data

What does each column represent?

Table 6: Columns automatically populated in @meta.data

Column	Description
orig.ident	Sample identity if known; defaults to “s”
nCount_RNA	Number of UMIs per cell
nFeature_RNA	Number of genes detected per cell

While it may seem intimidating at first, the important thing to remember is that this is a dataframe. Therefore can modify and work with this dataframe just like we would any other in R! For example, we can set our orig.ident column to be our sample name rather than “s”.

crc@meta.data$orig.ident <- "P5CRC"

crc@meta.data %>% View()

Table 7: Seurat default @meta.data after updating orig.ident

Additionally, we do not have use the @meta.data each time we want to access a single column. We can use the $ follow by the column name as a shorthand.

crc$nCount_Spatial.008um %>% head()

s_008um_00078_00444-1 s_008um_00128_00278-1 s_008um_00052_00559-1 
                   65                  1300                   128 
s_008um_00121_00413-1 s_008um_00447_00253-1 s_008um_00455_00433-1 
                  538                   201                   164

Idents

The cell identities are stored as Idents(), which contain the default way to label cells. For example, if we wanted to label each cell by which sample they came from we could run:

Idents(crc) <- "orig.ident"
Idents(crc) %>% head()

s_008um_00078_00444-1 s_008um_00128_00278-1 s_008um_00052_00559-1 
                P5CRC                 P5CRC                 P5CRC 
s_008um_00121_00413-1 s_008um_00447_00253-1 s_008um_00455_00433-1 
                P5CRC                 P5CRC                 P5CRC 
Levels: P5CRC

Where we see that the identities of the cells are the value stored in the @meta.data column orig.ident.

Loading Multiple Samples into Seurat

Before we begin, we are going to remove the crc object we have been playing with in order to clear up extra space on our computers. Spatial transcriptomics take up a lot of memory! So we are being careful to remove any variables that might overload the RAM.

rm(crc)

Now that we have a better understanding of how to create a Seurat object from SpaceRanger outpus, let’s go through how we can load multiple samples at once. In this case we want to represent both samples P5CRC and P5NAT in a singular Seurat object.

Using a `for` Loop

In practice, you will likely have several samples that you will need to read in data for, and that can get tedious and error-prone if you do it one at a time. So, to make the data import into R more efficient we can use a for loop, which will iterate over a series of commands for each of the inputs given and create Seurat objects for each of our samples.

for loop syntax

In R, the for loop has the following structure/syntax:

## DO NOT RUN

for (variable in input){
    command1
    command2
    command3
}

Today we will use it to iterate over the two sample folders and execute commands for each sample as we did above for a single sample:

Create the Seurat objects from the SpaceRanger data (Load10X_Spatial()) and specify slice so that images are labeled with their sample name
Set orig.ident to be our sample

Once those steps run, we will store the newly generated Seurat object to a list called list_seurat so that we can eventually merge both samples together.

We will be using a bin size of 16um for the remainder of this workshop!

samples <- c("P5CRC", "P5NAT")
list_seurat <- list()

for (sample in samples) {
  
  # Path to data directory
  data_dir <- paste0("data/", sample, "_cropped")
  # Create seurat object and set orig ident to be sample
  seurat <- Load10X_Spatial(data.dir = data_dir,
                            bin.size = c(16),
                            slice = sample)
  seurat$orig.ident <- sample
  
  # Store seurat object in our list
  list_seurat[[sample]] <- seurat
}

To confirm that we succesfully loaded both samples in, we can take a look at the contents of list_seurat. We should see that there are two Seurat objects in our list that correspond to each sample.

list_seurat

$P5CRC
An object of class Seurat 
18085 features across 52179 samples within 1 assay 
Active assay: Spatial.016um (18085 features, 0 variable features)
 1 layer present: counts
 1 spatial field of view present: P5CRC.016um

$P5NAT
An object of class Seurat 
18085 features across 25397 samples within 1 assay 
Active assay: Spatial.016um (18085 features, 0 variable features)
 1 layer present: counts
 1 spatial field of view present: P5NAT.016um

This is exactly what we had hoped to see! So now we can move on to the next step which is merging the sample together into a singular Seurat object.

Merge Datasets Together

We merge samples together because it make it easier to run the QC steps for both sample groups together. It also enables us to easily compare the data quality for all cells at one time

To create this combined Seurat object, we use the merge() function. Because the same cell IDs can be used for different samples, we add a sample-specific prefix to each of our cell IDs using the add.cell.id argument to ensure the cell names are unique.

seurat_merged <- merge(x = list_seurat[["P5CRC"]],
                       y = list_seurat[["P5NAT"]],
                       add.cell.id = c("P5CRC", "P5NAT"))
seurat_merged

An object of class Seurat 
18085 features across 77576 samples within 1 assay 
Active assay: Spatial.016um (18085 features, 0 variable features)
 2 layers present: counts.1, counts.2
 2 spatial fields of view present: P5CRC.016um P5NAT.016um

From the callout we can see that we now have: 2 layers present: counts.1, counts.2

What if I am merging more than two samples?

Seurat has functionality to merge many samples together. You can do this quite easily by adding all sample objects to the y argument in a vector format. An example is provided below, assuming you use the list structured we used previously:

## DO NOT RUN
merged_seurat <- merge(x = seurat_list[[1]], 
                       y = seurat_list[[2:length(seurat_list)]],
                       add.cell.id = names(seurat_list))

However, we do not want to have two distinct raw counts matrices (counts.1 and counts.2). We want the counts for each sample to be concatenated together. This would be a single, large count matrix that represent each cell in the dataset. Seurat’s function JoinLayers() allows us to do just this.

seurat_merged <- JoinLayers(seurat_merged)

Exercise 3

What function can we use to double check that we have a singular counts layer?

Merging vs. Integration

A common point of confusion is the distinction between integration and merging. In the field, integration is considered to be modifying either your counts or latent space in a way to correct for a batch variable. Whereas what we are doing now is merging or concatenating multiple samples together. This process of merging does not transform the values in the count matrices.

Evaluating `merged_seurat`

Let’s also double check that we have the correct number of cells. First, let us see what the number of cells was for each sample:

ncol(list_seurat[["P5CRC"]]) + ncol(list_seurat[["P5NAT"]])

[1] 77576

Which is the same as the number of samples in our Seurat callout!

seurat_merged

An object of class Seurat 
18085 features across 77576 samples within 1 assay 
Active assay: Spatial.016um (18085 features, 0 variable features)
 1 layer present: counts
 2 spatial fields of view present: P5CRC.016um P5NAT.016um

If we look at the metadata of the merged object we should be able to see the prefixes in the rownames (Cells) as well as the updated orig.ident we set in the for loop earlier.

# Check that the merged object has the appropriate sample-specific prefixes
seurat_merged@meta.data %>% head()

Table 8: First 5 rows of @meta.data

seurat_merged@meta.data %>% tail()

Table 9: Last 5 rows of @meta.data

Lastly we want to make sure that the Idents of our cells is a useful piece of information, for example like orig.ident, which contains our sample IDs.

Idents(seurat_merged) <- "orig.ident"

Save!

Now is a great spot to save our seurat_merged object as the next step is going to be filtering.

# Save integrated Seurat object
saveRDS(seurat_merged, "data/seurat_merged.RDS")

A good rule of thumb is to save your intermediate objects before doing any filtration or after running a computational step that takes a long time to run.

--- title: "Loading Spatial Data" authors: "Will Gammerdinger, Noor Sohail, Zhu Zhuo, James Billingsley, Shannan Ho Sui" date: "Tuesday, July 22, 2025" editor_options: markdown: wrap: 72 --- Approximate time: XY minutes # Learning Objectives - Establish a hypothesis or biological question to assess using the provided dataset - Quantify how bin size affects the count matrix - Utilize functions to grab important information from each Seurat data structure slots - Create a Seurat object from SpaceRanger outputs # The Dataset TODO ## Metadata ::: {.panel-tabset} ### Human CRC Human colorectal cancer sample Matched control and cancerous colon sample from 58 year old female in stage IV-A. - [Natural (control)](https://www.10xgenomics.com/platforms/visium/product-family/dataset-human-crc) - [Colorectal Cancer](https://www.10xgenomics.com/platforms/visium/product-family/dataset-human-crc) With the associated [paper](https://www.nature.com/articles/s41588-025-02193-3) We expect to see the following celltypes in our dataset: - B cells - T cells - Tumor - Myeloid - Fibroblast - Endothelial - Intestinal Epithelial - Smooth Muscle - Neuronal ### Mouse Brain A mouse brain (fresh frozen) was obtained from Charles River Laboratories. Strain: C57BL/6 Sex: Male Age: 8 weeks Sample preparation A 10 µm section was taken with a cryostat (Epredia CryoStar NX70). Tissue block preparation, sectioning, H&E staining, and imaging followed the Visium HD Fresh Frozen Tissue Preparation Handbook (CG000763). The data was subset to a smaller cross-section of the brain in order to make it more manageable on local laptops. The instructions for how to subset a 10X Visium HD slides can be found [here](Aside_subsetting_slide.qmd). ::: ::: callout-tip # [**Exercise 1**](02_loading-spatial-data_Answer-key.qmd#exercise-1) 1. Given the information that we know from the metadata, what might be some questions that we want to answer using our data? ::: ## Downloading the R Project and Data We have assembled an R Project for you to download that includes the data along with a basic file structure for maintain good data management. It can be quite easy when starting a new analysis to want to dive right in and start analyzing your data. However, good data management practices occur at each step of data's lifecycle. It is a good habit to begin by creating a basic directory structure to hold your data. Let's start by download this R Project and data from [here](). Left-click on the link and select `Save Link As..` or `Download Linked File As..`, then select a place on your computer where you would like to place this R Project. We can open the R Project up and see that the provided file structure should look like: ```{r} #| label: file_structure #| echo: false library(fs) # Print the first level in the downloaded directory # dir_tree("data", recurse = 1) ``` If your R Project looks like below, then you are ready to start! TODO: **Insert picture of what the R Project will look when it is ready to start** ```{r} #| label: header_libraries # Creating Seurat object from SpaceRanger output # Visium HD spatial transcriptomics workshop # Author: Harvard Chan Bioinformatics Core # Created: December 2025 library(tidyverse) library(Seurat) ``` # Loading Data into a Seurat Object There are several different workflow that can used for analyzing spatial transcriptomics data. While each has their own nuances, they all follow the same fundamental theory and processes: - Python's [Squidpy](https://squidpy.readthedocs.io/en/stable/) - R's [Spatial Experiment](https://bioconductor.org/packages/release/bioc/html/SpatialExperiment.html) - R's [Seurat](https://satijalab.org/seurat/). **As this in an R-based workshop, we will be using Seurat.** ## Data Files To load our data, we are going to make use of the function `Load10X_Spatial()`. The input needed for this function comes from the output generated by SpaceRanger, including both the feature matrix and low-resolution tissue image. Once supplied, a Seurat object that contains our counts matrix and image is created. ::: {.callout-note collapse=true} # Using `?` to look up function arguments If you are ever unsure what parameters you can supply to a function, you make use of the `?` call in R. This will bring up the manual page for the function while will provide more details on what each variable is used for. ```{r} #| label: function_man_page #| eval: false ?Load10X_Spatial ``` ::: There are 3 main arguments that we will be utilizing for this step: | Argument | Description | |---------|-------------| | data.dir | Directory containing the H5 file specified by filename and the image data in a subdirectory called spatial | | bin.size | Specifies the bin sizes to read in (defaults to c(16, 8)) | | slice | Name for the stored image of the tissue slice | : `Load10X_Spatial()` arguments {#tbl-Load10X_Spatial_arguments} Before loading in the data, let us take a look at the files we will supplying to the function. In particular, let us take a look at sample `data/P5CRC_cropped` which is the path would supply as `data.dir` in the `Load10X_Spatial()` function. ```{r} #| label: fig-spaceranger_outs #| fig-cap: File structure of SpaceRanger output #| echo: false dir_tree("data/P5CRC_cropped/", recurse = 1) ``` In the Visium HD assay, SpaceRanger bins the image into 2µm x 2µm, 8µm x 8µm, and 16µm x 16µm bins. **The output from each of these resolution is found in the `binned_outputs` folder.** ::: callout-important # Choosing your bin size This is not an easy question to answer! The typical recommendation for a Visium HD analysis is to use the 8µm x 8µm resolution of the dataset. However, it is important to consider the dataset you are working with. For example, neuron cells are know to be a larger celltype in size. Therefore it would be good to consider a large bin size in order to fully represent each cell within a bin. There are numerous algorithms that have been created to identify the boundaries of cells to ensure that counts are being associated with single-cells. This process is typically referred to as segmentation, which we will cover in a later lesson. ::: ## Creating the Seurat Object Now that we understand the input needed for the `Load10X_Spatial()` function, let's use it to create a Seurat object called `crc`. We will specify that we want to load in the `8µm` and `16µm` bin sizes at the same time. **Typically you would only load in one bin size**, but for the purposes of better understanding bins and our Seurat data structure, we will load both here. ```{r} #| label: crc_Load10X_Spatial crc <- Load10X_Spatial(data.dir = "data/P5CRC_cropped/", bin.size = c(8, 16), slice = "P5CRC") ``` We can print out some basic information of our Seurat object to examine its data structure. We will do this frequently throughout the workshop to understand how our the Seurat data structure changes with each step of the workflow. ```{r} #| label: crc_callout crc ``` ## Anatomy of a Seurat Object As we can see from our Seurat callout, there are a lot of different slots inside our object. Here, we will go through each of the major components of a Seurat object and how you would access key pieces of information. ### Assays `Assays` are where we can store different counts matrices - we are not forced to keep the same features and variable genes across assays. Each assay will contain it's own `Layers` that can be distinct from other assays in the object. This is useful in several different cases: - Multi-modal assays, where you can keep the expression matrices for RNA, ATAC, or protein in a single Seurat object - Storing counts matrices from a variety of different normalization techniques - Batch integration methods will sometimes generate a transformed counts matrix Here we can print the different assays that exist within our `crc` object: ```{r} #| label: crc_assays Assays(crc) ``` We have 2 distinct assays for the different bin sizes, which makes sense because we have different count matrices for our cells based upon the bin size that was selected. The `DefaultAssay()` function shows us which assay information will be used in other Seurat function calls, unless explicitly specified otherwise. ```{r} #| label: crc_DefaultAssay DefaultAssay(crc) ``` We can also change what our default assay is. Let's set it to the `016um` bins: ```{r} #| label: crc_DefaultAssay_callout DefaultAssay(crc) <- "Spatial.016um" crc ``` Now we see that the callout says: `Active assay: Spatial.016um ` ### Features and Cells Our count matrices function as any other matrix does, with rows and columns. In Seurat, the **rows** correspond to `Features`. In the case of spatial transcriptomics, our features are genes. In other experiments, features could refer to chromatin peaks or proteins. The important thing to keep in mind is what technology you are using. Since this is a Visium HD dataset, we are quantifying RNA expression (genes). We can see what the first few genes/features in our count matrix are: ```{r} #| label: crc_features Features(crc) %>% head() ``` As well as see the number of genes that are found in each of our assays: ```{r} #| label: crc_nrow nrow(crc[["Spatial.008um"]]) nrow(crc[["Spatial.016um"]]) ``` The **columns** correspond to `Cells` (or samples as it appears in the callout). We can see what the first few cells in our count matrix are: ```{r} #| label: crc_cells Cells(crc) %>% head() ``` As well as see the number of cells that are found in each of our assays: ```{r} #| label: crc_ncol ncol(crc[["Spatial.008um"]]) ncol(crc[["Spatial.016um"]]) ``` ::: callout-tip # [**Exercise 2**](02_loading-spatial-data_Answer-key.qmd#exercise-2) 2. What differences do you see between the `8um` and `16um` bins? ::: ### Layers Layers are our count matrices. ```{r} #| label: crc_layers Layers(crc) ``` By default, Seurat uses the following naming convention for the counts matrices within an `Assay`: | Layer | Description | |-------------|---------------------------------| | counts | Raw counts | | data | Normalized counts | | scale.data | Scaled‑normalized counts | : Description of each `Layer()` count matrix {#tbl-layer_descriptions} You may notice that our Seurat object only contains `counts` right now. This is because we have not run any normalization steps yet (we will discuss how to do so in future lessons). Using the `LayerData()` function we can access the entire counts matrix. Furthermore, we can specify the `assay` if we would prefer to not use the `DefaultAssay`. ```{r} #| label: crc_LayerData #| eval: false LayerData(crc, assay = "Spatial.016um", layer = "count")[1:5, 1:5] ``` ```{r} #| label: tbl-crc_LayerData_kable #| tbl-cap: First 5 rows and columns of raw counts matrix #| echo: false LayerData(crc, assay = "Spatial.016um", layer = "count")[1:5, 1:5] %>% as.data.frame() %>% knitr::kable() ``` By printing the first 5 features and cells in our object (for easier visualization). We can see that we are working with **whole numbers** which reinforces the idea that this is the **raw data**, with no transformations having been applied. ### Spatial Fields We do not just have expression data associated with our sample, we also have the spatial slide that comes with its own set of values and information. For example we can grab the x,y coordinates of each bin using the `GetTissueCoordinates()` function. ```{r} #| label: crc_GetTissueCoordinates #| eval: false GetTissueCoordinates(crc) %>% View() ``` ```{r} #| label: tbl-crc_GetTissueCoordinates_dt #| tbl-cap: Coordinates on spatial slide for each cell #| echo: false GetTissueCoordinates(crc) %>% head(5) %>% DT::datatable() ``` Or visualize what our slide looks like with `SpatialDimPlot()`: ```{r} #| label: fig-crc_SpatialDimPlot #| fig-cap: Spatial visualization of spots with `SpatialDimPlot()` SpatialDimPlot(crc) ``` ### Metadata **Seurat automatically creates some metadata** for each of the cells when the object is created. This information is stored in the `@meta.data` slot within the Seurat object. The rownames are automatically set to be the cell names. ```{r} #| label: crc_meta_data_1 #| eval: false crc@meta.data %>% View() ``` ```{r} #| label: tbl-crc_meta_data_1_dt #| tbl-cap: Seurat default `@meta.data` #| echo: false crc@meta.data %>% head(5) %>% DT::datatable() ``` What does each column represent? | Column | Description | |---------------|-----------------------------------------------| | orig.ident | Sample identity if known; defaults to “s” | | nCount_RNA | Number of UMIs per cell | | nFeature_RNA | Number of genes detected per cell | : Columns automatically populated in `@meta.data` {#tbl-metadata_cols} While it may seem intimidating at first, the important thing to remember is that this is a dataframe. Therefore can modify and work with this dataframe just like we would any other in R! For example, we can set our `orig.ident` column to be our sample name rather than "s". ```{r} #| label: crc_meta_data_orig_ident crc@meta.data$orig.ident <- "P5CRC" ``` ```{r} #| label: crc_meta_data_2 #| eval: false crc@meta.data %>% View() ``` ```{r} #| label: tbl-crc_meta_data_2_dt #| tbl-cap: Seurat default `@meta.data` after updating `orig.ident` #| echo: false crc@meta.data %>% head(5) %>% DT::datatable() ``` Additionally, we do not have use the `@meta.data` each time we want to access a single column. We can use the `$` follow by the column name as a shorthand. ```{r} #| label: crc_dollar_notation crc$nCount_Spatial.008um %>% head() ``` ### Idents The cell identities are stored as `Idents()`, which contain the default way to label cells. For example, if we wanted to label each cell by which sample they came from we could run: ```{r} #| label: crc_idents Idents(crc) <- "orig.ident" Idents(crc) %>% head() ``` Where we see that the identities of the cells are the value stored in the `@meta.data` column `orig.ident`. # Loading Multiple Samples into Seurat Before we begin, we are going to remove the `crc` object we have been playing with in order to clear up extra space on our computers. Spatial transcriptomics take up a lot of memory! So we are being careful to remove any variables that might overload the RAM. ```{r} rm(crc) ``` Now that we have a better understanding of how to create a Seurat object from SpaceRanger outpus, let's go through how we can load multiple samples at once. In this case we want to represent both samples `P5CRC` and `P5NAT` in a singular Seurat object. ## Using a `for` Loop In practice, you will likely have several samples that you will need to read in data for, and that can get tedious and error-prone if you do it one at a time. So, to make the data import into R more efficient we can use a `for` loop, which will iterate over a series of commands for each of the inputs given and create Seurat objects for each of our samples. ::: {.callout-note collapse=true} # `for` loop syntax In R, the `for` loop has the following structure/syntax: ```{r} #| label: for_loop_example #| eval: false ## DO NOT RUN for (variable in input){ command1 command2 command3 } ``` ::: Today we will use it to **iterate over the two sample folders** and execute commands for each sample as we did above for a single sample: 1. Create the Seurat objects from the SpaceRanger data (`Load10X_Spatial()`) and specify `slice` so that images are labeled with their sample name 2. Set `orig.ident` to be our sample Once those steps run, we will store the newly generated Seurat object to a list called `list_seurat` so that we can eventually merge both samples together. **We will be using a bin size of 16um for the remainder of this workshop!** ```{r} #| label: for_loop_Load10X_Spatial #| eval: false samples <- c("P5CRC", "P5NAT") list_seurat <- list() for (sample in samples) { # Path to data directory data_dir <- paste0("data/", sample, "_cropped") # Create seurat object and set orig ident to be sample seurat <- Load10X_Spatial(data.dir = data_dir, bin.size = c(16), slice = sample) seurat$orig.ident <- sample # Store seurat object in our list list_seurat[[sample]] <- seurat } ``` To confirm that we succesfully loaded both samples in, we can take a look at the contents of `list_seurat`. We should see that there are two Seurat objects in our list that correspond to each sample. ```{r} #| label: list_seurat_load #| echo: false # saveRDS(list_seurat, "data/list_seurat.RDS") list_seurat <- readRDS("data/list_seurat.RDS") ``` ```{r} #| label: list_seurat_print list_seurat ``` This is exactly what we had hoped to see! So now we can move on to the next step which is merging the sample together into a singular Seurat object. ## Merge Datasets Together We merge samples together because it make it easier to run the QC steps for both sample groups together. It also enables us to easily compare the data quality for all cells at one time To create this combined Seurat object, we use the `merge()` function. Because the same cell IDs can be used for different samples, we add a **sample-specific prefix** to each of our cell IDs using the `add.cell.id` argument to ensure the cell names are unique. ```{r} #| label: list_seurat_merge seurat_merged <- merge(x = list_seurat[["P5CRC"]], y = list_seurat[["P5NAT"]], add.cell.id = c("P5CRC", "P5NAT")) seurat_merged ``` From the callout we can see that we now have: `2 layers present: counts.1, counts.2` ::: {.callout-note collapse=true} # What if I am merging more than two samples? Seurat has functionality to merge many samples together. You can do this quite easily by adding all sample objects to the `y` argument in a vector format. An example is provided below, assuming you use the list structured we used previously: ```{r} #| label: merging_multiple_objects #| eval: false ## DO NOT RUN merged_seurat <- merge(x = seurat_list[[1]], y = seurat_list[[2:length(seurat_list)]], add.cell.id = names(seurat_list)) ``` ::: However, we do not want to have two distinct raw counts matrices (`counts.1` and `counts.2`). We want the counts for each sample to be concatenated together. This would be a single, large count matrix that represent each cell in the dataset. Seurat's function `JoinLayers()` allows us to do just this. ```{r} #| label: seurat_merged_JoinLayers seurat_merged <- JoinLayers(seurat_merged) ``` ::: callout-tip # [**Exercise 3**](02_loading-spatial-data_Answer-key.qmd#exercise-3) 3. What function can we use to double check that we have a singular `counts` _layer_? ::: ::: {.callout-note collapse=true} # Merging vs. Integration A common point of confusion is the distinction between **integration** and merging. In the field, integration is considered to be modifying either your counts or latent space in a way to correct for a batch variable. Whereas what we are doing now is **merging** or concatenating multiple samples together. This process of merging does not transform the values in the count matrices. ::: ## Evaluating `merged_seurat` Let’s also double check that we have the correct number of cells. First, let us see what the number of cells was for each sample: ```{r} #| label: ncells_math ncol(list_seurat[["P5CRC"]]) + ncol(list_seurat[["P5NAT"]]) ``` Which is the same as the number of samples in our Seurat callout! ```{r} #| label: seurat_merged_callout seurat_merged ``` If we look at the metadata of the merged object we should be able to see the prefixes in the rownames (`Cells`) as well as the updated `orig.ident` we set in the for loop earlier. ```{r} #| label: seurat_merged_metadata_head #| eval: false # Check that the merged object has the appropriate sample-specific prefixes seurat_merged@meta.data %>% head() ``` ```{r} #| label: tbl-seurat_merged_metadata_head_dt #| tbl-cap: First 5 rows of `@meta.data` #| echo: false seurat_merged@meta.data %>% head(5) %>% DT::datatable() ``` ```{r} #| label: seurat_merged_metadata_tail #| eval: false seurat_merged@meta.data %>% tail() ``` ```{r} #| label: tbl-seurat_merged_metadata_tail_dt #| tbl-cap: Last 5 rows of `@meta.data` #| echo: false seurat_merged@meta.data %>% tail(5) %>% DT::datatable() ``` Lastly we want to make sure that the `Idents` of our cells is a useful piece of information, for example like `orig.ident`, which contains our sample IDs. ```{r} #| label: seurat_merged_idents Idents(seurat_merged) <- "orig.ident" ``` # Save! Now is a great spot to save our `seurat_merged` object as the next step is going to be filtering. ```{r} #| label: seurat_merged_saveRDS #| eval: false # Save integrated Seurat object saveRDS(seurat_merged, "data/seurat_merged.RDS") ``` A good rule of thumb is to save your intermediate objects before doing any filtration or after running a computational step that takes a long time to run.

Learning Objectives

The Dataset

Metadata

Downloading the R Project and Data

Loading Data into a Seurat Object

Data Files

Creating the Seurat Object

Anatomy of a Seurat Object

Assays

Features and Cells

Layers

Spatial Fields

Metadata

Idents

Loading Multiple Samples into Seurat

Using a for Loop

Merge Datasets Together

Evaluating merged_seurat

Save!

Using a `for` Loop

Evaluating `merged_seurat`