library(Seurat)
library(tidyverse)
crc <- Load10X_Spatial(data.dir = "data/P5CRC_cropped/",
                       bin.size = c(8, 16),
                       slice = "P5CRC")
Anatomy of a Seurat Object
Write a description of the lesson here.
keyword_1, keyword_2, keyword_3, keyword_4, keyword_5, keyword_6
Approximate time: XX minutes
Learning objectives
In this lesson, we will:
- Learning Objective 1
- Learning Objective 2
- Learning Objective 3
Overview of lesson
When doing XYZ…
Load dataset
Anatomy of a Seurat object
As we can see from our Seurat callout, there are a lot of different slots inside our object. Here, we will go through each of the major components of a Seurat object and how you would access key pieces of information.
Assays
Assays are where we store different count matrices. We are not forced to keep the same features and variable genes across assays: each assay contains its own Layers, which can be distinct from those of other assays in the object. This is useful in several different cases:
- Multi-modal assays, where you can keep the expression matrices for RNA, ATAC, or protein in a single Seurat object
- Storing counts matrices from a variety of different normalization techniques
- Batch integration methods will sometimes generate a transformed counts matrix
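To make the multi-modal case concrete, here is a minimal sketch that stores two count matrices in one object. It assumes Seurat v5 (for CreateAssay5Object()); the object name toy, the gene and bin names, and the "Protein" assay are made up for illustration and are not part of the crc dataset.

```r
library(Seurat)
library(Matrix)

# Hypothetical toy data: 6 genes and 2 proteins measured on the same 4 bins
bins <- paste0("bin", 1:4)
rna <- Matrix(1:24 %% 4, nrow = 6, ncol = 4, sparse = TRUE,
              dimnames = list(paste0("gene", 1:6), bins))
protein <- Matrix(c(5, 0, 3, 1, 0, 2, 4, 4), nrow = 2, ncol = 4, sparse = TRUE,
                  dimnames = list(c("CD3", "CD19"), bins))

# The first matrix creates the object, the second is added as its own assay
toy <- CreateSeuratObject(counts = rna, assay = "RNA")
toy[["Protein"]] <- CreateAssay5Object(counts = protein)

# Each assay keeps its own feature set
assay_names <- Assays(toy)
n_rna_features <- nrow(toy[["RNA"]])
n_protein_features <- nrow(toy[["Protein"]])
```

The two assays share the same bins here but have different features, which is the situation the first bullet describes.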
Here we can print the different assays that exist within our crc object:
Assays(crc)
[1] "Spatial.008um" "Spatial.016um"
We have 2 distinct assays for the different bin sizes, which makes sense because we have different count matrices for our cells based upon the bin size that was selected.
The DefaultAssay() function shows us which assay information will be used in other Seurat function calls, unless explicitly specified otherwise.
DefaultAssay(crc)
[1] "Spatial.008um"
We can also change what our default assay is. Let’s set it to the 016um bins:
DefaultAssay(crc) <- "Spatial.016um"
crc
An object of class Seurat
36170 features across 97570 samples within 2 assays
Active assay: Spatial.016um (18085 features, 0 variable features)
1 layer present: counts
1 other assay present: Spatial.008um
2 spatial fields of view present: P5CRC.008um P5CRC.016um
Now we see that the callout says: Active assay: Spatial.016um
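The effect of the default assay on other calls can be sketched on a small made-up object. This assumes Seurat v5; toy, the gene and bin names, and the "Protein" assay are hypothetical. When no assay argument is given, LayerData() reads from the default assay, and an explicit assay argument overrides it.

```r
library(Seurat)
library(Matrix)

# Hypothetical toy object with two assays on the same 4 bins
bins <- paste0("bin", 1:4)
rna <- Matrix(1:24 %% 4, nrow = 6, ncol = 4, sparse = TRUE,
              dimnames = list(paste0("gene", 1:6), bins))
protein <- Matrix(c(5, 0, 3, 1, 0, 2, 4, 4), nrow = 2, ncol = 4, sparse = TRUE,
                  dimnames = list(c("CD3", "CD19"), bins))
toy <- CreateSeuratObject(counts = rna, assay = "RNA")
toy[["Protein"]] <- CreateAssay5Object(counts = protein)

# No assay given: the default assay ("RNA") is used
dim_before <- dim(LayerData(toy, layer = "counts"))

# After switching the default, the same call reads from "Protein"
DefaultAssay(toy) <- "Protein"
dim_after <- dim(LayerData(toy, layer = "counts"))

# An explicit assay argument ignores the default
dim_explicit <- dim(LayerData(toy, assay = "RNA", layer = "counts"))
```

Passing the assay explicitly is a reasonable habit in scripts where the default assay changes along the way.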
Features and Cells
Our count matrices function as any other matrix does, with rows and columns.
In Seurat, the rows correspond to Features. In the case of spatial transcriptomics, our features are genes. In other experiments, features could refer to chromatin peaks or proteins. The important thing to keep in mind is what technology you are using. Since this is a Visium HD dataset, we are quantifying RNA expression (genes).
We can see what the first few genes/features in our count matrix are:
Features(crc) %>% head()
[1] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" "PERM1" "HES4"
As well as see the number of genes that are found in each of our assays:
nrow(crc[["Spatial.008um"]])
[1] 18085
nrow(crc[["Spatial.016um"]])
[1] 18085
The columns correspond to Cells (or samples as it appears in the callout). We can see what the first few cells in our count matrix are:
Cells(crc) %>% head()
[1] "s_016um_00050_00315-1" "s_016um_00064_00214-1" "s_016um_00101_00317-1"
[4] "s_016um_00049_00195-1" "s_016um_00032_00133-1" "s_016um_00061_00268-1"
As well as see the number of cells that are found in each of our assays:
ncol(crc[["Spatial.008um"]])
[1] 77896
ncol(crc[["Spatial.016um"]])
[1] 19674
- What differences do you see between the 8um and 16um bins?
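The row and column convention described in this section can be checked on a small made-up matrix. This is only a sketch: toy, gene1 to gene6 and bin1 to bin4 are hypothetical names, not part of the crc dataset.

```r
library(Seurat)
library(Matrix)

# Hypothetical count matrix: rows are features (genes), columns are cells (bins)
counts <- Matrix(1:24 %% 4, nrow = 6, ncol = 4, sparse = TRUE,
                 dimnames = list(paste0("gene", 1:6), paste0("bin", 1:4)))
toy <- CreateSeuratObject(counts = counts, assay = "RNA")

# Features() returns the row names, Cells() the column names
toy_features <- Features(toy)
toy_cells <- Cells(toy)

# nrow() and ncol() on an assay count features and cells respectively
n_features <- nrow(toy[["RNA"]])
n_cells <- ncol(toy[["RNA"]])
```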
Layers
Layers are our count matrices.
Layers(crc)
[1] "counts"
By default, Seurat uses the following naming convention for the counts matrices within an Assay:
Layer() count matrix
| Layer | Description |
|---|---|
| counts | Raw counts |
| data | Normalized counts |
| scale.data | Scaled‑normalized counts |
You may notice that our Seurat object only contains counts right now. This is because we have not run any normalization steps yet (we will discuss how to do so in future lessons).
Using the LayerData() function we can access the entire counts matrix. Furthermore, we can specify the assay if we would prefer to not use the DefaultAssay.
LayerData(crc,
          assay = "Spatial.016um",
          layer = "counts")[1:5, 1:5]
| | s_016um_00050_00315-1 | s_016um_00064_00214-1 | s_016um_00101_00317-1 | s_016um_00049_00195-1 | s_016um_00032_00133-1 |
|---|---|---|---|---|---|
| SAMD11 | 0 | 0 | 0 | 0 | 0 |
| NOC2L | 0 | 0 | 0 | 0 | 0 |
| KLHL17 | 0 | 0 | 0 | 0 | 0 |
| PLEKHN1 | 0 | 0 | 0 | 0 | 0 |
| PERM1 | 0 | 0 | 0 | 0 | 0 |
We print only the first 5 features and cells to keep the output readable. We can see that we are working with whole numbers, which reinforces that this is the raw data, with no transformations applied.
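The relationship between the counts and data layers can be sketched on a small made-up object. This assumes Seurat v5 and uses NormalizeData() with its default settings purely to show a second layer appearing; toy and the gene and bin names are hypothetical, and normalization itself is covered in a later lesson.

```r
library(Seurat)
library(Matrix)

# Hypothetical toy counts: 6 genes by 4 bins
counts <- Matrix(1:24 %% 4, nrow = 6, ncol = 4, sparse = TRUE,
                 dimnames = list(paste0("gene", 1:6), paste0("bin", 1:4)))
toy <- CreateSeuratObject(counts = counts, assay = "RNA")

# A freshly created object only has the raw counts layer
layers_before <- Layers(toy)

# Raw counts are whole numbers
raw <- as.matrix(LayerData(toy, assay = "RNA", layer = "counts"))
raw_is_whole <- all(raw %% 1 == 0)

# Normalizing adds a "data" layer next to "counts"
toy <- NormalizeData(toy, verbose = FALSE)
layers_after <- Layers(toy)

# The normalized values are no longer whole numbers
norm <- as.matrix(LayerData(toy, assay = "RNA", layer = "data"))
norm_is_whole <- all(norm %% 1 == 0)
```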
Spatial fields
We do not just have expression data associated with our sample. We also have the spatial slide, which comes with its own set of values and information.
For example, we can grab the x, y coordinates of each bin using the GetTissueCoordinates() function.
GetTissueCoordinates(crc) %>% View()
| | x | y | cell |
|---|---|---|---|
| s_016um_00050_00315-1 | 62715.76 | 61103.62 | s_016um_00050_00315-1 |
| s_016um_00064_00214-1 | 56816.46 | 60261.12 | s_016um_00064_00214-1 |
| s_016um_00101_00317-1 | 62844.88 | 58123.60 | s_016um_00101_00317-1 |
| s_016um_00049_00195-1 | 55702.46 | 61133.16 | s_016um_00049_00195-1 |
| s_016um_00032_00133-1 | 52074.99 | 62111.72 | s_016um_00032_00133-1 |
Or visualize what our slide looks like with SpatialDimPlot():
Metadata
Seurat automatically creates some metadata for each of the cells when the object is created. This information is stored in the @meta.data slot within the Seurat object. The rownames are automatically set to be the cell names.
crc@meta.data %>% View()
@meta.data
| | orig.ident | nCount_Spatial.008um | nFeature_Spatial.008um | nCount_Spatial.016um | nFeature_Spatial.016um |
|---|---|---|---|---|---|
| s_008um_00078_00444-1 | s | 65 | 57 | NA | NA |
| s_008um_00128_00278-1 | s | 1300 | 906 | NA | NA |
| s_008um_00052_00559-1 | s | 128 | 121 | NA | NA |
| s_008um_00121_00413-1 | s | 538 | 326 | NA | NA |
| s_008um_00167_00326-1 | s | 44 | 39 | NA | NA |
What does each column represent?
@meta.data
| Column | Description |
|---|---|
| orig.ident | Sample identity if known; defaults to “s” |
| nCount_<assay> (e.g. nCount_Spatial.008um) | Number of UMIs per cell in that assay |
| nFeature_<assay> (e.g. nFeature_Spatial.008um) | Number of genes detected per cell in that assay |
While it may seem intimidating at first, the important thing to remember is that this is a data frame. Therefore, we can modify and work with it just like we would any other data frame in R! For example, we can set our orig.ident column to be our sample name rather than “s”.
crc@meta.data$orig.ident <- "P5CRC"
crc@meta.data %>% View()
@meta.data after updating orig.ident
| | orig.ident | nCount_Spatial.008um | nFeature_Spatial.008um | nCount_Spatial.016um | nFeature_Spatial.016um |
|---|---|---|---|---|---|
| s_008um_00078_00444-1 | P5CRC | 65 | 57 | NA | NA |
| s_008um_00128_00278-1 | P5CRC | 1300 | 906 | NA | NA |
| s_008um_00052_00559-1 | P5CRC | 128 | 121 | NA | NA |
| s_008um_00121_00413-1 | P5CRC | 538 | 326 | NA | NA |
| s_008um_00167_00326-1 | P5CRC | 44 | 39 | NA | NA |
Additionally, we do not have to use @meta.data each time we want to access a single column. We can use $ followed by the column name as a shorthand.
crc$nCount_Spatial.008um %>% head()
s_008um_00078_00444-1 s_008um_00128_00278-1 s_008um_00052_00559-1
65 1300 128
s_008um_00121_00413-1 s_008um_00167_00326-1 s_008um_00202_00633-1
538 44 365
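Because the metadata is an ordinary data frame, new columns can be added with standard R syntax. The sketch below uses a small made-up object; toy, the bin names, the region labels and the genes_per_umi column are all hypothetical and only illustrate the idea.

```r
library(Seurat)
library(Matrix)

# Hypothetical toy counts: 6 genes by 4 bins
counts <- Matrix(1:24 %% 4, nrow = 6, ncol = 4, sparse = TRUE,
                 dimnames = list(paste0("gene", 1:6), paste0("bin", 1:4)))
toy <- CreateSeuratObject(counts = counts, assay = "RNA")

# Add a made-up annotation column, one value per bin (in row order)
toy@meta.data$region <- c("tumor", "tumor", "normal", "normal")

# Derive a new column from the automatically created ones
toy@meta.data$genes_per_umi <-
  toy@meta.data$nFeature_RNA / toy@meta.data$nCount_RNA

# The $ shorthand returns the column as a vector named by cell
region_by_cell <- toy$region

# Any data frame summary works, e.g. mean detected genes per region
mean_genes_by_region <- tapply(toy$nFeature_RNA, toy$region, mean)
```

Writing to @meta.data directly mirrors the orig.ident example above; the row order of the data frame must match the order of the values being assigned.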
Idents
The cell identities are accessed with Idents() and provide the default way to label cells. For example, if we wanted to label each cell by the sample it came from, we could run:
Idents(crc) <- "orig.ident"
Idents(crc) %>% head()
s_008um_00078_00444-1 s_008um_00128_00278-1 s_008um_00052_00559-1
P5CRC P5CRC P5CRC
s_008um_00121_00413-1 s_008um_00167_00326-1 s_008um_00202_00633-1
P5CRC P5CRC P5CRC
Levels: P5CRC
We see that the identities of the cells are now the values stored in the @meta.data column orig.ident.
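The same pattern works for any metadata column with more than one value. The sketch below uses a small made-up object; toy, the bin names and the region labels are hypothetical.

```r
library(Seurat)
library(Matrix)

# Hypothetical toy counts: 6 genes by 4 bins
counts <- Matrix(1:24 %% 4, nrow = 6, ncol = 4, sparse = TRUE,
                 dimnames = list(paste0("gene", 1:6), paste0("bin", 1:4)))
toy <- CreateSeuratObject(counts = counts, assay = "RNA")

# Made-up labels stored as a metadata column
toy@meta.data$region <- c("tumor", "tumor", "normal", "normal")

# Passing a column name tells Seurat to use that column as the identities
Idents(toy) <- "region"

# Idents() returns a factor with one entry per cell
ident_levels <- levels(Idents(toy))
ident_counts <- table(Idents(toy))
```

Idents() returns a factor, so levels() and table() give a quick summary of how cells are currently labelled.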