Seurat Cheatsheet

Spatial transcriptomics

Seurat

Cheatsheet

This lesson will introduce a compact Seurat cheatsheet to keep at hand that summarizes key functions for spatial transcriptomics workflows, from data loading to clustering and plotting. Use this reference to speed up scripting, avoid common syntax errors and standardize your analysis pipeline.

Authors

Noor Sohail

Meeta Mistry

Will Gammerdinger

Published

May 30, 2025

Keywords

Plotting, Syntax, Data structures, DimPlot, FeaturePlot

Approximate time: 120 minutes

Learning objectives

In this lesson, we will:

Access various slots of the Seurat object
Generate a variety of plots with the plotting functions within Seurat

Overview of lesson

You’ve already used Seurat through several steps and seen many function calls. In this lesson, you will consolidate the core Seurat commands and object operations for spatial analysis into a concise reference. It doesn’t introduce new analysis concepts, but rather it makes it easier to reproduce and script what you’ve learned without constantly searching through documentation. This quick reference speeds up all later work when you adapt the pipeline to your own data.

Seurat Cheatsheet

This cheatsheet is meant to provide examples of the various functions available in Seurat. This includes how to access certain information, handy tips and visualization functions built into the package. We have pulled together all of this information with examples using the dataset used throughout this workshop so that there are clear visuals on what the output of each function is.

These materials were developed by referencing the following pages from the Seurat website:

Loading in the dataset

First, we will make a new script for us to practice these Seurat commands in. Will will call it Seurat_cheatsheet.R and place it within our scripts directory. At the top of the script let’s provide some useful information about the script, call our libraries and also load our data.

# Seurat Cheatsheet
# Visium HD spatial transcriptomics workshop
# Author: Harvard Chan Bioinformatics Core
# Created: May 2026

# Load libraries
library(Seurat)
library(tidyverse)

# Load Seurat object
seurat_processed <- readRDS("data/seurat_processed.RDS")

Accessing cell barcodes and gene names

Cell barcodes

Within Seurat, there are multiple ways to access the bin barcode IDs.

We can use colnames() to get a vector of bin barcodes in the same order as they appear in the Seurat object.

# Show bin barcodes
colnames(seurat_processed) %>%
  head()

[1] "P5CRC_s_008um_00078_00444-1" "P5CRC_s_008um_00128_00278-1"
[3] "P5CRC_s_008um_00052_00559-1" "P5CRC_s_008um_00121_00413-1"
[5] "P5CRC_s_008um_00202_00633-1" "P5CRC_s_008um_00035_00460-1"

Similarly, we can use the Cells() function:

# Show bin barcodes
Cells(seurat_processed) %>%
  head()

[1] "P5CRC_s_008um_00078_00444-1" "P5CRC_s_008um_00128_00278-1"
[3] "P5CRC_s_008um_00052_00559-1" "P5CRC_s_008um_00121_00413-1"
[5] "P5CRC_s_008um_00202_00633-1" "P5CRC_s_008um_00035_00460-1"

It is very important that the values stored in Cells() are the same as the rownames in meta.data. Otherwise, Seurat will start throwing errors at you!

# Check that all of our bins are present in the metadata
all(rownames(seurat_processed@meta.data) == Cells(seurat_processed))

[1] TRUE

Features/genes

We now want to be able to access the rows, or genes, in our Seurat object. Rather than calling these values “genes”, many tools will call them “features” as different assays (CITE-seq, ATAC-seq) provide alternative information than genes as output.

The Features() function returns a vector of all features/genes in our dataset in the same order as it appears in the count matrix.

# Show the genes names
Features(seurat_processed) %>%
  head()

[1] "SAMD11"  "NOC2L"   "KLHL17"  "PLEKHN1" "PERM1"   "HES4"

The rownames() function provides the same output:

# Show the genes names
rownames(seurat_processed) %>%
  head()

[1] "SAMD11"  "NOC2L"   "KLHL17"  "PLEKHN1" "PERM1"   "HES4"

Exercise 1

What are the last 5 cells barcodes and the last 5 genes in the Seurat object.

Number of cells and features

If you recall, Seurat stores your count matrix as bins (columns) x genes (rows).

Therefore, we can access the number of bins by using the ncol() function.

# Retrieve the number of bins
ncol(seurat_processed)

[1] 135798

The number of features is returned with the nrow() function:

# Retrieve the number of genes
nrow(seurat_processed)

[1] 18085

The dim() function provides both the number of bins and genes for the default assay. Here we see the number of features followed by the number of bins:

# Retrieve the number of genes and bins
dim(seurat_processed)

[1]  18085 135798

Idents

In Seurat, each bin has a label which can be accessed using Idents(). These are the default labels used for each bin and are used internally by Seurat plotting functions.

Common information set as the identity for bins include: clusters (as in our example dataset), celltype, sample, etc. You’ll notice that identities are automatically stored as factors, which means we can re-organize the levels at any point to change their order for plotting purposes.

# Show the idents
Idents(seurat_processed) %>%
  head()

P5CRC_s_008um_00078_00444-1 P5CRC_s_008um_00128_00278-1 
                          3                           1 
P5CRC_s_008um_00052_00559-1 P5CRC_s_008um_00121_00413-1 
                         11                           9 
P5CRC_s_008um_00202_00633-1 P5CRC_s_008um_00035_00460-1 
                          2                           3 
Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Rename Idents

To quickly make modifications to identities, you can use the RenameIdents() function where new values are mapped to the identities. This is particularly helpful when annotating your bins from clusters to celltypes as showcased here. Bear in mind that these new identities are not stored in the meta.data automatically. We recommend adding these identities as a new column in the Seurat object to keep track of it for future use.

# Rename all identities
seurat_processed <- RenameIdents(object = seurat_processed, 
                                 "1" = "Tumor",
                                 "2" = "B cells",
                                 "3" = "Intestinal epithelial cells")

# These new celltype values are only stored in the idents
# Good practice is to store these changes in a column
seurat_processed$celltype <- Idents(seurat_processed)

Exercise 2

What are the last 5 identities for the bins in the Seurat object?

Tissue coordinates

As this is a spatial dataset, there is another slot where the coordinates for each bin in our dataset is stored. We can use the GetTissueCoordinates() function in order to grab these x, y coordinates:

GetTissueCoordinates(seurat_processed) %>%
  tail()

                                   x        y                        cell
P5CRC_s_008um_00221_00323-1 53744.83 57545.55 P5CRC_s_008um_00221_00323-1
P5CRC_s_008um_00035_00373-1 55183.45 62986.56 P5CRC_s_008um_00035_00373-1
P5CRC_s_008um_00216_00448-1 57396.79 57706.69 P5CRC_s_008um_00216_00448-1
P5CRC_s_008um_00146_00559-1 60631.90 59765.47 P5CRC_s_008um_00146_00559-1
P5CRC_s_008um_00051_00318-1 53578.23 62512.40 P5CRC_s_008um_00051_00318-1
P5CRC_s_008um_00118_00312-1 53410.98 60553.91 P5CRC_s_008um_00118_00312-1

However, you may notice that the only bins that are returned belong to one sample. This is because this function will only return the results for one image at a time. So we would have to specify the image as it is stored within our Seurat object to access the coordinates for different tissues:

seurat_processed

An object of class Seurat 
36170 features across 135798 samples within 2 assays 
Active assay: Spatial.008um (18085 features, 2000 variable features)
 2 layers present: counts, data
 1 other assay present: sketch
 4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
 2 spatial fields of view present: P5CRC.008um P5NAT.008um

The second spatial fields of view present is P5NAT.008um which we will supply to the image parameter to grab the coordinates for the other slide in the dataset.

# Get tissue coordinates for NAT sample
GetTissueCoordinates(seurat_processed,
                     image = "P5NAT.008um") %>%
  tail()

                                   x        y                        cell
P5NAT_s_008um_00108_00352-1 55883.27 29664.67 P5NAT_s_008um_00108_00352-1
P5NAT_s_008um_00243_00304-1 57280.15 33611.29 P5NAT_s_008um_00243_00304-1
P5NAT_s_008um_00365_00197-1 60401.51 37180.46 P5NAT_s_008um_00365_00197-1
P5NAT_s_008um_00357_00367-1 55434.52 36939.66 P5NAT_s_008um_00357_00367-1
P5NAT_s_008um_00148_00248-1 58920.46 30837.78 P5NAT_s_008um_00148_00248-1
P5NAT_s_008um_00373_00222-1 59670.69 37413.17 P5NAT_s_008um_00373_00222-1

Highly variable features

Accessing variable features

To obtain a vector for all of the highly variable genes that were selected after running FindVariableFeatures(), we can use the VariableFeatures() function.

# Show the highly variable genes
VariableFeatures(seurat_processed) %>%
  head()

[1] "IGHG1" "IGHM"  "SST"   "IGLC1" "INSL5" "CHGA"

Setting variable features

Using the same VariableFeatures() function, we can set our own custom set of genes as our highly variable genes.

As an example, here we are omitting mitochondrial genes from the original list of variable genes:

# Get list of all variable genes
# Remove variable genes that start with MT-
var_genes <- VariableFeatures(seurat_processed)
var_genes <- var_genes[!startsWith(var_genes, "MT-")]

# Now we set our vector of gene names back to VariableFeatures()
VariableFeatures(seurat_processed) <- var_genes

Exercise 3

What are the 5 least variable genes (of the top 3,000) in the Seurat object?

Assays and layers

Assays

Within a Seurat object you can have multiple “assays”. Each assay contains its own count matrix that is separate from the other assays in the object. This structure was created with multimodal datasets in mind so we can store, for example, ATAC peaks within the same Seurat object as your RNA counts.

To access the list of assays in our Seurat object, you can call @assays.

# Display available assays
seurat_processed@assays

$Spatial.008um
Assay (v5) data with 18085 features for 135798 cells
Top 10 variable features:
 IGHG1, IGHM, SST, IGLC1, INSL5, CHGA, CXCL8, GCG, HBA2, IGLC7 
Layers:
 counts, data 

$sketch
Assay (v5) data with 18085 features for 10000 cells
Top 10 variable features:
 IGHG1, IGHM, IGLC1, CHGA, SST, CXCL8, GCG, INSL5, VIP, PYY 
Layers:
 counts, data, scale.data

We can additionally see which of the assays in our dataset is set as the default with the DefaultAssays() function. This default assay is automatically used as by Seurat functions, unless you specify otherwise in the parameters of your function.

# SHow defualt assay
DefaultAssay(seurat_processed)

[1] "Spatial.008um"

Here we can see that the default assay is set to Spatial.008um. If we instead wanted to see the RNA counts from the sketch assay, we can set a new default by once again calling the DefaultAssay() function and storing the name of a different assay.

# Set new default assay
DefaultAssay(seurat_processed) <- "sketch"

# Print out the new default to see if it changed
DefaultAssay(seurat_processed)

[1] "sketch"

We can access each assay as if the Seurat object was a named list with double brackets:

# Access an assay that isn't the default assay with double square brackets
seurat_processed[["Spatial.008um"]]

Assay (v5) data with 18085 features for 135798 cells
Top 10 variable features:
 IGHG1, IGHM, SST, IGLC1, INSL5, CHGA, CXCL8, GCG, HBA2, IGLC7 
Layers:
 counts, data

And similarly run any function on it:

# Run a function on an accessed assay that isn't the default assay
dim(seurat_processed[["sketch"]])

[1] 18085 10000

Exercise 4

What are the dimensions for each assay in the Seurat object?

Layers

Layers are the different counts matrices that you can access within each assay (prior to Seurat version 5, this feature was known as “slots”).

Following the standard Seurat workflow, you would have the following matrices:

counts (raw counts matrix)
data (normalized count matrix (generated after SCTransform() or NormalizeData())
scale.data (output from the ScaleData())

We can see which layers are accessible with the Layers() function.

# Display layers
Layers(seurat_processed[["sketch"]])

[1] "counts"     "data"       "scale.data"

Accessing full count matrix

You can grab the entire counts matrix by making use of the LayerData() function.

# Subsetting to the first 5 genes and bins for easy viewing
LayerData(seurat_processed,
          assay = "Spatial.008um",
          layer = "counts")[1:5, 1:5]

5 x 5 sparse Matrix of class "dgCMatrix"
        P5CRC_s_008um_00078_00444-1 P5CRC_s_008um_00128_00278-1
SAMD11                            .                           .
NOC2L                             .                           .
KLHL17                            .                           .
PLEKHN1                           .                           .
PERM1                             .                           .
        P5CRC_s_008um_00052_00559-1 P5CRC_s_008um_00121_00413-1
SAMD11                            .                           .
NOC2L                             .                           .
KLHL17                            .                           .
PLEKHN1                           .                           .
PERM1                             .                           .
        P5CRC_s_008um_00202_00633-1
SAMD11                            .
NOC2L                             .
KLHL17                            .
PLEKHN1                           .
PERM1                             .

Exercise 5

Show the code to get the entire sketch log-normalized (data) count matrix.

Accessing specific features and metadata

The FetchData() function is useful to directly access the counts of a feature for each bin. You can also specify the layer and assay to specify which piece of information you want.

# Normalized counts for the gene PTPRC in the assay sketch
FetchData(seurat_processed,
          vars = c("PTPRC", "sample"), 
          assay = "sketch",
          layer="data") %>% head()

                            PTPRC
P5CRC_s_008um_00205_00489-1     0
P5CRC_s_008um_00098_00552-1     0
P5CRC_s_008um_00151_00503-1     0
P5CRC_s_008um_00107_00427-1     0
P5CRC_s_008um_00065_00311-1     0
P5CRC_s_008um_00165_00349-1     0

Conveniently, you can also get information from multiple assays at the same time. To do so, you prepend the assay name (in lowercase format) for the feature you supply to the FetchData() function.

# Grab the normalized counts in the integrated and RNA assays
FetchData(seurat_processed,
          vars=c("sketch_PTPRC", "Spatial.008um_PTPRC"), 
          layer="data") %>% 
  head()

                            sketch_PTPRC
P5CRC_s_008um_00205_00489-1            0
P5CRC_s_008um_00098_00552-1            0
P5CRC_s_008um_00151_00503-1            0
P5CRC_s_008um_00107_00427-1            0
P5CRC_s_008um_00065_00311-1            0
P5CRC_s_008um_00165_00349-1            0

Exercise 6

Show how you would use the FetchData() function to generate a dataframe of fullumapsketch_1, fullumapsketch_2 and orig.ident values for each cell.

Accessing dimensional reductions

PCA

The scores for each PC is stored within the embeddings slot of the Seurat object. These can be accessed by using the Embeddings() function.

# Show PCA embeddings
# Alternative method of accessing PCA values
# seurat_processed[['pca']]@cell.embeddings
Embeddings(seurat_processed,
           reduction = "pca.sketch")[1:5, 1:5]

                                  PC_1       PC_2        PC_3       PC_4
P5CRC_s_008um_00205_00489-1   3.709785  1.8213648  -2.3974464  0.7473072
P5CRC_s_008um_00098_00552-1   1.979896 -0.7775721  -3.1652273 -1.3673613
P5CRC_s_008um_00151_00503-1   3.792683  3.4938893 -10.1402720 31.2417728
P5CRC_s_008um_00107_00427-1   2.154885 -4.0624512  -3.5581731 -3.0002939
P5CRC_s_008um_00065_00311-1 -22.419118  6.0348684   0.7173482  0.6259992
                                   PC_5
P5CRC_s_008um_00205_00489-1   0.7568303
P5CRC_s_008um_00098_00552-1  -0.9723445
P5CRC_s_008um_00151_00503-1 -13.3532702
P5CRC_s_008um_00107_00427-1  -1.0628408
P5CRC_s_008um_00065_00311-1   1.5016913

The weights (loadings) for each feature are also stored and can be accessed with Loadings() function.

# Show PCA loadings
# Alternative method of accessing PCA loadings
# pbmc[['pca]]@feature.loadings
Loadings(seurat_processed,
         reduction = "pca.sketch")[1:5, 1:5]

              PC_1         PC_2        PC_3         PC_4         PC_5
IGHG1  0.008454886  0.020305947  0.02131299 -0.011029425  0.001427392
IGHM   0.033022561  0.028325733  0.03002541 -0.030765923  0.008312626
IGLC1  0.017116523  0.011749722  0.02217907 -0.010199287  0.023402614
CHGA  -0.002113369 -0.017450566 -0.03846076  0.001083699 -0.006691523
SST    0.000300984 -0.001915676 -0.01129114  0.002629025 -0.002397141

We can also view more information about the top PCs, like the genes that are most strongly correlated with the first few PCs with the ProjectDim() function.

# Show top genes associated with PCs
ProjectDim(seurat_processed,
           reduction = "pca.sketch")

PC_ 1 
Positive:  IGKC, JCHAIN, A2M, CD74, IGHA1, SPARCL1, IGHM, COL3A1, SRGN, ADAMDEC1 
       TAGLN, CXCR4, C3, TIMP3, COL6A2, IGFBP5, CCN2, SELENOP, APOE, IL7R 
Negative:  LCN2, CEACAM6, CEACAM5, GPX2, ID1, DPEP1, NQO1, S100P, MDK, FERMT1 
       GGH, REG3A, CDC25B, IFITM3, PTP4A3, GMDS, ITGA6, HOXB9, ZNF703, SERPINA1 
PC_ 2 
Positive:  CD74, SRGN, IFITM3, CD44, REG3A, IFITM2, SPARC, A2M, CDC25B, COL3A1 
       DPEP1, PTP4A3, SPARCL1, LGALS1, TGFBI, C3, CXCR4, IGFBP7, COL1A1, LAPTM5 
Negative:  MUC12, PIGR, SLC26A3, FABP1, SLC26A2, FXYD3, CKB, TSPAN1, GUCA2A, FCGBP 
       CEACAM7, CA1, CA2, KRT20, ZG16, PLA2G2A, CA4, MUC2, SGK1, TMPRSS2 
PC_ 3 
Positive:  SELENOP, C1QA, C1QC, CD74, C1QB, APOE, SLC40A1, SRGN, LYZ, MS4A6A 
       CEACAM7, GUCA2A, SLC26A3, CXCL14, MS4A7, CTSB, FCER1G, TYROBP, IFI30, MPEG1 
Negative:  CLCA1, LEFTY1, WFDC2, KIAA1324, NXPE1, VIP, SCG2, MUC5B, SCGN, ITLN1 
       CALB2, PCSK1N, HMGCS2, NXPE4, WNK2, KLK1, RGMB, SPINK4, TFF3, L1CAM 
PC_ 4 
Positive:  VIP, CALB2, SCG2, SCGN, TUBB2B, CHRNA3, UCHL1, L1CAM, RTN1, NSG2 
       GAL, HAND2, SYT1, PCSK1N, AKAP12, TMEM59L, STMN2, ETV1, TUBB2A, PRPH 
Negative:  CLCA1, CXCL8, G0S2, LEFTY1, ITLN1, WFDC2, MUC2, IL1B, MUC5B, PLA2G2A 
       IL1R2, BCL2A1, NXPE1, FCGBP, IL1RN, TFF3, PLEK, PIGR, AQP9, KLK1 
PC_ 5 
Positive:  A2M, COL3A1, JCHAIN, PLVAP, IGFBP7, CXCL14, SPARCL1, COL6A2, SPARC, F3 
       COL1A2, PECAM1, COL4A1, ENG, COL6A1, IGKC, RGS5, COL1A1, VWF, ADAMDEC1 
Negative:  CXCL8, G0S2, IL1B, BCL2A1, IL1RN, S100A9, AQP9, S100A8, PLEK, PTGS2 
       OSM, NAMPT, PLAUR, ITGAX, IL1R2, CSF3R, PFKFB3, FCAR, SOD2, ACSL1

An object of class Seurat 
36170 features across 135798 samples within 2 assays 
Active assay: sketch (18085 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 1 other assay present: Spatial.008um
 4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
 2 spatial fields of view present: P5CRC.008um P5NAT.008um

UMAP/tSNE

To access the coordinates used for UMAP/tSNE plots, we specify the reduction of interest in the Embeddings() function.

# Accessing UMAP embeddings
# Alternative method of accessing UMAP embeddings
# seurat_processed[['umap']]@cell.embeddings
Embeddings(seurat_processed,
           reduction = "umap.sketch") %>%
  head()

                            umapsketch_1 umapsketch_2
P5CRC_s_008um_00205_00489-1    1.3003940   -5.6375675
P5CRC_s_008um_00098_00552-1   -0.3800069    4.2747884
P5CRC_s_008um_00151_00503-1    0.6560353  -12.7625957
P5CRC_s_008um_00107_00427-1   -0.8216718    5.9898009
P5CRC_s_008um_00065_00311-1    8.9849455    1.3358915
P5CRC_s_008um_00165_00349-1    9.7599007   -0.7390782

Data visualization

Underneath the hood, all of Seurat’s plotting functions make use of ggplot which means that we can add more details to our plots using ggplot functions.

DimPlot

The DimPlot() function allows us to visualize metadata that is categorical on different reductions (PCA, UMAP).

By default, DimPlot() will color bins by the Idents() and use UMAP as the default reduction.

# Show different clusters on the UMAP
DimPlot(seurat_processed) +
  ggtitle("Seurat clusters")

Figure 1: `DimPlot` highlighting the the different clusters on the UMAP.

We can specify a different metadata column using the group.by argument

# Plot bins on UMAP different and colored by their orig.ident 
DimPlot(seurat_processed,
        group.by = "orig.ident")

Figure 2: Grouping the `DimPlot` by the `orig.ident`.

We can also use the split.by argument to create multiple plots that only show cells that have the same value for the metadata column specified.

# Plot bins on UMAP different and colored by their orig.ident and have different plots for each orig.ident
DimPlot(seurat_processed,
        split.by = "orig.ident",
        group.by="orig.ident")

Figure 3: Plot the `DimPlot` side-by-side for the two `orig.ident` and color them likewise.

SpatialDimPlot

Similarly, we can plot these categorical values and plot them on top of the tissue slide with SpatialDimPlot():

# Plot bins on spatial slide, colored by their orig.ident 
SpatialDimPlot(seurat_processed,
        group.by = "orig.ident",
        pt.size.factor = 15)

Figure 4: Grouping the `SpatialDimPlot` by the `orig.ident`.

There are a variety of different parameters that can be specified in order to create the plot that you want to make.

FeaturePlot

The FeaturePlot() function allows us to visualize both metadata and features that are continuous on different reductions (PCA, UMAP).

# Highlight gene expression on the UMAP
FeaturePlot(seurat_processed, 
            features = c("PLVAP", "ENG"))

Figure 5: Displaying the expression of two Endothelial cells markers on the UMAP.

We can additionally order the values in a way that bins with higher values are shown in front (to avoid other bins drowning out them).

To identify bins that show the highest expression of a feature, we can set a min.cutoff based upon quantiles, where bins below the the threshold will show no expression.

# Highlight gene expression on the UMAP with a min.cutoff
FeaturePlot(seurat_processed, 
            reduction = "umap.sketch", 
            features = c("PLVAP", "ENG"), 
            order = TRUE,
            min.cutoff = 'q10')

Figure 6: Displaying the expression of two Endothelial cells markers on the UMAP, while applying a `min.off`.

We can also add labels onto our UMAP to easily identify which groups of bins we are seeing the expression using the LabelClusters() function. The parameters shown here put a white background behind the text to make it easier to see the labels.

# Set idents to seurat_cluster.projected
Idents(seurat_processed) <- "seurat_cluster.projected"

# Overlap gene expression on a UMAP
p <- FeaturePlot(seurat_processed, 
            reduction = "umap.sketch", 
            features = "LCN2", 
            order = TRUE,
            min.cutoff = 'q10')

# Label clusters
LabelClusters(p,
              id = "ident",
              fontface = "bold",
              size = 8, 
              bg.colour = "white",
              bg.r = .2,
              force = 0)

Figure 7: Displaying the expression of a Tumor cell marker LCN2 on the UMAP, while also showing the cluster labels.

SpatialFeaturePlot

Just as we can plot genes or numeric values on top of the UMAP, we can also do the same with the tissue with SpatialFeaturePlot()

# Overlap gene expression on a spatial side
SpatialFeaturePlot(seurat_processed, 
            features = "LCN2", 
            pt.size.factor = 15)

Figure 8: Displaying the expression of a Tumor cell marker LCN2 on the tissue.

FeatureScatter

FeatureScatter() creates a scatterplot of expression values for two features with each bin being colored by their ident. Bear in mind that you can also specify a continuous metadata column and not just 2 genes/features.

# Set idents to seurat_cluster.projected
Idents(seurat_processed) <- "seurat_cluster.projected"

# Create a scatterplot using two features
FeatureScatter(seurat_processed,
               feature1 = "MT-ND5",
               feature2 = "mitoRatio",
               pt.size = 2) + 
  ggtitle("MitoRatio vs MT-ND5 expression")

Figure 9: A scatterplot showing the relation between two features (MitoRatio and MT-ND5 expression) across the dataset.

CellScatter

To visualize the differences between two specific bins, you can use the CellScatter() function to get a scatterplot of values for each feature in both bins

# Obtain a bin name
bin1 <- Cells(seurat_processed)[2] 
# Obtain a second bin name
bin2 <- Cells(seurat_processed)[7]

# Here we can see the metadata for the first two bins in the dataset
seurat_processed@meta.data %>% 
  subset(rownames(seurat_processed@meta.data) %in% c(bin1, bin2)) %>% 
  select(orig.ident, nCount_Spatial.008um)

                            orig.ident nCount_Spatial.008um
P5CRC_s_008um_00098_00552-1      P5CRC                  199
P5CRC_s_008um_00115_00346-1      P5CRC                  467

# Plot the expression from two bins against each other 
CellScatter(seurat_processed,
            cell1 = bin1,
            cell2 = bin2)

Figure 10: A scatterplot showing the relationship in expression between two bins.

VlnPlot

We can create a violin plot to compare the distribution of gene expression across different populations using the VlnPlot() function.

This is a very customizable function with many parameters to customize the look of the plots.

# Create violin plot for each cluster plotted by two genes 
VlnPlot(seurat_processed,
        features = c("CEACAM5", "LCN2"))

Figure 11: Violin plots showing the distribution of two Tumor cell markers across the different clusters.

In this next example, we are grouping expression by sample and showing 2 plots per column. We are also removing the points (cells) by setting their size to 0.

# Creating a multi-panel violin plot for four genes comparing the orig.ident for each 
VlnPlot(seurat_processed, 
        features = c("CEACAM6", "CEACAM5", "KRT8", "LCN2"), 
        group.by="orig.ident",
        ncol=2,
        pt.size=0)

Figure 12: Violin plots showing the distribution of four Tumor cell markers when comparing the `orig.ident`.`

RidgePlot

Ridge plots are most commonly used with CITE-seq or hashtagged datasets as they provide an easy way to identify cells that express a protein/antibody.

When we call RidgePlot() for our spatial dataset, on the y-axis we see the unique identities assigned for each bin and the x-axis shows us the expression level for whichever feature we chose. This is a great visualization to use when justifying annotation decisions.

# Create RidgePlot to visualize expression levels across different clusters
RidgePlot(seurat_processed,
          features = "KRT8",
          assay="Spatial.008um") +
  NoLegend()

Figure 13: A `RidgePlot` showing the distribution of a Tumor marker across the different clusters.

DimHeatmap

To see the effect of genes on a principal component, we can visualize the top and bottom features in PC1 using the DimHeatmap() function.

# Show the impact of the most influential genes in PC1
DimHeatmap(seurat_processed,
           nfeatures = 10,
           reduction = "pca.sketch")

Figure 14: A heatmap showing the top and bottom most influential genes on PC1.

By default, PC1 is visualized, but we can edit which principal component we would like to visualize with the dims argument. If we wanted PC2, the code would be:

# Show the impact of the most influential genes in PC2
DimHeatmap(seurat_processed,
           nfeatures = 10,
           reduction = "pca.sketch",
           dims = 2)

Figure 15: A heatmap showing the top and bottom most influential genes on PC2.

Or if we wanted to inspect principal components 1 through 3, then we could do:

# Show the impact of the most influential genes in PC1, PC2 and PC3
DimHeatmap(seurat_processed,
           nfeatures = 10,
           reduction = "pca.sketch",
           dims = 1:3)

Figure 16: A heatmap showing the top and bottom most influential genes on PC1, PC2 and PC3.

DoHeatmap

To plot the expression values for genes across all bins (grouped by their identity), you can call Seurat’s DoHeatmap() function to identify which populations certain genes are lowly or highly expressed in.

# Plot the expression values across all bins grouped by their identity
DoHeatmap(seurat_processed,
          features=c("CEACAM6", "CEACAM5", "LCN2"))

Figure 17: A heatmap showing expression of three Tumor cell markers across the different clusters.

DotPlot

Seurat also has a built-in visualization tool which allows us to view the average expression of genes, grouped by cluseter, with a function called DotPlot(). The size of the circle represents the number of bins that express the gene within a group and the hue represents the average expression of the feature.

If you supply a named list with labels annotating genes, those labels will appear at the top of the plot for easier visualization.

# List of known celltype markers
# Create and empty list to hold our cell types and markers
markers <- list()

# Populate the list with known markers for cell types
markers[["B cells"]] <- c("IGKC", "IGHM", "CD79A", "MS4A1", "MZB1")
markers[["Endothelial cells"]] <- c("PECAM1", "VWF", "PLVAP", "ENG", "KLF2")
markers[["Fibroblasts"]] <- c("COL1A1", "COL3A1", "DCN", "LUM", "COL6A2")
markers[["T cells"]] <- c("TRAC", "CD3E", "TRBC2", "IL7R", "CD52")
markers[["Tumor cells"]] <- c("CEACAM6", "CEACAM5", "EPCAM", "KRT8", "LCN2")

# Create dotplot based on RNA expression
DotPlot(seurat_processed,
        features = markers,
        assay="Spatial.008um") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Figure 18: A `DotPlot` displaying various celltypes and their marker genes.

Next Lesson >>

Back to Schedule

Reuse

CC-BY-4.0