# Seurat Cheatsheet
# Visium HD spatial transcriptomics workshop
# Author: Harvard Chan Bioinformatics Core
# Created: May 2026
# Load libraries
library(Seurat)
library(tidyverse)
# Load Seurat object
seurat_processed <- readRDS("data/seurat_processed.RDS")Seurat Cheatsheet
This lesson will introduce a compact Seurat cheatsheet to keep at hand that summarizes key functions for spatial transcriptomics workflows, from data loading to clustering and plotting. Use this reference to speed up scripting, avoid common syntax errors and standardize your analysis pipeline.
Plotting, Syntax, Data structures, DimPlot, FeaturePlot
Approximate time: 120 minutes
Learning objectives
In this lesson, we will:
- Access various slots of the Seurat object
- Generate a variety of plots with the plotting functions within Seurat
Overview of lesson
You’ve already used Seurat through several steps and seen many function calls. In this lesson, you will consolidate the core Seurat commands and object operations for spatial analysis into a concise reference. It doesn’t introduce new analysis concepts, but rather it makes it easier to reproduce and script what you’ve learned without constantly searching through documentation. This quick reference speeds up all later work when you adapt the pipeline to your own data.
Seurat Cheatsheet
This cheatsheet is meant to provide examples of the various functions available in Seurat. This includes how to access certain information, handy tips and visualization functions built into the package. We have pulled together all of this information with examples using the dataset used throughout this workshop so that there are clear visuals on what the output of each function is.
These materials were developed by referencing the following pages from the Seurat website:
Loading in the dataset
First, we will make a new script for us to practice these Seurat commands in. Will will call it Seurat_cheatsheet.R and place it within our scripts directory. At the top of the script let’s provide some useful information about the script, call our libraries and also load our data.
Accessing cell barcodes and gene names
Cell barcodes
Within Seurat, there are multiple ways to access the bin barcode IDs.
We can use colnames() to get a vector of bin barcodes in the same order as they appear in the Seurat object.
# Show bin barcodes
colnames(seurat_processed) %>%
head()[1] "P5CRC_s_008um_00078_00444-1" "P5CRC_s_008um_00128_00278-1"
[3] "P5CRC_s_008um_00052_00559-1" "P5CRC_s_008um_00121_00413-1"
[5] "P5CRC_s_008um_00202_00633-1" "P5CRC_s_008um_00035_00460-1"
Similarly, we can use the Cells() function:
# Show bin barcodes
Cells(seurat_processed) %>%
head()[1] "P5CRC_s_008um_00078_00444-1" "P5CRC_s_008um_00128_00278-1"
[3] "P5CRC_s_008um_00052_00559-1" "P5CRC_s_008um_00121_00413-1"
[5] "P5CRC_s_008um_00202_00633-1" "P5CRC_s_008um_00035_00460-1"
It is very important that the values stored in Cells() are the same as the rownames in meta.data. Otherwise, Seurat will start throwing errors at you!
# Check that all of our bins are present in the metadata
all(rownames(seurat_processed@meta.data) == Cells(seurat_processed))[1] TRUE
Features/genes
We now want to be able to access the rows, or genes, in our Seurat object. Rather than calling these values “genes”, many tools will call them “features” as different assays (CITE-seq, ATAC-seq) provide alternative information than genes as output.
The Features() function returns a vector of all features/genes in our dataset in the same order as it appears in the count matrix.
# Show the genes names
Features(seurat_processed) %>%
head()[1] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" "PERM1" "HES4"
The rownames() function provides the same output:
# Show the genes names
rownames(seurat_processed) %>%
head()[1] "SAMD11" "NOC2L" "KLHL17" "PLEKHN1" "PERM1" "HES4"
- What are the last 5 cells barcodes and the last 5 genes in the Seurat object.
Number of cells and features
If you recall, Seurat stores your count matrix as bins (columns) x genes (rows).
Therefore, we can access the number of bins by using the ncol() function.
# Retrieve the number of bins
ncol(seurat_processed)[1] 135798
The number of features is returned with the nrow() function:
# Retrieve the number of genes
nrow(seurat_processed)[1] 18085
The dim() function provides both the number of bins and genes for the default assay. Here we see the number of features followed by the number of bins:
# Retrieve the number of genes and bins
dim(seurat_processed)[1] 18085 135798
Idents
In Seurat, each bin has a label which can be accessed using Idents(). These are the default labels used for each bin and are used internally by Seurat plotting functions.
Common information set as the identity for bins include: clusters (as in our example dataset), celltype, sample, etc. You’ll notice that identities are automatically stored as factors, which means we can re-organize the levels at any point to change their order for plotting purposes.
# Show the idents
Idents(seurat_processed) %>%
head()P5CRC_s_008um_00078_00444-1 P5CRC_s_008um_00128_00278-1
3 1
P5CRC_s_008um_00052_00559-1 P5CRC_s_008um_00121_00413-1
11 9
P5CRC_s_008um_00202_00633-1 P5CRC_s_008um_00035_00460-1
2 3
Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Rename Idents
To quickly make modifications to identities, you can use the RenameIdents() function where new values are mapped to the identities. This is particularly helpful when annotating your bins from clusters to celltypes as showcased here. Bear in mind that these new identities are not stored in the meta.data automatically. We recommend adding these identities as a new column in the Seurat object to keep track of it for future use.
# Rename all identities
seurat_processed <- RenameIdents(object = seurat_processed,
"1" = "Tumor",
"2" = "B cells",
"3" = "Intestinal epithelial cells")
# These new celltype values are only stored in the idents
# Good practice is to store these changes in a column
seurat_processed$celltype <- Idents(seurat_processed)- What are the last 5 identities for the bins in the Seurat object?
Tissue coordinates
As this is a spatial dataset, there is another slot where the coordinates for each bin in our dataset is stored. We can use the GetTissueCoordinates() function in order to grab these x, y coordinates:
GetTissueCoordinates(seurat_processed) %>%
tail() x y cell
P5CRC_s_008um_00221_00323-1 53744.83 57545.55 P5CRC_s_008um_00221_00323-1
P5CRC_s_008um_00035_00373-1 55183.45 62986.56 P5CRC_s_008um_00035_00373-1
P5CRC_s_008um_00216_00448-1 57396.79 57706.69 P5CRC_s_008um_00216_00448-1
P5CRC_s_008um_00146_00559-1 60631.90 59765.47 P5CRC_s_008um_00146_00559-1
P5CRC_s_008um_00051_00318-1 53578.23 62512.40 P5CRC_s_008um_00051_00318-1
P5CRC_s_008um_00118_00312-1 53410.98 60553.91 P5CRC_s_008um_00118_00312-1
However, you may notice that the only bins that are returned belong to one sample. This is because this function will only return the results for one image at a time. So we would have to specify the image as it is stored within our Seurat object to access the coordinates for different tissues:
seurat_processedAn object of class Seurat
36170 features across 135798 samples within 2 assays
Active assay: Spatial.008um (18085 features, 2000 variable features)
2 layers present: counts, data
1 other assay present: sketch
4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
2 spatial fields of view present: P5CRC.008um P5NAT.008um
The second spatial fields of view present is P5NAT.008um which we will supply to the image parameter to grab the coordinates for the other slide in the dataset.
# Get tissue coordinates for NAT sample
GetTissueCoordinates(seurat_processed,
image = "P5NAT.008um") %>%
tail() x y cell
P5NAT_s_008um_00108_00352-1 55883.27 29664.67 P5NAT_s_008um_00108_00352-1
P5NAT_s_008um_00243_00304-1 57280.15 33611.29 P5NAT_s_008um_00243_00304-1
P5NAT_s_008um_00365_00197-1 60401.51 37180.46 P5NAT_s_008um_00365_00197-1
P5NAT_s_008um_00357_00367-1 55434.52 36939.66 P5NAT_s_008um_00357_00367-1
P5NAT_s_008um_00148_00248-1 58920.46 30837.78 P5NAT_s_008um_00148_00248-1
P5NAT_s_008um_00373_00222-1 59670.69 37413.17 P5NAT_s_008um_00373_00222-1
Highly variable features
Accessing variable features
To obtain a vector for all of the highly variable genes that were selected after running FindVariableFeatures(), we can use the VariableFeatures() function.
# Show the highly variable genes
VariableFeatures(seurat_processed) %>%
head()[1] "IGHG1" "IGHM" "SST" "IGLC1" "INSL5" "CHGA"
Setting variable features
Using the same VariableFeatures() function, we can set our own custom set of genes as our highly variable genes.
As an example, here we are omitting mitochondrial genes from the original list of variable genes:
# Get list of all variable genes
# Remove variable genes that start with MT-
var_genes <- VariableFeatures(seurat_processed)
var_genes <- var_genes[!startsWith(var_genes, "MT-")]
# Now we set our vector of gene names back to VariableFeatures()
VariableFeatures(seurat_processed) <- var_genes- What are the 5 least variable genes (of the top 3,000) in the Seurat object?
Assays and layers
Assays
Within a Seurat object you can have multiple “assays”. Each assay contains its own count matrix that is separate from the other assays in the object. This structure was created with multimodal datasets in mind so we can store, for example, ATAC peaks within the same Seurat object as your RNA counts.
To access the list of assays in our Seurat object, you can call @assays.
# Display available assays
seurat_processed@assays$Spatial.008um
Assay (v5) data with 18085 features for 135798 cells
Top 10 variable features:
IGHG1, IGHM, SST, IGLC1, INSL5, CHGA, CXCL8, GCG, HBA2, IGLC7
Layers:
counts, data
$sketch
Assay (v5) data with 18085 features for 10000 cells
Top 10 variable features:
IGHG1, IGHM, IGLC1, CHGA, SST, CXCL8, GCG, INSL5, VIP, PYY
Layers:
counts, data, scale.data
We can additionally see which of the assays in our dataset is set as the default with the DefaultAssays() function. This default assay is automatically used as by Seurat functions, unless you specify otherwise in the parameters of your function.
# SHow defualt assay
DefaultAssay(seurat_processed)[1] "Spatial.008um"
Here we can see that the default assay is set to Spatial.008um. If we instead wanted to see the RNA counts from the sketch assay, we can set a new default by once again calling the DefaultAssay() function and storing the name of a different assay.
# Set new default assay
DefaultAssay(seurat_processed) <- "sketch"
# Print out the new default to see if it changed
DefaultAssay(seurat_processed)[1] "sketch"
We can access each assay as if the Seurat object was a named list with double brackets:
# Access an assay that isn't the default assay with double square brackets
seurat_processed[["Spatial.008um"]]Assay (v5) data with 18085 features for 135798 cells
Top 10 variable features:
IGHG1, IGHM, SST, IGLC1, INSL5, CHGA, CXCL8, GCG, HBA2, IGLC7
Layers:
counts, data
And similarly run any function on it:
# Run a function on an accessed assay that isn't the default assay
dim(seurat_processed[["sketch"]])[1] 18085 10000
- What are the dimensions for each assay in the Seurat object?
Layers
Layers are the different counts matrices that you can access within each assay (prior to Seurat version 5, this feature was known as “slots”).
Following the standard Seurat workflow, you would have the following matrices:
- counts (raw counts matrix)
- data (normalized count matrix (generated after
SCTransform()orNormalizeData()) - scale.data (output from the
ScaleData())
We can see which layers are accessible with the Layers() function.
# Display layers
Layers(seurat_processed[["sketch"]])[1] "counts" "data" "scale.data"
Accessing full count matrix
You can grab the entire counts matrix by making use of the LayerData() function.
# Subsetting to the first 5 genes and bins for easy viewing
LayerData(seurat_processed,
assay = "Spatial.008um",
layer = "counts")[1:5, 1:5]5 x 5 sparse Matrix of class "dgCMatrix"
P5CRC_s_008um_00078_00444-1 P5CRC_s_008um_00128_00278-1
SAMD11 . .
NOC2L . .
KLHL17 . .
PLEKHN1 . .
PERM1 . .
P5CRC_s_008um_00052_00559-1 P5CRC_s_008um_00121_00413-1
SAMD11 . .
NOC2L . .
KLHL17 . .
PLEKHN1 . .
PERM1 . .
P5CRC_s_008um_00202_00633-1
SAMD11 .
NOC2L .
KLHL17 .
PLEKHN1 .
PERM1 .
- Show the code to get the entire sketch log-normalized (
data) count matrix.
Accessing specific features and metadata
The FetchData() function is useful to directly access the counts of a feature for each bin. You can also specify the layer and assay to specify which piece of information you want.
# Normalized counts for the gene PTPRC in the assay sketch
FetchData(seurat_processed,
vars = c("PTPRC", "sample"),
assay = "sketch",
layer="data") %>% head() PTPRC
P5CRC_s_008um_00205_00489-1 0
P5CRC_s_008um_00098_00552-1 0
P5CRC_s_008um_00151_00503-1 0
P5CRC_s_008um_00107_00427-1 0
P5CRC_s_008um_00065_00311-1 0
P5CRC_s_008um_00165_00349-1 0
Conveniently, you can also get information from multiple assays at the same time. To do so, you prepend the assay name (in lowercase format) for the feature you supply to the FetchData() function.
# Grab the normalized counts in the integrated and RNA assays
FetchData(seurat_processed,
vars=c("sketch_PTPRC", "Spatial.008um_PTPRC"),
layer="data") %>%
head() sketch_PTPRC
P5CRC_s_008um_00205_00489-1 0
P5CRC_s_008um_00098_00552-1 0
P5CRC_s_008um_00151_00503-1 0
P5CRC_s_008um_00107_00427-1 0
P5CRC_s_008um_00065_00311-1 0
P5CRC_s_008um_00165_00349-1 0
- Show how you would use the
FetchData()function to generate a dataframe offullumapsketch_1,fullumapsketch_2andorig.identvalues for each cell.
Accessing dimensional reductions
PCA
The scores for each PC is stored within the embeddings slot of the Seurat object. These can be accessed by using the Embeddings() function.
# Show PCA embeddings
# Alternative method of accessing PCA values
# seurat_processed[['pca']]@cell.embeddings
Embeddings(seurat_processed,
reduction = "pca.sketch")[1:5, 1:5] PC_1 PC_2 PC_3 PC_4
P5CRC_s_008um_00205_00489-1 3.709785 1.8213648 -2.3974464 0.7473072
P5CRC_s_008um_00098_00552-1 1.979896 -0.7775721 -3.1652273 -1.3673613
P5CRC_s_008um_00151_00503-1 3.792683 3.4938893 -10.1402720 31.2417728
P5CRC_s_008um_00107_00427-1 2.154885 -4.0624512 -3.5581731 -3.0002939
P5CRC_s_008um_00065_00311-1 -22.419118 6.0348684 0.7173482 0.6259992
PC_5
P5CRC_s_008um_00205_00489-1 0.7568303
P5CRC_s_008um_00098_00552-1 -0.9723445
P5CRC_s_008um_00151_00503-1 -13.3532702
P5CRC_s_008um_00107_00427-1 -1.0628408
P5CRC_s_008um_00065_00311-1 1.5016913
The weights (loadings) for each feature are also stored and can be accessed with Loadings() function.
# Show PCA loadings
# Alternative method of accessing PCA loadings
# pbmc[['pca]]@feature.loadings
Loadings(seurat_processed,
reduction = "pca.sketch")[1:5, 1:5] PC_1 PC_2 PC_3 PC_4 PC_5
IGHG1 0.008454886 0.020305947 0.02131299 -0.011029425 0.001427392
IGHM 0.033022561 0.028325733 0.03002541 -0.030765923 0.008312626
IGLC1 0.017116523 0.011749722 0.02217907 -0.010199287 0.023402614
CHGA -0.002113369 -0.017450566 -0.03846076 0.001083699 -0.006691523
SST 0.000300984 -0.001915676 -0.01129114 0.002629025 -0.002397141
We can also view more information about the top PCs, like the genes that are most strongly correlated with the first few PCs with the ProjectDim() function.
# Show top genes associated with PCs
ProjectDim(seurat_processed,
reduction = "pca.sketch")PC_ 1
Positive: IGKC, JCHAIN, A2M, CD74, IGHA1, SPARCL1, IGHM, COL3A1, SRGN, ADAMDEC1
TAGLN, CXCR4, C3, TIMP3, COL6A2, IGFBP5, CCN2, SELENOP, APOE, IL7R
Negative: LCN2, CEACAM6, CEACAM5, GPX2, ID1, DPEP1, NQO1, S100P, MDK, FERMT1
GGH, REG3A, CDC25B, IFITM3, PTP4A3, GMDS, ITGA6, HOXB9, ZNF703, SERPINA1
PC_ 2
Positive: CD74, SRGN, IFITM3, CD44, REG3A, IFITM2, SPARC, A2M, CDC25B, COL3A1
DPEP1, PTP4A3, SPARCL1, LGALS1, TGFBI, C3, CXCR4, IGFBP7, COL1A1, LAPTM5
Negative: MUC12, PIGR, SLC26A3, FABP1, SLC26A2, FXYD3, CKB, TSPAN1, GUCA2A, FCGBP
CEACAM7, CA1, CA2, KRT20, ZG16, PLA2G2A, CA4, MUC2, SGK1, TMPRSS2
PC_ 3
Positive: SELENOP, C1QA, C1QC, CD74, C1QB, APOE, SLC40A1, SRGN, LYZ, MS4A6A
CEACAM7, GUCA2A, SLC26A3, CXCL14, MS4A7, CTSB, FCER1G, TYROBP, IFI30, MPEG1
Negative: CLCA1, LEFTY1, WFDC2, KIAA1324, NXPE1, VIP, SCG2, MUC5B, SCGN, ITLN1
CALB2, PCSK1N, HMGCS2, NXPE4, WNK2, KLK1, RGMB, SPINK4, TFF3, L1CAM
PC_ 4
Positive: VIP, CALB2, SCG2, SCGN, TUBB2B, CHRNA3, UCHL1, L1CAM, RTN1, NSG2
GAL, HAND2, SYT1, PCSK1N, AKAP12, TMEM59L, STMN2, ETV1, TUBB2A, PRPH
Negative: CLCA1, CXCL8, G0S2, LEFTY1, ITLN1, WFDC2, MUC2, IL1B, MUC5B, PLA2G2A
IL1R2, BCL2A1, NXPE1, FCGBP, IL1RN, TFF3, PLEK, PIGR, AQP9, KLK1
PC_ 5
Positive: A2M, COL3A1, JCHAIN, PLVAP, IGFBP7, CXCL14, SPARCL1, COL6A2, SPARC, F3
COL1A2, PECAM1, COL4A1, ENG, COL6A1, IGKC, RGS5, COL1A1, VWF, ADAMDEC1
Negative: CXCL8, G0S2, IL1B, BCL2A1, IL1RN, S100A9, AQP9, S100A8, PLEK, PTGS2
OSM, NAMPT, PLAUR, ITGAX, IL1R2, CSF3R, PFKFB3, FCAR, SOD2, ACSL1
An object of class Seurat
36170 features across 135798 samples within 2 assays
Active assay: sketch (18085 features, 2000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial.008um
4 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch
2 spatial fields of view present: P5CRC.008um P5NAT.008um
UMAP/tSNE
To access the coordinates used for UMAP/tSNE plots, we specify the reduction of interest in the Embeddings() function.
# Accessing UMAP embeddings
# Alternative method of accessing UMAP embeddings
# seurat_processed[['umap']]@cell.embeddings
Embeddings(seurat_processed,
reduction = "umap.sketch") %>%
head() umapsketch_1 umapsketch_2
P5CRC_s_008um_00205_00489-1 1.3003940 -5.6375675
P5CRC_s_008um_00098_00552-1 -0.3800069 4.2747884
P5CRC_s_008um_00151_00503-1 0.6560353 -12.7625957
P5CRC_s_008um_00107_00427-1 -0.8216718 5.9898009
P5CRC_s_008um_00065_00311-1 8.9849455 1.3358915
P5CRC_s_008um_00165_00349-1 9.7599007 -0.7390782
Data visualization
Underneath the hood, all of Seurat’s plotting functions make use of ggplot which means that we can add more details to our plots using ggplot functions.
DimPlot
The DimPlot() function allows us to visualize metadata that is categorical on different reductions (PCA, UMAP).
By default, DimPlot() will color bins by the Idents() and use UMAP as the default reduction.
# Show different clusters on the UMAP
DimPlot(seurat_processed) +
ggtitle("Seurat clusters")We can specify a different metadata column using the group.by argument
# Plot bins on UMAP different and colored by their orig.ident
DimPlot(seurat_processed,
group.by = "orig.ident")We can also use the split.by argument to create multiple plots that only show cells that have the same value for the metadata column specified.
SpatialDimPlot
Similarly, we can plot these categorical values and plot them on top of the tissue slide with SpatialDimPlot():
# Plot bins on spatial slide, colored by their orig.ident
SpatialDimPlot(seurat_processed,
group.by = "orig.ident",
pt.size.factor = 15)There are a variety of different parameters that can be specified in order to create the plot that you want to make.
FeaturePlot
The FeaturePlot() function allows us to visualize both metadata and features that are continuous on different reductions (PCA, UMAP).
# Highlight gene expression on the UMAP
FeaturePlot(seurat_processed,
features = c("PLVAP", "ENG"))We can additionally order the values in a way that bins with higher values are shown in front (to avoid other bins drowning out them).
To identify bins that show the highest expression of a feature, we can set a min.cutoff based upon quantiles, where bins below the the threshold will show no expression.
# Highlight gene expression on the UMAP with a min.cutoff
FeaturePlot(seurat_processed,
reduction = "umap.sketch",
features = c("PLVAP", "ENG"),
order = TRUE,
min.cutoff = 'q10')We can also add labels onto our UMAP to easily identify which groups of bins we are seeing the expression using the LabelClusters() function. The parameters shown here put a white background behind the text to make it easier to see the labels.
# Set idents to seurat_cluster.projected
Idents(seurat_processed) <- "seurat_cluster.projected"
# Overlap gene expression on a UMAP
p <- FeaturePlot(seurat_processed,
reduction = "umap.sketch",
features = "LCN2",
order = TRUE,
min.cutoff = 'q10')
# Label clusters
LabelClusters(p,
id = "ident",
fontface = "bold",
size = 8,
bg.colour = "white",
bg.r = .2,
force = 0)SpatialFeaturePlot
Just as we can plot genes or numeric values on top of the UMAP, we can also do the same with the tissue with SpatialFeaturePlot()
FeatureScatter
FeatureScatter() creates a scatterplot of expression values for two features with each bin being colored by their ident. Bear in mind that you can also specify a continuous metadata column and not just 2 genes/features.
# Set idents to seurat_cluster.projected
Idents(seurat_processed) <- "seurat_cluster.projected"
# Create a scatterplot using two features
FeatureScatter(seurat_processed,
feature1 = "MT-ND5",
feature2 = "mitoRatio",
pt.size = 2) +
ggtitle("MitoRatio vs MT-ND5 expression")CellScatter
To visualize the differences between two specific bins, you can use the CellScatter() function to get a scatterplot of values for each feature in both bins
# Obtain a bin name
bin1 <- Cells(seurat_processed)[2]
# Obtain a second bin name
bin2 <- Cells(seurat_processed)[7]
# Here we can see the metadata for the first two bins in the dataset
seurat_processed@meta.data %>%
subset(rownames(seurat_processed@meta.data) %in% c(bin1, bin2)) %>%
select(orig.ident, nCount_Spatial.008um) orig.ident nCount_Spatial.008um
P5CRC_s_008um_00098_00552-1 P5CRC 199
P5CRC_s_008um_00115_00346-1 P5CRC 467
VlnPlot
We can create a violin plot to compare the distribution of gene expression across different populations using the VlnPlot() function.
This is a very customizable function with many parameters to customize the look of the plots.
# Create violin plot for each cluster plotted by two genes
VlnPlot(seurat_processed,
features = c("CEACAM5", "LCN2"))In this next example, we are grouping expression by sample and showing 2 plots per column. We are also removing the points (cells) by setting their size to 0.
# Creating a multi-panel violin plot for four genes comparing the orig.ident for each
VlnPlot(seurat_processed,
features = c("CEACAM6", "CEACAM5", "KRT8", "LCN2"),
group.by="orig.ident",
ncol=2,
pt.size=0)RidgePlot
Ridge plots are most commonly used with CITE-seq or hashtagged datasets as they provide an easy way to identify cells that express a protein/antibody.
When we call RidgePlot() for our spatial dataset, on the y-axis we see the unique identities assigned for each bin and the x-axis shows us the expression level for whichever feature we chose. This is a great visualization to use when justifying annotation decisions.
DimHeatmap
To see the effect of genes on a principal component, we can visualize the top and bottom features in PC1 using the DimHeatmap() function.
# Show the impact of the most influential genes in PC1
DimHeatmap(seurat_processed,
nfeatures = 10,
reduction = "pca.sketch")By default, PC1 is visualized, but we can edit which principal component we would like to visualize with the dims argument. If we wanted PC2, the code would be:
# Show the impact of the most influential genes in PC2
DimHeatmap(seurat_processed,
nfeatures = 10,
reduction = "pca.sketch",
dims = 2)Or if we wanted to inspect principal components 1 through 3, then we could do:
DoHeatmap
To plot the expression values for genes across all bins (grouped by their identity), you can call Seurat’s DoHeatmap() function to identify which populations certain genes are lowly or highly expressed in.
DotPlot
Seurat also has a built-in visualization tool which allows us to view the average expression of genes, grouped by cluseter, with a function called DotPlot(). The size of the circle represents the number of bins that express the gene within a group and the hue represents the average expression of the feature.
If you supply a named list with labels annotating genes, those labels will appear at the top of the plot for easier visualization.
# List of known celltype markers
# Create and empty list to hold our cell types and markers
markers <- list()
# Populate the list with known markers for cell types
markers[["B cells"]] <- c("IGKC", "IGHM", "CD79A", "MS4A1", "MZB1")
markers[["Endothelial cells"]] <- c("PECAM1", "VWF", "PLVAP", "ENG", "KLF2")
markers[["Fibroblasts"]] <- c("COL1A1", "COL3A1", "DCN", "LUM", "COL6A2")
markers[["T cells"]] <- c("TRAC", "CD3E", "TRBC2", "IL7R", "CD52")
markers[["Tumor cells"]] <- c("CEACAM6", "CEACAM5", "EPCAM", "KRT8", "LCN2")
# Create dotplot based on RNA expression
DotPlot(seurat_processed,
features = markers,
assay="Spatial.008um") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))