CellChat

Spatial transcriptomics

Cell-cell communication

CellChat

Ligand-receptor analysis

In this lesson, we will analyze cell–cell communication in spatial transcriptomics using CellChat to infer signaling networks. You will build interaction graphs between celltypes and relate spatial proximity to signaling strength.

Author

Noor Sohail

Published

April 19, 2026

Keywords

Ligands, Receptors, Interaction graph, Signaling strength

Approximate time: 60 minutes

Learning objectives

In this lesson, we will:

Create a CellChat object
Run the CellChat workflow to calculate ligand-receptor interactions
Visualize ligand receptor interactions

Overview of lesson

At this point in our analysis, we are comfortable with the celltype annotations of our dataset. So we might start asking questions about how each of those populations are interacting with one another. In this lesson, we will use CellChat to build directed signaling networks. These interaction maps help connect static expression patterns to dynamic biological processes such as immune infiltration, niche signaling or tumor–stroma crosstalk.

Cell-cell communication

We know that cells do not exist in an isolated environment, where they do not interact with anything. The opposite is true, where cells interact with one another and exchange information through various mechanisms, such as signal transduction for activating proteins. Understanding which cells interact with each other is crucial for understanding how cell differentiation, organ development and metabolism is modulated.

Spatial transcriptomics is a great input dataset for these methods because cell interactions are oftentimes between cells in close proximity. The coordinates become a powerful input parameter to improve the accuracy of interaction results.

CellChat

CellChat is a method that was originally published by Jin et al., 2021, and then updated several years later Jin et al., 2025, to incorporate spatial coordinates into the model. The outputs of CellChat are lists of significant ligand-receptor interactions between cell types.

First, the authors created a database of known ligand-receptor interactions from the literature classified as: secreted signaling, extracellular matrix recepotrs and cell-cell contact. This database is used in conjunction with your own dataset and its celltype labels to identify communcations between populations. In essence, the method identifies overexpressed ligand receptors from the database in your own data to measure association using the principle of mass action. Using this assumption, if there is high expression of a ligand in a cell type and high expression of a receptor, then the probability of ligand-receptor interaction strength is high.

Running cellchat

CellChat was written such that you run the workflow on one sample at a time and then merge all datasets together to run a comparative analysis. So here, we are going to focus on the ligand-receptor results for our P5CRC dataset. As this is a new analysis, we will create a new script titled 09_cellchat.R and place it within our scripts directory:

# CellChat
# Visium HD spatial transcriptomics workshop
# Author: Harvard Chan Bioinformatics Core
# Created: May 2025

# Load libararies
library(tidyverse)
library(Seurat)
library(CellChat)
set.seed(12345)

# Increases the size of the default vector (8GB)
options(future.globals.maxSize = 8000 * 1024^2)
# Set parallelization (multithreading) to speed up calculations
library(future)
plan(multisession, workers = parallel::detectCores() - 1)

# Load dataset
seurat_rctd <- readRDS("data/intermediate_seurat/seurat_RCTD.rds")

For now, let us say that our biological question is to identify which populations are interacting with the tumor cells in colorectal cancer. So first, we are going to subset the Seurat object to our relevant bins.

# Subset to CRC sample
# Remove unassigned bins
crc <- subset(seurat_rctd,
              subset = (orig.ident == "P5CRC") &
                       (first_type != "unassigned"))

# CellChat expects a "samples" column that is a factor
crc$samples <- factor(crc$orig.ident)
crc

An object of class Seurat 
36170 features across 53017 samples within 2 assays 
Active assay: Spatial.008um (18085 features, 2000 variable features)
 2 layers present: counts, data
 1 other assay present: sketch
 6 dimensional reductions calculated: pca.sketch, umap.sketch, full.pca.sketch, full.umap.sketch, pca.banksy, harmony.banksy
 1 spatial field of view present: P5CRC.008um

Create CellChat object

Similar to RCTD, CellChat has its own data structure for storing information. In order to create this object with the createCellChat() function, we must provide the following pieces of information:

object: Normalized counts matrix
coordinates: Spatial coordinates for each bin
spatial.factors: Spatial factors (stored in json file)

To generate these, we can use the GetAssayData() to grab the log-normalized counts from the crc Seurat object:

# Get normalized counts matrices
counts_norm_crc <- GetAssayData(crc,
                                assay = "Spatial.008um",
                                layer = "data")
counts_norm_crc[1:5, 1:5]

Table 1: First 5 bins and genes of log-normalized counts in the P5CRC dataset.

	P5CRC_s_008um_00128_00278-1	P5CRC_s_008um_00052_00559-1	P5CRC_s_008um_00121_00413-1	P5CRC_s_008um_00202_00633-1	P5CRC_s_008um_00035_00460-1
SAMD11	0	0	0	0	0
NOC2L	0	0	0	0	0
KLHL17	0	0	0	0	0
PLEKHN1	0	0	0	0	0
PERM1	0	0	0	0	0

Next, we will grab the spatial coordinates associated with each bin. Here, we will use the FetchData() function to grab the coordinates we stored in the @meta.data. You could also use the ouput from the GetTissueCoordinates() function as well.

# Get spatial coordinates
coords_crc <- FetchData(crc, vars = c("x", "y"))
coords_crc %>% head()

Table 2: Spatial coordinates of the first 5 bins in the P5CRC dataset.

	x	y
P5CRC_s_008um_00128_00278-1	52418.69	60257.61
P5CRC_s_008um_00052_00559-1	60620.61	62512.22
P5CRC_s_008um_00121_00413-1	56362.64	60478.41
P5CRC_s_008um_00202_00633-1	62800.98	58138.03
P5CRC_s_008um_00035_00460-1	57725.67	62997.04
P5CRC_s_008um_00126_00342-1	54288.57	60323.76

We now need to grab the spatial.factors which is described as a dataframe with ratio and tol, where:

ratio: conversion from image units (e.g., pixels) to micrometers. For example: ratio = 0.18 means 1 pixel = 0.18 µm.
tol: distance tolerance to make center-to-center distance comparisons more robust, often set to about half the cell/spot size (in µm).

Luckily, these values are stored in the scalefactors_json.json file that Space Ranger outputs. So we will need to load in the file using jsonlite and then create a dataframe with the relevant information.

# Scale factors stored in Space Ranger json output file
sf_json <- jsonlite::fromJSON("data/P5CRC_cropped/binned_outputs/square_008um/spatial/scalefactors_json.json")

# Create spatial.factors dataframe
spatial_factors_crc <- data.frame(
  # Pixels-to-µm conversion factor
  ratio = sf_json$bin_size_um / sf_json$spot_diameter_fullres,
  # Neighbour tolerance radius in µm
  tol = sf_json$bin_size_um / 2)
spatial_factors_crc

      ratio tol
1 0.2736934   4

With all three input files needed for createCellChat() generated, we can start the CellChat workflow. We will additionally supply our metadata as well as the column which contains our cell type annotations (first_type).

# Create cellchat object
cellchat_crc <- createCellChat(
  object = counts_norm_crc,
  coordinates = coords_crc,
  spatial.factors = spatial_factors_crc,
  meta = crc@meta.data,
  group.by = "first_type",
  datatype = "spatial")

[1] "Create a CellChat object from a data matrix"
Create a CellChat object from spatial transcriptomics data... 
Set cell identities for the new CellChat object 
The cell groups used for CellChat analysis are  B cells, Endothelial, Fibroblast, Intestinal Epithelial, Myeloid, Neuronal, Smooth Muscle, T cells, Tumor

# Display the CellChat object
cellchat_crc

An object of class CellChat created from a single dataset 
 18085 genes.
 53017 cells. 
CellChat analysis of spatial data! The input spatial locations are 
                              x_cent   y_cent
P5CRC_s_008um_00128_00278-1 52418.69 60257.61
P5CRC_s_008um_00052_00559-1 60620.61 62512.22
P5CRC_s_008um_00121_00413-1 56362.64 60478.41
P5CRC_s_008um_00202_00633-1 62800.98 58138.03
P5CRC_s_008um_00035_00460-1 57725.67 62997.04
P5CRC_s_008um_00126_00342-1 54288.57 60323.76

CellChat databset

CellChatDB is the database that contains the manually curated set of ligand-receptor interactions created by the authors. It includes the following types of interactions:

Secretory autocrine/paracrine signaling interactions
Cell-cell contact interactions
Extracellular matrix (ECM)-receptor interactions
Non-protein signaling

We want to incorporate this information into our cellchat_crc so that when we run CellChat, the workflow will automatically know which interactions to include in the analysis. This database is available for both mouse and human. Since we are working with human samples, we will use CellChatDB.human and we assign this to our CellChat object.

# Load CellChat databases
cellchat_db <- CellChatDB.human

# Assign DB to cellchat object in the DB slot
cellchat_crc@DB <- cellchat_db

Subset CellChatDB

The database contains ligand-receptors that can be broadly categorized in 4 groups:

# Pie chart of cellchat database categories
showDatabaseCategory(cellchat_db)

If you are only interested in a particular group of interactions, say from one on the signaling groups, you can use the subsetDB() function and specificy which group you want in the search argument.

Here we show what the full list of categories are:

# DO NOT RUN
# Subset CellChatDB for cell-cell communication analysis
cellchat_db <- subsetDB(cellchat_db, 
                            search = c("Secreted Signaling",
                                       "ECM-Receptor", 
                                       "Cell-Cell Contact",
                                       "Non-protein Signaling"))

Now that the database is linked to our dataset, we need to run the subsetData() function to filter the dataset to the signaling genes to save on computation.

# Subset step is required even if you have not filtered databset
cellchat_crc <- subsetData(cellchat_crc)

Next, we identify which genes are overexpressed in each cell type with identifyOverExpressedGenes(). This is very similar to how we ran FindMarkers() in a previous lesson. These scores are calculated with a Wilcoxon and auROC test and stored in the @var.features slot of the CellChat object.

To preview what is happening behind the scenes, we can take a look at the top genes per clusters, ranked by the adjusted p-value and log-fold change.

# Identify genes that are over-expressed relative to other cell types
cellchat_crc <- identifyOverExpressedGenes(cellchat_crc)

# Print top genes per cell type
cellchat_crc@var.features$features.info %>%
  group_by(clusters) %>%
  arrange(pvalues.adj, desc(logFC)) %>%
  slice_head(n = 2) %>% 
  View()

Table 3: Top over expressed genes in each cell type.

features	clusters	avgExpr	logFC	auc	pct.1	pct.2
CD74	B cells	1.8390255	1.0436570	0.6486430	52.79350	21.9941169
TNFRSF17	B cells	0.5495212	0.5192934	0.5867722	18.36082	0.9927687
SPARC	Endothelial	1.8666925	1.5865943	0.7090854	48.50551	8.5658384
PECAM1	Endothelial	1.6384759	1.5197356	0.7110359	45.14945	3.9053023
COL3A1	Fibroblast	2.6453188	2.0767597	0.7557536	62.01908	16.6914483
DCN	Fibroblast	1.6133026	1.4017946	0.6899914	43.30563	6.5664684
LEFTY1	Intestinal Epithelial	1.1103757	1.0371003	0.6546731	33.15010	2.5830258
KLK1	Intestinal Epithelial	0.7352833	0.6340305	0.5972924	22.94455	4.0375452
CD74	Myeloid	3.5434007	2.8116648	0.8445060	78.47120	21.4414916
APOE	Myeloid	1.3531110	1.1933331	0.6613103	36.52142	4.9673021
NRXN1	Neuronal	1.3036131	1.2858462	0.6707690	34.61538	0.5580972
L1CAM	Neuronal	1.1765092	1.1671406	0.6628998	32.84024	0.2847434
GREM1	Smooth Muscle	0.9760079	0.9561076	0.6140751	23.29945	0.5450539
TNC	Smooth Muscle	0.6860894	0.6503496	0.5775576	16.49726	1.1384282
CD74	T cells	3.0282518	2.2161893	0.7866566	72.98547	22.9248782
CXCR4	T cells	1.9388902	1.7783412	0.7409209	51.91546	4.9181601
CEACAM6	Tumor	2.5108320	2.0816297	0.8294907	79.36212	13.3708481
CEACAM5	Tumor	3.2146307	1.7904014	0.7567259	89.56826	39.7664459

Some of these genes may look familiar!

Run CellChat and filter results

We will not be running the following three steps during class due to the time it takes to compute. We will provide an RDS object with the output in the appropriate step of the workflow.

Similar to how we identified over-expressed genes per cell type, we can also calculate over-expressed interactions, or ligand-receptor pairs.

# DO NOT RUN
# Identify interactions that are over-expressed relative to other cell types
cellchat_crc <- identifyOverExpressedInteractions(cellchat_crc)

This next step of smoothData() (or projectData() in older version of CellChat) is optional. Here, we use the protein-protein network (PPI.human) to identify genes that can be smoothed based upon the neighborhood structure of the data. The authors note that “This function is useful when analyzing single-cell data with shallow sequencing depth because the projection reduces the dropout effects of signaling genes, in particular for possible zero expression of subunits of ligands/receptors”

# DO NOT RUN
# (Optional) 
# Smooth expression over the protein-protein interaction network to reduce dropout noise. 
# If you run this, set raw.use = FALSE in computeCommunProb()
cellchat_crc <- smoothData(cellchat_crc,
                           adj = PPI.human)
# For older versions of CellChat, the function is: 
# cellchat <- projectData(cellchat, PPI.human)

The last computationally heavy step of this workflow is computeCommunProb() function, which calculates the interaction strengths and probabilities of cell types. The steps are:

Summarize gene expression by calculating the robust mean gene expression per cell group (triMean)
Identify ligand-receptor (LR) compounds (from the database) and assign values based upon average gene expression per cell group (calculated in previous step)
Create spatial proximity matrix between cell groups to weight source-target cell groups based upon nearness
Compute communication strength for LR and cell type pairs, incorporating spatial weights
Run permutation test for significance testing

Author note on tri-mean

In the documentation for CellChat, the author defines tri-mean like so:

Of note, ‘trimean’ approximates 25% truncated mean, implying that the average gene expression is zero if the percent of expressed cells in one group is less than 25%. To use 10% truncated mean, USER can set type = "truncatedMean" and trim = 0.1. The function computeAveExpr can help to check the average expression of signaling genes of interest, e.g, computeAveExpr(cellchat, features = c("CXCL12","CXCR4"), type = "truncatedMean", trim = 0.1).

Table 4: Parameters for computeCommunProb() function.

Argument	Description
object	CellChat object.
type	Method for average expression; "triMean" (default) or "truncatedMean" (needs trim).
raw.use	Use raw (TRUE) or if smoothData() was run (FALSE)
distance.use	Apply spatial distance constraints (TRUE/FALSE.
interaction.range	Maximum ligand interaction distance (µm)
scale.distance	Scaling factor for spatial distances (e.g., 1, 0.2)
contact.dependent	Whether signaling is contact-dependent (TRUE/FALSE)
contact.range	Maximum distance (µm) for cells to be considered in contact

Of particular note is the scale.distance parameter. Remember how earlier we established what the ratio was to establish the pixel distance for 1µm? This is where that information comes into play, where we tell computeCommunProb() that if a cell group is not within scale.distanceµm of each other, then they are unlikely to interact with one another.

With all the previously calculated values, we can now run the computeCommunProb() function:

# DO NOT RUN
# Calculate community probability
cellchat_crc <- computeCommunProb(
  cellchat_crc,
  type = "trimean",
  distance.use = TRUE,
  interaction.range = 200,
  scale.distance = 0.2,
  contact.dependent = TRUE,
  contact.range = 10,
  raw.use = FALSE,
  seed.use = 12345)

This creates a new slot in the CellChat object called, @net, where the interaction strengths are stored.

Load RDS object

Those three steps take a long time to run (~25 minutes), so we have created an RDS object for you to load in with the results from these steps:

# Load processed cellchat object
cellchat_crc <- readRDS("data/intermediate_seurat/cellchat_crc_computeCommunProb.rds")

We can now filter results based first on the how many bins belong to each cell type:

# Removes celltype pairs with too few bins
cellchat_crc <- filterCommunication(cellchat_crc,
                                    min.cells = 5)

Each ligand-receptor is associated with a pathway, which comes from the database that was curated from the scientific literature. So to summarize the results, we can aggregate the values to create a pathway-level score to identify more general trends in the data.

# Summarize Ligand-Receptor pair probabilities into signalling pathway probabilities
cellchat_crc <- computeCommunProbPathway(cellchat_crc)

# Count interactions and sum weights into a flat network matrix
cellchat_crc <- aggregateNet(cellchat_crc)

# Compute how much each celltype sends vs receives across all pathways
cellchat_crc <- netAnalysis_computeCentrality(cellchat_crc, slot.name = "netP")

Visualize CellChat results

With this information, we can begin visualizing how the bins in P5CRC are interacting with one another. The CellChat package comes with a variety of different built-in ways to visualize the results which we will now explore.

Strength and number of interactions

The most popular representation is the netVisual_circle() plot, which shows how the celltypes are interacting with one another. This is accomplished by setting nodes of each celltype on the boundary of a circle, with lines from source to target showcasing either the number or strength of interactions.

# 2 plots, side-by-side
par(mfrow = c(1, 2), xpd = TRUE) 

# Number of LR interactions between celltypes
netVisual_circle(
  cellchat_crc@net$count,
  weight.scale = TRUE,
  label.edge = FALSE,
  edge.width.max = 12,
  title.name = "Number of interactions")

# Strength of LR interactions
netVisual_circle(
  cellchat_crc@net$weight,
  weight.scale = TRUE,
  label.edge = FALSE,
  edge.width.max = 12,
  title.name = "Interaction strength")

# Reset par to not affect later plots
par(mfrow = c(1, 1))

Figure 3: Circular plot, showcasing the strength and number of ligand-receptor interactions between cell types.

Notice that the strongest interactions are circular between tumor cells. This tells us that tumor cells are frequently and strongly sending signals to one another. We can also see that fibroblasts appear to be sending signals to tumor cells. Rodriguez et al., 2025, found that cancer-associate fibroblasts “induce pro-invasive gene expression changes involved in epithelial-mesenchymal transition, extracellular matrix remodeling and migration.”

Zoom in on tumor and fibroblast bins

We can also zoom in on the tumor and fibroblast bins to identify where they exist spatially.

# Color bins by tissue after subsetting to tumor and fibroblasts
SpatialDimPlot(crc %>% subset(first_type %in% c("Tumor", "Fibroblast")),
                group.by = "first_type",
                pt.size.factor = 13,
                image.alpha = 0.2)

As we might hope to see, there are fibroblasts that are located in close proximity to tumor bins.

An alternative way to represent the circular plots can be done with a heatmap. On the y-axis are the “sources” and the x-axis are the “receivers”. The intensity of the heatmap is proportional to the number/strength of interactions between the source and receiver cell types. The top colored bar plot represents the sum of column of values displayed in the heatmap. The right colored bar plot represents the sum of row of values.

# Number of LR interactions between celltypes heatmap
gg_h_count  <- netVisual_heatmap(
  cellchat_crc, 
  measure = "count",
  color.heatmap = "Blues",
  title.name = "Number of interactions")

# Stregnth of LR interactions between celltypes heatmap
gg_h_weight <- netVisual_heatmap(
  cellchat_crc, 
  measure = "weight",
  color.heatmap = "Reds",
  title.name = "Interaction strength")

# Plot heatmaps side by side
gg_h_count + gg_h_weight

Figure 5: Heatmap of interaction numbers and strengths between cell types.

To summarize the amount of incoming and outgoing signal, we can also generate a scatterplot. Each point represents a unique cell types, where the size is the total number of incoming and outgoing ligand-receptor interactions for that celltype. The x-axis represents the total outgoing communication probability and the y-axis, similarly, represents the total incoming communication probability.

# Scatterplot of in/out signaling for each cell type
netAnalysis_signalingRole_scatter(cellchat_crc)

Figure 6: Scatterplot of incoming and outgoing signaling for each cell type.

Pathway-level visuals

The last steps we ran in the CellChat workflow was to calculate statistics for each pathway, which is comprised of various ligand-receptors. So now we can take a look at the top significant pathways associated within our dataset:

# Top significant pathways
rankNet(cellchat_crc, 
        measure = "weight",
        mode = "single", 
        stacked = TRUE, 
        do.stat = TRUE) +
  xlab("Pathways")

Figure 7: Top ranked ligand-receptor pathways.

For each of these pathways, we have associated signaling strength for each celltype. We can evaluate the outgoing and incoming pathways signal strength for each celltype.

Plotting errors and warning

“Error: not enough space for cells at track index ‘1’.”

This error comes from when the size of the plot is too small to display the plot. Hit the “Zoom” button on “Plots” tab of RStudio and enlarge the image to view the image.

# Heatmap of outgoing pathway signaling strength
ht_out <- netAnalysis_signalingRole_heatmap(
  cellchat_crc, 
  pattern = "outgoing",
  width = 10,
  height = 20,
  color.heatmap = "YlOrRd",
  title = "Outgoing signaling strength")

# Heatmap of incoming pathway signaling strength
ht_in  <- netAnalysis_signalingRole_heatmap(
  cellchat_crc, 
  pattern = "incoming",
  width = 10,
  height = 15,
  color.heatmap = "GnBu",
  title = "Incoming signaling strength")

# Plot heatmaps side by side
ht_out + ht_in

Figure 8: Outgoing and incoming pathways signal strength for each celltype

If we wanted to zoom in on one particular pathway, in this case CEACAM, we can use the netVisual_aggregate() and specify the signaling pathway. For a circular plot with lines directed between, we supply the layout argument as "chord".

Similar to the first visual we made, this is a circular plot that represents how much interaction is occuring between cell types for a single pathway.

# netVisual_aggregate on the whole pathway to show chrods
netVisual_aggregate(cellchat_crc,
                    signaling  = "CEACAM",
                    layout = "chord",
                    title.name = "CEACAM signaling")

Figure 9: CEACAM pathways interactions between celltypes.

The CEACAM genes/pathway is notably increased in epithelial tumor cells in colon cancer. These results we are seeing is consistent with that assumption, where we see many interactions within the CEACAM pathway to be with tumor bins.

Spatial netVisual_aggregate()

There is spatial layout that we can also supply to netVisual_aggregate(). However, since our dataset does not have cell types localize to particular regions of the tissue, this visual is not as informative.

# netVisual_aggregate on the whole pathway to show arrows of interactions
# Plotted on the spatial tissue slide
netVisual_aggregate(
  cellchat_crc,
  signaling = "CEACAM",
  layout = "spatial",
  point.size = 0.05,
  alpha.image = 0.1,
  alpha = 0.6,
  vertex.size.max = 3,
  edge.width.max = 5,
  vertex.label.cex = 3)

Figure 10: Spatial plot of CEACAM pathway interaction between celltypes on the tissue.

Other, more structured spatially datasets, can make use of this type of figure to show where exactly interactions are occuring, seen here:

Figure 11: Spatial images of PSAP and SPP1 signaling pathway network, the thickness of lines represents the strength of signals.
*Image source: Dong et al., 2025*

Bubble plot

Within CellChat, we can also zoom in on specific ligand-receptor pairs that belong to pathways we have identified. For example, let us take the top 5 pathways and use them as input for netVisual_bubble():

# Top 5 significant pathways
top_pathways <- cellchat_crc@netP$pathways[1:5]
top_pathways

[1] "CEACAM"   "APP"      "COLLAGEN" "MK"       "CXCL"

In particular, we are interested in the celltypes that are interacting with tumor cells. So we will look at all ligand-receptor complexes in those top pathways that have tumor cells as the target. This is represented as a bubbleplot, where the size of the dot is proportion to the p-value. If a dot does not appear for, this indicates that the interaction is not significant. The color represents the probability of interaction for the celltype-directed interaction, as labeled on the x-axis.

# Number of unique cell types
n_ct <- seurat_rctd@meta.data %>%
  pull(first_type) %>%
  unique() %>% length()

netVisual_bubble(cellchat_crc,
                 signaling = top_pathways,
                 sort.by.source.priority = TRUE,
                 sources.use = 1:n_ct,
                 targets.use = "Tumor",
                 remove.isolate = TRUE,
                 color.heatmap = "viridis",
                 angle.x = 90)

Figure 12: Bubbleplot of significant pathways with tumor cells as source.

In our case, the dots all appear to be the same size. This tells us that the pathways we selected earlier, when a ligand-receptor is significant, it is found to have a very low p-value (very significant).

Zoom in on specific ligand-receptor complexes

Accessing the table of ligand-receptor results

If you wanted to access the full dataframe of results, we can use the subsetCommuncation() function:

df_lr <- subsetCommunication(cellchat_crc) %>%
  arrange(desc(prob))
df_lr %>% head(5)

Table 5: Dataframe of ligand-receptor results, sorted by probability.

source	target	ligand	receptor	prob	interaction_name	interaction_name_2	pathway_name	annotation	evidence
Tumor	Tumor	CEACAM5	CEACAM6	0.0869136	CEACAM5_CEACAM6	CEACAM5 - CEACAM6	CEACAM	Cell-Cell Contact	uniprot
Tumor	Tumor	CEACAM6	CEACAM6	0.0750469	CEACAM6_CEACAM6	CEACAM6 - CEACAM6	CEACAM	Cell-Cell Contact	PMID:11590190
T cells	T cells	CXCL12	CXCR4	0.0108678	CXCL12_CXCR4	CXCL12 - CXCR4	CXCL	Secreted Signaling	KEGG: hsa04060
Tumor	Tumor	MDK	SDC4	0.0093137	MDK_SDC4	MDK - SDC4	MK	Secreted Signaling	PMID: 28356350
Endothelial	Endothelial	PECAM1	PECAM1	0.0087450	PECAM1_PECAM1	PECAM1 - PECAM1	PECAM1	Cell-Cell Contact	KEGG: hsa04514

From the previous visual, perhaps we were interested in the specific ligand, COL3A1, and receptor, SDC4, interaction between fibroblasts and tumors. We can start by taking a look at the SpatialFeaturePlot() to get a better sense of where these genes localize on our slide:

ligand <- "COL3A1"
receptor <- "SDC4"

SpatialFeaturePlot(crc, 
                   features = c(ligand, receptor),
                   pt.size.factor = 15,
                   image.alpha = 0.2)

Figure 13: SpatialFeaturePlot showing the expression of a ligand (COL3A1) and a receptor (SDC4) on our CRC slide.

There is a population of fibroblasts in the tumor cells that appears to correspond with the bins with high COL3A1 expression. Most likely, it is these nearby fibroblasts that are interacting with tumor cells. To confirm that there is higher expression of COL3A1 in fibroblasts, we can use a VlnPlot():

VlnPlot(crc, 
        features = c(ligand, receptor), 
        group.by = "first_type",
        pt.size = 0,
        ncol = 1)

Figure 14: Distribution of ligand-receptor pair, COL3A1-SDC4, which has significant communication between fibroblasts and tumor cells.

Notably, there is also high expression of SDC4 in the tumor cells. This is the underlying principle of mass action utilized by CellChat. Essentially, if there is high expression of the ligand (COL3A1) in the source celltype (fibroblasts) and high expression of the receptor (SDC4) in the target celltype (tumor cells), then there will be a high probability communication between the two, especially, if they are localized to the same area of the tissue.

Session information

Now that we have completed our analysis, it would be a good time record the packages and versions of the tools we used in the analysis. The sink() function allows us to save the output from a function, in this case the sessionInfo() function. This improves the reproducibility of our code for ourselves and others. We can do this like:

# Create and save a text file with sessionInfo
sink("sessionInfo_spatial_transcriptomics.txt")
sessionInfo()
sink()

Back to Schedule

Reuse

CC-BY-4.0

--- title: "CellChat" description: | In this lesson, we will analyze cell–cell communication in spatial transcriptomics using CellChat to infer signaling networks. You will build interaction graphs between celltypes and relate spatial proximity to signaling strength. author: - Noor Sohail date: "2026-04-19" categories: - Spatial transcriptomics - Cell-cell communication - CellChat - Ligand-receptor analysis keywords: - Ligands - Receptors - Interaction graph - Signaling strength license: "CC-BY-4.0" editor_options: markdown: wrap: 72 --- ```{r} #| label: load_libraries_data #| echo: false # Load libraries and data library(Seurat) library(tidyverse) library(CellChat) # devtools::install_github("KlugerLab/ALRA") # remotes::install_github('JEFworks-Lab/MERINGUE', build_vignettes = TRUE) # devtools::install_github("jinworks/SpatialCellChat") # library(SpatialCellChat) set.seed(12345) # Set parallelization (multithreading) to speed up calculations plan("multicore", workers = parallel::detectCores() - 1) # Increases the size of the default vector (8GB) options(future.globals.maxSize = 8000 * 1024^2) seurat_rctd <- qs2::qs_read("intermediate/12_seurat_RCTD.qs") ``` Approximate time: 60 minutes ## Learning objectives In this lesson, we will: - Create a CellChat object - Run the CellChat workflow to calculate ligand-receptor interactions - Visualize ligand receptor interactions ## Overview of lesson At this point in our analysis, we are comfortable with the celltype annotations of our dataset. So we might start asking questions about how each of those populations are interacting with one another. In this lesson, we will use CellChat to build directed signaling networks. These interaction maps help connect static expression patterns to dynamic biological processes such as immune infiltration, niche signaling or tumor–stroma crosstalk. ## Cell-cell communication ::: {#fig-ccc_schematic .fig} ![](../img/ccc_schematic.png){width=500} Schematic of representative signal pathways for cell-cell communication.<br> _Image source: [Su et al., 2024](https://www.nature.com/articles/s41392-024-01888-z)_ ::: We know that cells do not exist in an isolated environment, where they do not interact with anything. The opposite is true, where cells interact with one another and exchange information through various mechanisms, such as signal transduction for activating proteins. Understanding which cells interact with each other is crucial for understanding how cell differentiation, organ development and metabolism is modulated. Spatial transcriptomics is a great input dataset for these methods because **cell interactions are oftentimes between cells in close proximity**. The coordinates become a powerful input parameter to improve the accuracy of interaction results. ### CellChat CellChat is a method that was originally published by [Jin et al., 2021](https://www.nature.com/articles/s41467-021-21246-9), and then updated several years later [Jin et al., 2025](https://www.nature.com/articles/s41596-024-01045-4), to incorporate spatial coordinates into the model. The outputs of CellChat are lists of significant ligand-receptor interactions between cell types. ::: {#fig-cellchat_schematic .fig} ![](../img/cellchat_schematic.png){width=500} Overview of CellChat method.<br> _Image source: [Hao et al., 2021](https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.751158/full)_ ::: First, the authors created a database of known ligand-receptor interactions from the literature classified as: secreted signaling, extracellular matrix recepotrs and cell-cell contact. This database is used in conjunction with your own dataset and its celltype labels to identify communcations between populations. In essence, the method identifies overexpressed ligand receptors from the database in your own data to measure association using the principle of mass action. Using this assumption, if there is high expression of a ligand in a cell type and high expression of a receptor, then the probability of ligand-receptor interaction strength is high. ## Running cellchat CellChat was written such that you run the workflow on one sample at a time and then merge all datasets together to run a comparative analysis. So here, we are going to focus on the ligand-receptor results for our `P5CRC` dataset. As this is a new analysis, we will create a new script titled `09_cellchat.R` and place it within our `scripts` directory: ```{r} #| label: header_libraries #| eval: false # CellChat # Visium HD spatial transcriptomics workshop # Author: Harvard Chan Bioinformatics Core # Created: May 2025 # Load libararies library(tidyverse) library(Seurat) library(CellChat) set.seed(12345) # Increases the size of the default vector (8GB) options(future.globals.maxSize = 8000 * 1024^2) # Set parallelization (multithreading) to speed up calculations library(future) plan(multisession, workers = parallel::detectCores() - 1) # Load dataset seurat_rctd <- readRDS("data/intermediate_seurat/seurat_RCTD.rds") ``` For now, let us say that our biological question is to identify which populations are interacting with the tumor cells in colorectal cancer. So first, we are going to subset the Seurat object to our relevant bins. ```{r} #| label: subset_crc # Subset to CRC sample # Remove unassigned bins crc <- subset(seurat_rctd, subset = (orig.ident == "P5CRC") & (first_type != "unassigned")) # CellChat expects a "samples" column that is a factor crc$samples <- factor(crc$orig.ident) crc ``` ### Create CellChat object Similar to RCTD, CellChat has its own data structure for storing information. In order to create this object with the `createCellChat()` function, we must provide the following pieces of information: - `object`: Normalized counts matrix - `coordinates`: Spatial coordinates for each bin - `spatial.factors`: Spatial factors (stored in json file) To generate these, we can use the `GetAssayData()` to grab the log-normalized counts from the `crc` Seurat object: ```{r} #| label: get_assay #| eval: false # Get normalized counts matrices counts_norm_crc <- GetAssayData(crc, assay = "Spatial.008um", layer = "data") counts_norm_crc[1:5, 1:5] ``` ```{r} #| label: tbl-get_assay #| tbl-cap: "First 5 bins and genes of log-normalized counts in the P5CRC dataset." #| echo: false # Get normalized counts matrices counts_norm_crc <- GetAssayData(crc, assay = "Spatial.008um", layer = "data") counts_norm_crc[1:5, 1:5] %>% knitr::kable() ``` Next, we will grab the spatial coordinates associated with each bin. Here, we will use the `FetchData()` function to grab the coordinates we stored in the `@meta.data`. You could also use the ouput from the `GetTissueCoordinates()` function as well. ```{r} #| label: get_coords #| eval: false # Get spatial coordinates coords_crc <- FetchData(crc, vars = c("x", "y")) coords_crc %>% head() ``` ```{r} #| label: tbl-get_coords #| tbl-cap: "Spatial coordinates of the first 5 bins in the P5CRC dataset." #| echo: false # Get spatial coordinates coords_crc <- FetchData(crc, vars = c("x", "y")) coords_crc %>% head() %>% knitr::kable() ``` We now need to grab the `spatial.factors` which is described as a dataframe with `ratio` and `tol`, where: - `ratio`: conversion from image units (e.g., pixels) to micrometers. For example: `ratio = 0.18` means `1 pixel = 0.18 µm`. - `tol`: distance tolerance to make center-to-center distance comparisons more robust, often set to about _half_ the cell/spot size (in µm). Luckily, these values are stored in the `scalefactors_json.json` file that Space Ranger outputs. So we will need to load in the file using `jsonlite` and then create a dataframe with the relevant information. ```{r} #| label: spatial_factors # Scale factors stored in Space Ranger json output file sf_json <- jsonlite::fromJSON("data/P5CRC_cropped/binned_outputs/square_008um/spatial/scalefactors_json.json") # Create spatial.factors dataframe spatial_factors_crc <- data.frame( # Pixels-to-µm conversion factor ratio = sf_json$bin_size_um / sf_json$spot_diameter_fullres, # Neighbour tolerance radius in µm tol = sf_json$bin_size_um / 2) spatial_factors_crc ``` With all three input files needed for `createCellChat()` generated, we can start the CellChat workflow. We will additionally supply our metadata as well as the column which contains our cell type annotations (`first_type`). ```{r} #| label: create_cellchat # Create cellchat object cellchat_crc <- createCellChat( object = counts_norm_crc, coordinates = coords_crc, spatial.factors = spatial_factors_crc, meta = crc@meta.data, group.by = "first_type", datatype = "spatial") # Display the CellChat object cellchat_crc ``` ### CellChat databset CellChatDB is the database that contains the manually curated set of ligand-receptor interactions created by the authors. It includes the following types of interactions: - Secretory autocrine/paracrine signaling interactions - Cell-cell contact interactions - Extracellular matrix (ECM)-receptor interactions - Non-protein signaling We want to incorporate this information into our `cellchat_crc` so that when we run CellChat, the workflow will automatically know which interactions to include in the analysis. This database is available for both mouse and human. Since we are working with human samples, we will use `CellChatDB.human` and we assign this to our CellChat object. ```{r} #| label: load_cellchat_db # Load CellChat databases cellchat_db <- CellChatDB.human # Assign DB to cellchat object in the DB slot cellchat_crc@DB <- cellchat_db ``` ::: callout-note # Subset CellChatDB The database contains ligand-receptors that can be broadly categorized in 4 groups: ```{r} #| label: pie_cellchat_db # Pie chart of cellchat database categories showDatabaseCategory(cellchat_db) ``` If you are only interested in a particular group of interactions, say from one on the signaling groups, you can use the `subsetDB()` function and specificy which group you want in the `search` argument. Here we show what the full list of categories are: ```{r} #| label: subset_DB #| eval: false # DO NOT RUN # Subset CellChatDB for cell-cell communication analysis cellchat_db <- subsetDB(cellchat_db, search = c("Secreted Signaling", "ECM-Receptor", "Cell-Cell Contact", "Non-protein Signaling")) ``` ::: Now that the database is linked to our dataset, we **need** to run the `subsetData()` function to filter the dataset to the signaling genes to save on computation. ```{r} #| label: subsetdata_cellchat # Subset step is required even if you have not filtered databset cellchat_crc <- subsetData(cellchat_crc) ``` Next, we identify which genes are overexpressed in each cell type with `identifyOverExpressedGenes()`. This is very similar to how we ran `FindMarkers()` in a [previous lesson](13_dge_pathway.qmd). These scores are calculated with a [Wilcoxon and auROC test](https://rdrr.io/github/immunogenomics/presto/man/wilcoxauc.html) and stored in the `@var.features` slot of the CellChat object. To preview what is happening behind the scenes, we can take a look at the top genes per clusters, ranked by the adjusted p-value and log-fold change. ```{r} #| label: identify_overexpressed_genes #| eval: false # Identify genes that are over-expressed relative to other cell types cellchat_crc <- identifyOverExpressedGenes(cellchat_crc) # Print top genes per cell type cellchat_crc@var.features$features.info %>% group_by(clusters) %>% arrange(pvalues.adj, desc(logFC)) %>% slice_head(n = 2) %>% View() ``` ```{r} #| label: tbl-identify_overexpressed_genes #| tbl-cap: "Top over expressed genes in each cell type." #| echo: false # Identify genes that are over-expressed relative to other cell types cellchat_crc <- identifyOverExpressedGenes(cellchat_crc) # Print top genes per cell type cellchat_crc@var.features$features.info %>% group_by(clusters) %>% arrange(pvalues.adj, desc(logFC)) %>% slice_head(n = 2) %>% knitr::kable() ``` Some of these genes may look familiar! ### Run CellChat and filter results **We will not be running the following three steps during class due to the time it takes to compute.** We will provide an RDS object with the output in the appropriate step of the workflow. Similar to how we identified over-expressed genes per cell type, we can also calculate over-expressed **interactions**, or ligand-receptor pairs. ```{r} #| label: identifyOverExpressedInteractions #| eval: false # DO NOT RUN # Identify interactions that are over-expressed relative to other cell types cellchat_crc <- identifyOverExpressedInteractions(cellchat_crc) ``` This next step of `smoothData()` (or `projectData()` in older version of CellChat) is **optional**. Here, we use the protein-protein network (`PPI.human`) to identify genes that can be smoothed based upon the neighborhood structure of the data. The [authors note](https://github.com/jinworks/CellChat/blob/main/R/utilities.R#L817) that "This function is useful when analyzing single-cell data with **shallow sequencing depth** because the projection reduces the dropout effects of signaling genes, in particular for possible zero expression of subunits of ligands/receptors" ```{r} #| label: smoothdata #| eval: false # DO NOT RUN # (Optional) # Smooth expression over the protein-protein interaction network to reduce dropout noise. # If you run this, set raw.use = FALSE in computeCommunProb() cellchat_crc <- smoothData(cellchat_crc, adj = PPI.human) # For older versions of CellChat, the function is: # cellchat <- projectData(cellchat, PPI.human) ``` The last computationally heavy step of this workflow is `computeCommunProb()` function, which calculates the interaction strengths and probabilities of cell types. The steps are: 1. Summarize gene expression by calculating the robust mean gene expression per cell group (`triMean`) 2. Identify ligand-receptor (LR) compounds (from the database) and assign values based upon average gene expression per cell group (calculated in previous step) 3. Create spatial proximity matrix between cell groups to weight source-target cell groups based upon nearness 4. Compute communication strength for LR and cell type pairs, incorporating spatial weights 5. Run permutation test for significance testing ::: {.collapse-note collapse="true"} # Author note on tri-mean In the [documentation](https://github.com/sqjin/CellChat/blob/master/tutorial/CellChat-vignette.Rmd) for CellChat, the author defines tri-mean like so: > Of note, 'trimean' approximates 25% truncated mean, implying that the average gene expression is zero if the percent of expressed cells in one group is less than 25%. To use 10% truncated mean, USER can set `type = "truncatedMean"` and `trim = 0.1`. The function `computeAveExpr` can help to check the average expression of signaling genes of interest, e.g, `computeAveExpr(cellchat, features = c("CXCL12","CXCR4"), type = "truncatedMean", trim = 0.1)`. ::: ```{r} #| label: tbl-computeCommunProb_params #| tbl-cap: "Parameters for `computeCommunProb()` function." #| echo: false library(knitr) library(kableExtra) args <- data.frame( Argument = c("object", "type", "raw.use", "distance.use", "interaction.range", "scale.distance", "contact.dependent", "contact.range"), Description = c( "CellChat object.", 'Method for average expression; "triMean" (default) or "truncatedMean" (needs trim).', "Use raw (TRUE) or if smoothData() was run (FALSE)", "Apply spatial distance constraints (TRUE/FALSE.", "Maximum ligand interaction distance (µm)", "Scaling factor for spatial distances (e.g., 1, 0.2)", "Whether signaling is contact-dependent (TRUE/FALSE)", "Maximum distance (µm) for cells to be considered in contact" ) ) kable(args, col.names = c("Argument", "Description")) |> kable_styling(full_width = FALSE) |> column_spec(1, width = "20%") |> column_spec(2, width = "80%") ``` Of particular note is the `scale.distance` parameter. Remember how earlier we established what the `ratio` was to establish the pixel distance for 1µm? This is where that information comes into play, where we tell `computeCommunProb()` that if a cell group is not within `scale.distance`µm of each other, then they are unlikely to interact with one another. With all the previously calculated values, we can now run the `computeCommunProb()` function: ```{r} #| label: computeCommunProb #| eval: false # DO NOT RUN # Calculate community probability cellchat_crc <- computeCommunProb( cellchat_crc, type = "trimean", distance.use = TRUE, interaction.range = 200, scale.distance = 0.2, contact.dependent = TRUE, contact.range = 10, raw.use = FALSE, seed.use = 12345) ``` ```{r} #| label: instructor_15_cellchat_crc_computeCommunProb #| echo: false # qs2::qs_save(cellchat_crc, "intermediate/15_cellchat_crc_computeCommunProb.qs") # saveRDS(cellchat_crc, "data/intermediate_seurat/cellchat_crc_computeCommunProb.rds") cellchat_crc <- qs2::qs_read("intermediate/15_cellchat_crc_computeCommunProb.qs") ``` This creates a new slot in the CellChat object called, `@net`, where the interaction strengths are stored. ::: callout-important # Load RDS object Those three steps take a long time to run (~25 minutes), so we have created an RDS object for you to load in with the results from these steps: ```{r} #| label: load_cellchat_rds #| eval: false # Load processed cellchat object cellchat_crc <- readRDS("data/intermediate_seurat/cellchat_crc_computeCommunProb.rds") ``` ::: We can now filter results based first on the how many bins belong to each cell type: ```{r} #| label: filtercommunication # Removes celltype pairs with too few bins cellchat_crc <- filterCommunication(cellchat_crc, min.cells = 5) ``` Each ligand-receptor is associated with a pathway, which comes from the database that was curated from the scientific literature. So to summarize the results, we can aggregate the values to create a pathway-level score to identify more general trends in the data. ```{r} #| label: summarize_pathways # Summarize Ligand-Receptor pair probabilities into signalling pathway probabilities cellchat_crc <- computeCommunProbPathway(cellchat_crc) # Count interactions and sum weights into a flat network matrix cellchat_crc <- aggregateNet(cellchat_crc) # Compute how much each celltype sends vs receives across all pathways cellchat_crc <- netAnalysis_computeCentrality(cellchat_crc, slot.name = "netP") ``` ```{r} #| label: instructor_cellchat_crc #| echo: false # qs2::qs_save(cellchat_crc, "intermediate/15_cellchat_crc.qs") cellchat_crc <- qs2::qs_read("intermediate/15_cellchat_crc.qs") ``` ## Visualize CellChat results With this information, we can begin visualizing how the bins in `P5CRC` are interacting with one another. The CellChat package comes with a variety of different built-in ways to visualize the results which we will now explore. ### Strength and number of interactions The most popular representation is the `netVisual_circle()` plot, which shows how the celltypes are interacting with one another. This is accomplished by setting nodes of each celltype on the boundary of a circle, with lines from source to target showcasing either the number or strength of interactions. ```{r} #| label: fig-cell_type_interactions #| fig-cap: "Circular plot, showcasing the strength and number of ligand-receptor interactions between cell types." #| fig-width: 10 # 2 plots, side-by-side par(mfrow = c(1, 2), xpd = TRUE) # Number of LR interactions between celltypes netVisual_circle( cellchat_crc@net$count, weight.scale = TRUE, label.edge = FALSE, edge.width.max = 12, title.name = "Number of interactions") # Strength of LR interactions netVisual_circle( cellchat_crc@net$weight, weight.scale = TRUE, label.edge = FALSE, edge.width.max = 12, title.name = "Interaction strength") # Reset par to not affect later plots par(mfrow = c(1, 1)) ``` Notice that the strongest interactions are circular between tumor cells. This tells us that tumor cells are frequently and strongly sending signals to one another. We can also see that fibroblasts appear to be sending signals to tumor cells. [Rodriguez et al., 2025](https://www.nature.com/articles/s42003-025-08799-x), found that cancer-associate fibroblasts "induce pro-invasive gene expression changes involved in epithelial-mesenchymal transition, extracellular matrix remodeling and migration." ::: {.callout-note collapse="true"} # Zoom in on tumor and fibroblast bins We can also zoom in on the tumor and fibroblast bins to identify where they exist spatially. ```{r} #| label: fig-fibtro_tumor_tissue # Color bins by tissue after subsetting to tumor and fibroblasts SpatialDimPlot(crc %>% subset(first_type %in% c("Tumor", "Fibroblast")), group.by = "first_type", pt.size.factor = 13, image.alpha = 0.2) ``` As we might hope to see, there are fibroblasts that are located in close proximity to tumor bins. ::: An alternative way to represent the circular plots can be done with a heatmap. On the y-axis are the "sources" and the x-axis are the "receivers". The intensity of the heatmap is proportional to the number/strength of interactions between the source and receiver cell types. The top colored bar plot represents the sum of column of values displayed in the heatmap. The right colored bar plot represents the sum of row of values. ```{r} #| label: fig-netVisual_heatmap #| fig-cap: "Heatmap of interaction numbers and strengths between cell types." # Number of LR interactions between celltypes heatmap gg_h_count <- netVisual_heatmap( cellchat_crc, measure = "count", color.heatmap = "Blues", title.name = "Number of interactions") # Stregnth of LR interactions between celltypes heatmap gg_h_weight <- netVisual_heatmap( cellchat_crc, measure = "weight", color.heatmap = "Reds", title.name = "Interaction strength") # Plot heatmaps side by side gg_h_count + gg_h_weight ``` To summarize the amount of incoming and outgoing signal, we can also generate a scatterplot. Each point represents a unique cell types, where the size is the total number of incoming and outgoing ligand-receptor interactions for that celltype. The x-axis represents the total _outgoing_ communication probability and the y-axis, similarly, represents the total _incoming_ communication probability. ```{r} #| label: fig-signalingrole #| fig-cap: "Scatterplot of incoming and outgoing signaling for each cell type." # Scatterplot of in/out signaling for each cell type netAnalysis_signalingRole_scatter(cellchat_crc) ``` ### Pathway-level visuals The last steps we ran in the CellChat workflow was to calculate statistics for each pathway, which is comprised of various ligand-receptors. So now we can take a look at the top significant pathways associated within our dataset: ```{r} #| label: fig-ranknet #| fig-cap: "Top ranked ligand-receptor pathways." #| fig-height: 8 # Top significant pathways rankNet(cellchat_crc, measure = "weight", mode = "single", stacked = TRUE, do.stat = TRUE) + xlab("Pathways") ``` For each of these pathways, we have associated signaling strength for each celltype. We can evaluate the outgoing and incoming pathways signal strength for each celltype. ::: callout-warning # Plotting errors and warning > "Error: not enough space for cells at track index '1'." This error comes from when the size of the plot is too small to display the plot. Hit the "Zoom" button on "Plots" tab of RStudio and enlarge the image to view the image. ::: ```{r} #| label: fig-heatmap_pathways #| fig-cap: "Outgoing and incoming pathways signal strength for each celltype" #| fig-width: 10 #| fig-height: 10 # Heatmap of outgoing pathway signaling strength ht_out <- netAnalysis_signalingRole_heatmap( cellchat_crc, pattern = "outgoing", width = 10, height = 20, color.heatmap = "YlOrRd", title = "Outgoing signaling strength") # Heatmap of incoming pathway signaling strength ht_in <- netAnalysis_signalingRole_heatmap( cellchat_crc, pattern = "incoming", width = 10, height = 15, color.heatmap = "GnBu", title = "Incoming signaling strength") # Plot heatmaps side by side ht_out + ht_in ``` If we wanted to zoom in on one particular pathway, in this case `CEACAM`, we can use the `netVisual_aggregate()` and specify the signaling pathway. For a circular plot with lines directed between, we supply the `layout` argument as `"chord"`. Similar to the first visual we made, this is a circular plot that represents how much interaction is occuring between cell types **for a single pathway**. ```{r} #| label: fig-netVisual_aggregate_chord #| fig-cap: "CEACAM pathways interactions between celltypes." #| fig-width: 6.5 #| fig-height: 6.5 # netVisual_aggregate on the whole pathway to show chrods netVisual_aggregate(cellchat_crc, signaling = "CEACAM", layout = "chord", title.name = "CEACAM signaling") ``` The [CEACAM genes/pathway](https://pmc.ncbi.nlm.nih.gov/articles/PMC7127089/) is notably increased in epithelial tumor cells in colon cancer. These results we are seeing is consistent with that assumption, where we see many interactions within the CEACAM pathway to be with tumor bins. ::: {.callout-note collapse="true"} # Spatial `netVisual_aggregate()` There is `spatial` layout that we can also supply to `netVisual_aggregate()`. However, since our dataset does not have cell types localize to particular regions of the tissue, this visual is not as informative. ```{r} #| label: fig-netVisual_aggregate_spatial #| fig-cap: "Spatial plot of CEACAM pathway interaction between celltypes on the tissue." # netVisual_aggregate on the whole pathway to show arrows of interactions # Plotted on the spatial tissue slide netVisual_aggregate( cellchat_crc, signaling = "CEACAM", layout = "spatial", point.size = 0.05, alpha.image = 0.1, alpha = 0.6, vertex.size.max = 3, edge.width.max = 5, vertex.label.cex = 3) ``` Other, more structured spatially datasets, can make use of this type of figure to show where exactly interactions are occuring, seen here: ::: {#fig-cellchat_netVisual_aggregate_spatial .fig} ![](../img/cellchat_netVisual_aggregate_spatial.png){width=800} Spatial images of PSAP and SPP1 signaling pathway network, the thickness of lines represents the strength of signals.<br> _Image source: [Dong et al., 2025](https://www.nature.com/articles/s41698-025-01028-y/figures/7)_ ::: ::: ### Bubble plot Within CellChat, we can also zoom in on specific ligand-receptor pairs that belong to pathways we have identified. For example, let us take the top 5 pathways and use them as input for `netVisual_bubble()`: ```{r} #| label: top_pathways # Top 5 significant pathways top_pathways <- cellchat_crc@netP$pathways[1:5] top_pathways ``` In particular, we are interested in the celltypes that are interacting with tumor cells. So we will look at all ligand-receptor complexes in those top pathways that have tumor cells as the target. This is represented as a bubbleplot, where the size of the dot is proportion to the p-value. If a dot does not appear for, this indicates that the interaction is not significant. The color represents the probability of interaction for the celltype-directed interaction, as labeled on the x-axis. ```{r} #| label: fig-netVisiual_bubble #| fig-cap: "Bubbleplot of significant pathways with tumor cells as source." #| fig-width: 8 #| fig-height: 8 # Number of unique cell types n_ct <- seurat_rctd@meta.data %>% pull(first_type) %>% unique() %>% length() netVisual_bubble(cellchat_crc, signaling = top_pathways, sort.by.source.priority = TRUE, sources.use = 1:n_ct, targets.use = "Tumor", remove.isolate = TRUE, color.heatmap = "viridis", angle.x = 90) ``` In our case, the dots all appear to be the same size. This tells us that the pathways we selected earlier, when a ligand-receptor is significant, it is found to have a very low p-value (very significant). ### Zoom in on specific ligand-receptor complexes ::: {.callout-note collapse="true"} # Accessing the table of ligand-receptor results If you wanted to access the full dataframe of results, we can use the `subsetCommuncation()` function: ```{r} #| label: lr_results #| eval: false df_lr <- subsetCommunication(cellchat_crc) %>% arrange(desc(prob)) df_lr %>% head(5) ``` ```{r} #| label: tbl-lr_results #| tbl-cap: "Dataframe of ligand-receptor results, sorted by probability." #| echo: false df_lr <- subsetCommunication(cellchat_crc) %>% arrange(desc(prob)) df_lr %>% head(5) %>% knitr::kable() ``` ::: From the previous visual, perhaps we were interested in the specific ligand, `COL3A1`, and receptor, `SDC4`, interaction between fibroblasts and tumors. We can start by taking a look at the `SpatialFeaturePlot()` to get a better sense of where these genes localize on our slide: ```{r} #| label: fig-spatialfeatureplot #| fig-cap: "SpatialFeaturePlot showing the expression of a ligand (COL3A1) and a receptor (SDC4) on our CRC slide." ligand <- "COL3A1" receptor <- "SDC4" SpatialFeaturePlot(crc, features = c(ligand, receptor), pt.size.factor = 15, image.alpha = 0.2) ``` There is a [population of fibroblasts](12_deconvolution.html#fig-spatial_first_type_split) in the tumor cells that appears to correspond with the bins with high `COL3A1` expression. Most likely, it is these nearby fibroblasts that are interacting with tumor cells. To confirm that there is higher expression of `COL3A1` in fibroblasts, we can use a `VlnPlot()`: ```{r} #| label: fig-vlnplot_ligand_receptor #| fig-cap: "Distribution of ligand-receptor pair, COL3A1-SDC4, which has significant communication between fibroblasts and tumor cells." #| fig-height: 6 VlnPlot(crc, features = c(ligand, receptor), group.by = "first_type", pt.size = 0, ncol = 1) ``` Notably, there is also high expression of `SDC4` in the tumor cells. This is the underlying principle of mass action utilized by CellChat. Essentially, if there is high expression of the ligand (`COL3A1`) in the source celltype (fibroblasts) and high expression of the receptor (`SDC4`) in the target celltype (tumor cells), then there will be a high probability communication between the two, especially, if they are localized to the same area of the tissue. ## Session information Now that we have completed our analysis, it would be a good time record the packages and versions of the tools we used in the analysis. The `sink()` function allows us to save the output from a function, in this case the `sessionInfo()` function. This improves the reproducibility of our code for ourselves and others. We can do this like: ```{r} #| label: sessioninfo #| eval: false # Create and save a text file with sessionInfo sink("sessionInfo_spatial_transcriptomics.txt") sessionInfo() sink() ``` *** [Back to Schedule](../schedule/schedule.qmd)