Space Ranger Processing

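An introduction to the inputs, outputs, and algorithms of the 10x Genomics Space Ranger pipeline, and to segmentation in spatial transcriptomics data.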

Author

Noor Sohail

Published

March 25, 2026

Approximate time: XX minutes

Learning objectives

In this lesson, we will:

  • Understand the inputs and outputs of the Space Ranger pipeline
  • Outline the steps of the Space Ranger algorithm
  • Discuss segmentation in the context of spatial transcriptomics data

Overview of lesson

When doing XYZ…

Space Ranger overview

Space Ranger is free software developed by 10x Genomics to process Visium datasets. Its output can be used for downstream analysis in R or Python, or in 10x Genomics' proprietary Loupe Browser.

Other processing software

Space Ranger is not the only software that can be used to process spatial transcriptomics data. For example, there are also open-source tools such as ST Pipeline and SAW.

As we will be working with Visium HD data for this workshop, we will be using Space Ranger outputs. That being said, the overarching steps and considerations for processing spatial transcriptomics data are similar across different platforms and software. So, even if you are not working with Visium HD data, the concepts covered in this lesson will still be relevant to you.

The starting point of this workshop is the files generated by Space Ranger, so here we will briefly go over the inputs, outputs, and algorithms used by Space Ranger. This will help you understand how the raw data is transformed into a count matrix and coordinates that can be used for downstream analysis.

Input

To process Visium HD data, you use the spaceranger count command, which aligns the reads in the FASTQ files against a reference transcriptome and uses each read's oligonucleotide barcode to assign it a spatial location. The three primary inputs to the count command are:

  1. FASTQ files containing the raw sequencing reads
  2. A reference transcriptome (e.g. human or mouse reference provided by 10X Genomics, or a custom reference that you create yourself)
  3. An image (typically a .TIFF file) of the tissue section on the Visium slide

With these files, you can then run the spaceranger count command. Note that Space Ranger must be run on a Linux system with substantial resources (for example, 32 cores, 64GB of RAM, and 1TB of free disk space). Do not try to run this on your laptop!

Here is an example script showing how you would run spaceranger count. We also describe what each parameter is set to, to help you run this command on your own data.

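# Example script to run spaceranger count
# DO NOT RUN
# Parameters:
#   --id             output directory name
#   --transcriptome  path to the reference transcriptome
#   --fastqs         path to the directory containing the FASTQ files
#   --sample         sample name, as it appears in the FASTQ filenames
#   --image          path to the brightfield microscope image
#   --slide          slide ID
#   --area           capture area
#   --localcores     number of cores allowed in local mode
#   --localmem       memory (GB) allowed in local mode
# (Inline comments after a trailing backslash would break the line continuation.)
cd /home/jdoe/runs
spaceranger count --id=sample345 \
                  --transcriptome=/home/jdoe/refdata/GRCh38-2020-A \
                  --fastqs=/home/jdoe/runs/HAWT7ADXX/outs/fastq_path \
                  --sample=mysample \
                  --image=/home/jdoe/runs/images/sample345.tiff \
                  --slide=V19J01-123 \
                  --area=A1 \
                  --localcores=8 \
                  --localmem=64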

The table below describes the main spaceranger count arguments, including some that are not used in the example above:

Table 1: spaceranger count input parameters.

| Argument | File type(s) | Description | Example value |
|---|---|---|---|
| cytaimage | TIFF | Corresponding CytAssist image. | /path/to/CAVG10539_2023-11-16_14-56-24_APPS115_H1-YD7CDZK_A1_S11088.tif |
| image | TIFF, QPTIFF, BTF, JPEG | Brightfield microscope image. | /path/to/APPS115_11088_rescan_01.btf |
| slide | Text ID | Slide identifier. Used with area to specify the slide parameters; slidefile can optionally be supplied as well (e.g., when Space Ranger does not have internet access). | H1-YD7CDZK |
| area | Text ID | Capture area identifier on the slide. Used with slide to specify the slide parameters. | A1 |
| transcriptome | Reference transcriptome directory | Path to the reference transcriptome. Human and mouse references can be downloaded from 10x Genomics, or a custom reference can be used. | /path/to/refdata-gex-GRCh38-2020-A |
| probe-set | CSV | Probe set CSV file, found in the probe_sets directory of the Space Ranger package or downloadable from 10x Genomics. Do not specify this parameter for Visium HD 3’ data. | /path/to/Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv |
| fastqs | Directory of FASTQ files | Path to the directory containing the FASTQ files. The files must follow the 10x Genomics naming format, which includes the sample name given in sample. | /path/to/fastq_path |
| sample | Text ID | Sample name. Must match the sample name in the FASTQ filenames; used to select which FASTQ files to analyze. | mysample |
| id | Text ID | Output directory name. Output files are written to a directory with this name. | sample345 |
| localcores | Integer | Number of cores to use in local mode. | 8 |
| localmem | Integer (GB) | Amount of memory (in GB) to use in local mode. | 64 |
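The sample and fastqs arguments work together: the sample name is used to pick out the matching FASTQ files in the fastqs directory. As a minimal sketch of that matching, assuming the common bcl2fastq-style naming pattern {sample}_S1_L001_R1_001.fastq.gz, the hypothetical helper fastqs_for_sample below lists the files that would belong to a given sample. It illustrates the naming rule only and is not Space Ranger's own code.

```python
import re

# Assumed bcl2fastq-style naming: {sample}_S{n}_L{lane}_{read}_001.fastq.gz
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>[RI]\d)_001\.fastq\.gz$"
)


def fastqs_for_sample(filenames, sample):
    """Return (sorted) the FASTQ filenames whose sample prefix equals `sample`."""
    matches = []
    for name in filenames:
        m = FASTQ_PATTERN.match(name)
        # Compare the whole prefix, so "mysample2" is not taken for "mysample".
        if m and m.group("sample") == sample:
            matches.append(name)
    return sorted(matches)


files = [
    "mysample_S1_L001_R1_001.fastq.gz",
    "mysample_S1_L001_R2_001.fastq.gz",
    "mysample_S1_L001_I1_001.fastq.gz",
    "mysample2_S2_L001_R1_001.fastq.gz",
    "Undetermined_S0_L001_R1_001.fastq.gz",
]
print(fastqs_for_sample(files, "mysample"))
```

Because this sketch compares the entire prefix before _S<n>, the file for mysample2 is not picked up when asking for mysample.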

Once this command is run, it will take a few hours to generate the output files, which we will discuss in the next section.

Output

When spaceranger count completes successfully, it will generate a variety of outputs (seen below), which will enable the analyst to perform further analysis in R/Python or using the proprietary Loupe browser from 10X Genomics.

Figure 1: Example of the different outputs generated by spaceranger count.
Image source: 10X Genomics: Space Ranger Documentation

A good starting point is to review the sample's QC metrics in the web summary report, which we have provided in the reports/ folder that you downloaded.

In the Visium HD assay, Space Ranger aggregates transcript counts into square spatial bins of different sizes, typically:

  • 2µm x 2µm
  • 8µm x 8µm
  • 16µm x 16µm
Figure 2: Depiction of the grid-like structure of a Visium HD slide, which can then be binned into various squares of different sizes.
Image source: 10x Visium HD Spatial Gene Expression Manual

Having access to 2µm bins, along with matched high-resolution tissue morphology, provides a great opportunity to reconstruct single cells from the data. However, because 2µm x 2µm bins (and even 8µm x 8µm bins) are very small, each bin may capture only a handful of transcripts, so the biological signal per bin is sparse. Additionally, the sheer number of bins at these higher resolutions substantially increases memory usage and processing time.
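To get a feel for the scale involved, the arithmetic below counts how many bins of each size tile a capture area, assuming a 6.5mm x 6.5mm square capture area (an assumption for illustration; in practice only bins under the tissue are kept, so real datasets are smaller). Halving the bin side multiplies the number of bins by about 4, so going from 8µm to 2µm bins multiplies it by about 16.

```python
# Assumption: a square 6.5 mm x 6.5 mm capture area, i.e. 6,500 µm per side.
CAPTURE_SIDE_UM = 6500


def bins_per_capture_area(bin_size_um, side_um=CAPTURE_SIDE_UM):
    """Number of whole square bins of the given size that fit in the area."""
    per_side = side_um // bin_size_um
    return per_side * per_side


for size in (2, 8, 16):
    print(f"{size}µm bins: {bins_per_capture_area(size):,}")
# 2µm bins: 10,562,500
# 8µm bins: 659,344
# 16µm bins: 164,836
```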

Space Ranger algorithm

The Space Ranger algorithm is split into two main parts: read and image processing. Think of this as dealing with each of the two main data types generated in a Visium experiment: the sequencing data and the image data. At the end, everything is pulled together to generate the final output files that we can use for downstream analysis.

Figure 3: Schematic of the Space Ranger algorithm, including both read and image processing steps.
Image source: 10X Genomics: Visium HD Analysis with spaceranger count

Read processing

Quantifying the gene expression in a sample can be broken down into the following steps:

1. Read trimming

Space Ranger removes the template switch oligo (TSO) sequence from the 5’ end and the poly(A) tail from the 3’ end of the reads. Trimming these sequences reduces mismatches during alignment, which improves sensitivity and reduces computation time. Information about the trimmed bases is retained in the BAM file.
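A minimal sketch of this kind of trimming, under simplifying assumptions: the TSO must appear as an exact prefix of the read, and a trailing run of at least min_polya A's (a hypothetical threshold of 8) counts as a poly(A) tail. The toy TSO in the example is made up, and Space Ranger's actual trimming rules are not reproduced here. The function returns how many bases were removed from each end, in the spirit of keeping that information rather than discarding it.

```python
def trim_read(seq, tso, min_polya=8):
    """Trim an exact 5' TSO prefix and a 3' poly(A) run from a read.

    Returns (trimmed_seq, n_trimmed_5prime, n_trimmed_3prime).
    min_polya is a hypothetical minimum length for calling a poly(A) tail.
    """
    n5 = 0
    if tso and seq.startswith(tso):
        n5 = len(tso)
        seq = seq[n5:]

    without_tail = seq.rstrip("A")
    n3 = len(seq) - len(without_tail)
    if n3 >= min_polya:
        seq = without_tail
    else:
        n3 = 0  # too short to be a poly(A) tail; keep these bases
    return seq, n5, n3


toy_tso = "AAGCAGTG"  # made-up sequence for illustration only
read = toy_tso + "CTGACCGTTAGC" + "A" * 10
print(trim_read(read, toy_tso))  # ('CTGACCGTTAGC', 8, 10)
```

Real trimmers usually tolerate mismatches in the adapter and tail; exact matching keeps the sketch short.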

2. Alignment and barcode correction

Figure 4: Reads are classified based on whether they are exonic (light blue) or intronic (red) and whether they are sense or antisense (purple).
Image source: 10X Genomics: Cell Ranger’s Gene Expression Algorithm

Sequenced reads are aligned against the reference probe set or transcriptome using STAR. Any read that does not map confidently to a single gene (indicated by a MAPQ score of 255) is flagged as low quality.

Space Ranger uses Hamming distance to correct sequencing errors in the barcode sequence. This is possible because every barcode is a known sequence that is at least 3 edits away from any other barcode, so a barcode with a single error is still closer to its true sequence than to any other barcode and can be corrected unambiguously. This allows the algorithm to retain reads that would otherwise have been discarded because of sequencing errors.

The BAM file is generated at this step; for each read, it stores the assigned gene along with the original and corrected barcodes.
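The barcode correction described above can be sketched in a few lines. This is a simplified stand-in, not Space Ranger's implementation (real pipelines may also weigh base quality scores): it tries every single-base substitution of the observed barcode (Hamming distance 1) and accepts a correction only if exactly one variant is a known barcode. correct_barcode and the toy whitelist are hypothetical.

```python
BASES = "ACGT"


def correct_barcode(observed, whitelist):
    """Return the whitelisted barcode within Hamming distance 1, or None.

    A Hamming distance of 1 means exactly one substituted base, so we try
    every single-base substitution of the observed sequence.
    """
    if observed in whitelist:
        return observed
    hits = set()
    for i, base in enumerate(observed):
        for alt in BASES:
            if alt != base:
                variant = observed[:i] + alt + observed[i + 1:]
                if variant in whitelist:
                    hits.add(variant)
    # With barcodes >= 3 edits apart, at most one variant can match;
    # anything else is treated as uncorrectable.
    return hits.pop() if len(hits) == 1 else None


whitelist = {"AACCGGTT", "TTGGCCAA", "ACGTACGT"}  # toy barcodes, >= 3 apart
print(correct_barcode("AACCGGTA", whitelist))  # AACCGGTT (one error)
print(correct_barcode("AACCGGAA", whitelist))  # None (two errors)
```

Generating single-base variants costs only 3 lookups per barcode position, rather than comparing each read against every barcode on the slide.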

3. Count UMIs

As with barcodes, Space Ranger uses Hamming distance to correct sequencing errors in the UMI sequences. Reads that share the same corrected barcode, gene, and UMI are then collapsed into a single count, and these counts are tabulated for each barcode to build the count matrix.

We can consider the following scenarios when “collapsing on UMIs”:

  • Reads with different UMIs mapping to the same transcript were derived from different molecules and are biological duplicates: each read should be counted.
  • Reads with the same UMI (and the same barcode and gene) originated from the same molecule and are technical duplicates: they should be collapsed and counted once.
  • In the image below, the reads for ACTB should be collapsed and counted once, while the reads for ARL1 should each be counted.
Warning

TODO: Need a new schematic that says bins instead of cells

Figure 5: Example of how UMIs are collapsed to generate the count matrix.
Image source: Macosko et al. (2015)
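The collapsing rules above amount to counting distinct UMIs per (barcode, gene) pair. A minimal sketch, assuming the reads have already been reduced to (corrected barcode, gene, corrected UMI) tuples; the values below are toy data loosely mirroring the ACTB/ARL1 example, and count_umis is a hypothetical helper.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (barcode, gene) from (barcode, gene, umi) tuples."""
    umis = defaultdict(set)
    for barcode, gene, umi in reads:
        umis[(barcode, gene)].add(umi)  # duplicate UMIs collapse in the set
    return {key: len(found) for key, found in umis.items()}


reads = [
    ("AACCGGTT", "ACTB", "GATTACA"),  # three reads, same UMI -> 1 molecule
    ("AACCGGTT", "ACTB", "GATTACA"),
    ("AACCGGTT", "ACTB", "GATTACA"),
    ("AACCGGTT", "ARL1", "CCCTTTA"),  # two reads, different UMIs -> 2 molecules
    ("AACCGGTT", "ARL1", "TGTGTGA"),
]
print(count_umis(reads))
# {('AACCGGTT', 'ACTB'): 1, ('AACCGGTT', 'ARL1'): 2}
```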

4. Binning

Each 2µm barcoded square is then aggregated with neighboring squares into larger square bins. By default, Visium HD datasets are binned into 2µm x 2µm, 8µm x 8µm, and 16µm x 16µm bins. The bin size affects the amount of biological signal captured per bin: for example, an 8µm bin covers 16 times the area of a 2µm square (4 x 4 squares), so it aggregates the counts of 16 barcodes.
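Binning can be thought of as integer division on grid coordinates. The sketch below assumes each 2µm square is identified by a (row, col) index on the grid and that the coarser bins are aligned to the grid origin; with factor=4 it sums 4 x 4 squares into one 8µm bin, and factor=8 gives 16µm bins. aggregate_bins is a hypothetical helper, not part of Space Ranger.

```python
from collections import Counter


def aggregate_bins(counts_2um, factor):
    """Sum 2 µm square counts into coarser bins of side factor x 2 µm.

    counts_2um maps (row, col) grid indices of 2 µm squares to UMI counts.
    """
    binned = Counter()
    for (row, col), n in counts_2um.items():
        binned[(row // factor, col // factor)] += n
    return dict(binned)


# Toy data: a 4 x 4 block of 2 µm squares with 1 UMI each, plus one more square.
counts = {(r, c): 1 for r in range(4) for c in range(4)}
counts[(4, 0)] = 5

print(aggregate_bins(counts, 4))  # {(0, 0): 16, (1, 0): 5}
print((8 // 2) ** 2)              # 16 squares of 2 µm per 8 µm bin
```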

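An important consideration at this step is the Modifiable Areal Unit Problem (MAUP), a source of bias that arises when spatial data are aggregated into units such as bins: results can change depending on the size of the units (the scale effect) and on where their boundaries are drawn (the zoning effect).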
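A toy, one-dimensional illustration of the zoning effect: the same eight transcript positions (made-up values, in µm) are counted in 8µm bins twice, once with the grid starting at 0µm and once shifted by 4µm. The densest bin holds 5 transcripts in one case and 4 in the other, even though the underlying data did not change. bin_counts is a hypothetical helper.

```python
from collections import Counter


def bin_counts(positions_um, bin_size_um, offset_um=0):
    """Count points per bin for a 1D grid whose origin is shifted by offset_um."""
    return Counter((x - offset_um) // bin_size_um for x in positions_um)


positions = [1, 2, 3, 6, 7, 9, 10, 14]  # toy transcript positions (µm)

print(sorted(bin_counts(positions, 8, offset_um=0).items()))
# [(0, 5), (1, 3)]
print(sorted(bin_counts(positions, 8, offset_um=4).items()))
# [(-1, 3), (0, 4), (1, 1)]
```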

5. Bins under the tissue

Not every square on the slide will have tissue on it. In this step, bins that are “under the tissue” are identified and then the count matrix is generated to only include those bins.
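Conceptually, this step is a column filter on the count matrix. Space Ranger records for each bin whether it is under tissue (an in_tissue flag in its tissue positions output). The sketch below assumes a small genes x bins matrix stored as nested lists and a toy barcode-to-flag mapping; filter_in_tissue and the bin names are hypothetical.

```python
def filter_in_tissue(matrix, barcodes, in_tissue):
    """Keep only the columns (bins) flagged as under tissue.

    matrix: list of rows, one per gene, each holding a count per barcode.
    barcodes: column labels; in_tissue: dict mapping barcode -> 1 or 0.
    """
    keep = [j for j, bc in enumerate(barcodes) if in_tissue.get(bc, 0) == 1]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, [barcodes[j] for j in keep]


barcodes = ["bin1", "bin2", "bin3"]  # columns (toy bin names)
matrix = [[0, 3, 1],                 # row for GeneA
          [2, 0, 5]]                 # row for GeneB
in_tissue = {"bin1": 1, "bin2": 0, "bin3": 1}

filtered, kept = filter_in_tissue(matrix, barcodes, in_tissue)
print(kept)      # ['bin1', 'bin3']
print(filtered)  # [[0, 1], [2, 5]]
```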

Warning

TODO: Need a new schematic that says bins instead of cells

Figure 6: Example of the count matrix generated by Space Ranger, where the rows are the genes and the columns are the barcodes (or bins).
Image source: Lafzi, et al. (2018)

Beyond this point, Space Ranger also runs “secondary analysis” steps by default (such as dimensionality reduction and clustering). However, we will load our dataset into R to perform our own QC and analyses, in part because the default Space Ranger analyses are generic rather than tailored to your tissue or biological question.

Image processing

The image processing steps of the Space Ranger algorithm are as follows:

1. Fiducial alignment

Warning

TODO: https://www.10xgenomics.com/support/software/space-ranger/latest/algorithms-overview/image-processing

2. Tilt correction and tissue detection

If the CytAssist camera is tilted, a built-in step corrects for the tilt so that the image coordinates map accurately onto the transcriptional data. Space Ranger then estimates whether tissue is present at each location, classifying each pixel as tissue or background.
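Space Ranger's own tissue-detection model is not reproduced here. As a generic stand-in for "classify each pixel as tissue or background", the sketch below applies Otsu's method, a classic thresholding technique that picks the intensity cutoff maximizing the between-class variance of an 8-bit grayscale histogram. It assumes a brightfield image in which tissue is darker than the glass background, so pixels at or below the threshold are labeled tissue.

```python
def otsu_threshold(pixels):
    """Otsu's method on 8-bit intensities (0-255).

    Returns the threshold t that maximizes the between-class variance when
    pixels are split into {<= t} and {> t}.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def tissue_mask(pixels):
    """Label pixels as tissue (True) if at or below the Otsu threshold.

    Assumes a brightfield image where tissue is darker than the background.
    """
    t = otsu_threshold(pixels)
    return [p <= t for p in pixels]


# Toy grayscale values: dark "tissue" pixels and bright "background" pixels.
pixels = [30] * 50 + [40] * 50 + [200] * 50 + [210] * 50
print(otsu_threshold(pixels))  # 40
```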

3. Image registration and UMI refinement

Because the CytAssist workflow can produce more than one image of the same tissue (for example, the CytAssist image and a high-resolution microscope image), the coordinates must be consistent across images. Space Ranger uses a feature-matching algorithm that matches similar features across the images to register them to one another.

The registration is then refined by comparing the H&E image against the spatial pattern of UMI counts, which acts as a quality-control step to ensure that the image is aligned correctly with the data at 2µm resolution.
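Feature-based registration boils down to finding a transformation that maps matched points in one image onto their partners in the other. The sketch below uses the simplest possible model, a pure translation, for which the least-squares estimate is just the mean displacement across matched pairs. Real registration typically fits richer transformations (e.g., rotation, scaling, or a full affine transform) and handles bad matches robustly; the points and helper names here are hypothetical.

```python
def estimate_translation(src_points, dst_points):
    """Least-squares translation (dx, dy) mapping matched src points onto dst.

    For a translation-only model, the least-squares estimate is the mean
    displacement between matched feature pairs.
    """
    if len(src_points) != len(dst_points) or not src_points:
        raise ValueError("need equal, non-empty lists of matched points")
    n = len(src_points)
    dx = sum(d[0] - s[0] for s, d in zip(src_points, dst_points)) / n
    dy = sum(d[1] - s[1] for s, d in zip(src_points, dst_points)) / n
    return dx, dy


def apply_translation(points, shift):
    """Shift every (x, y) point by (dx, dy)."""
    dx, dy = shift
    return [(x + dx, y + dy) for x, y in points]


# Hypothetical matched features: the same landmarks located in two images.
src = [(10, 10), (50, 20), (30, 80)]
dst = [(15, 7), (55, 17), (35, 77)]

shift = estimate_translation(src, dst)
print(shift)  # (5.0, -3.0)
```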

4. Segmentation

Space Ranger uses a customized version of the StarDist algorithm to segment the H&E image and identify the boundaries of each cell. StarDist is a deep learning model that detects objects with star-convex shapes, which approximate the shapes of nuclei well. The result is a set of polygons overlaid on the tissue, each intended to correspond to a cell boundary.

Figure 7: Mock example of how stains are used to identify cell boundaries with segmentation algorithms.
Image source: 10X Genomics: Nucleus and Cell Segmentation Algorithms
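StarDist describes each object by a center and the distances to its boundary along a fixed set of evenly spaced rays, which is why its outputs are star-convex polygons. The sketch below builds such a polygon from hypothetical radii and then assigns bins to it with a standard ray-casting point-in-polygon test. This is one simple way (assumed here, not necessarily Space Ranger's) to decide which 2µm bins fall inside a segmented object; all names and coordinates are hypothetical.

```python
import math


def star_convex_polygon(center, radii):
    """Polygon vertices from distances along evenly spaced rays (StarDist-style).

    center is (y, x); radii[k] is the distance to the boundary along ray k.
    """
    cy, cx = center
    n = len(radii)
    return [
        (cy + r * math.sin(2 * math.pi * k / n), cx + r * math.cos(2 * math.pi * k / n))
        for k, r in enumerate(radii)
    ]


def point_in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray running in +x from point."""
    y, x = point
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical nucleus: center (10, 10) µm, 8 rays, all of length 4 µm.
nucleus = star_convex_polygon((10, 10), [4.0] * 8)

# Hypothetical 2 µm bin centers (y, x) in µm.
bin_centers = {"bin_a": (11, 12), "bin_b": (11, 16), "bin_c": (11, 5)}
assigned = [b for b, p in bin_centers.items() if point_in_polygon(p, nucleus)]
print(assigned)  # ['bin_a']
```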
Segmentation in imaging-based spatial transcriptomics

Segmentation is not a required step when working with sequencing-based spatial transcriptomics data, such as Visium HD. Because the counts are already assigned to a grid of small square bins, the data can be analyzed without segmenting individual cells.

However, segmentation is a critical step when working with imaging-based spatial transcriptomics data, such as MERFISH or Xenium, because the boundaries of each cell must be identified in order to assign transcripts to that cell. These boundaries are typically identified using stains for nuclei and cell membranes, in conjunction with the expression data.

Segmentation is a challenging problem, especially when cells have irregular shapes (e.g., neurons). Published segmentation methods include, but are not limited to:


Reuse

CC-BY-4.0