Michelli F. Oliveira†, Juan P. Romero†, Meii Chung†, Stephen Williams, Andrew D. Gottscho, Anushka Gupta, Sue Pilipauskas, Syrus Mohabbat, Nandhini Raman, David Sukovich, David Patterson, Visium HD Development Team, Sarah E. B. Taylor‡
† These authors contributed equally to this work
‡ Corresponding author
A comprehensive understanding of cellular behavior and response to the tumor microenvironment (TME) in colorectal cancer (CRC) remains elusive. Here, we introduce the high-definition Visium spatial transcriptomics technology (Visium HD) and use it to investigate formalin-fixed paraffin-embedded (FFPE) human CRC samples. We demonstrate the high sensitivity, single-cell-scale resolution, and spatial accuracy of Visium HD, generating a highly refined whole-transcriptome spatial profile of CRC samples. We identify transcriptomically distinct macrophage subpopulations in different spatial niches that may exert pro- and anti-tumor functions through interactions with tumor and T cells. In situ gene expression analysis validates our findings and localizes a clonally expanded T cell population close to macrophages with anti-tumor features. Our study demonstrates the power of high-resolution spatial technologies for understanding cellular interactions in the TME and paves the way for larger studies that will unravel mechanisms and biomarkers of CRC biology, improving diagnosis and disease management strategies.
The full dataset used in this repository and in the manuscript can be downloaded from the following link: Dataset. Raw data have also been deposited at GEO under accession number GSE280318.
This repository contains the scripts to replicate the findings presented in the manuscript. It is organized into two folders: Figures and Methods. The Figures folder contains the scripts to replicate the figures in the manuscript, and the files are named accordingly. The Methods folder contains the custom methods developed for the manuscript.
The Figures files require specific outputs generated with the Methods scripts.
In this section we provide a description and a quick-start guide for the different methods used and developed for the manuscript.
AuxFunctions.R is an R script with multiple custom R functions used in the manuscript. To load all the functions, use the source function:
source("~/HumanColonCancer_VisiumHD/Methods/AuxFunctions.R")
FlexSingleCell.R is an R script used to process the Flex single-cell data. It takes the outputs of cellranger aggr as input.
Given the size of the dataset, we adopted the sketch-based analysis workflow of Seurat v5 [1], sampling 15% of the entire dataset (~37,000 cells) for downstream analysis. After completing the analysis on the subsampled cells, we extended it to the entire single-cell dataset.
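Seurat v5's `SketchData()` selects cells by leverage score by default. As a hedged illustration of the idea only (not Seurat's own implementation, which is built to scale to far larger matrices), the snippet below samples a fraction of cells with probability proportional to their exact statistical leverage, computed from a thin SVD. `leverage_score_sample` is a hypothetical helper, and the input is assumed to be a dense cells × features matrix such as normalized expression of variable genes.

```python
import numpy as np


def leverage_score_sample(X, fraction=0.15, seed=0):
    """Sample a fraction of rows (cells) of X with probability proportional
    to their statistical leverage. Returns the sorted row indices."""
    X = np.asarray(X, dtype=float)
    n_cells = X.shape[0]
    n_sample = int(round(fraction * n_cells))
    # Thin SVD: the leverage of row i is the squared norm of row i of U.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    leverage = np.einsum("ij,ij->i", U, U)
    # Small floor so that every cell keeps a non-zero sampling probability.
    leverage = leverage + 1e-12
    probs = leverage / leverage.sum()
    rng = np.random.default_rng(seed)
    # Sample without replacement, favouring high-leverage (rarer) cells.
    idx = rng.choice(n_cells, size=n_sample, replace=False, p=probs)
    return np.sort(idx)


if __name__ == "__main__":
    toy = np.random.default_rng(1).normal(size=(200, 10))  # 200 cells x 10 features
    sketch = leverage_score_sample(toy, fraction=0.15)
    print(len(sketch))  # 30, i.e. 15% of 200 cells
```

With `fraction=0.15`, this mirrors the 15% subsample (~37,000 cells) described above; high-leverage cells tend to be those from rare populations, which is why leverage sampling preserves them better than uniform sampling.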
The script saves the fully processed Seurat object and its metadata, which the Figures scripts use for plotting.
saveRDS(ColonCancer_Flex,file='~/Outputs/Flex/FlexSeuratV5.rds') # Full Seurat Object
saveRDS(ColonCancer_Flex@meta.data,file='~/Outputs/Flex/FlexSeuratV5_MetaData.rds') #Meta Data
R script used to run spaceXR [2] for deconvolution. It requires the UMI count matrix from cellranger aggr and the metadata generated with the FlexSingleCell.R script to build the reference. For Visium HD, the Space Ranger outs folder is also required.
Because Visium HD data contain a very large number of barcodes, we modified the spaceXR source code to improve runtime. The modified version can be found in the following Pull Request. However, the original version can also be used to deconvolve Visium HD data.
The script uses sample P1CRC as a template, but it can be run on any other sample.
NucleiSegmentation.py is a Python script used to run nuclei segmentation on the H&E images of the tissue sections processed with Visium HD. To create the conda environment, see the yml section.
The script takes an H&E image as input and performs nuclei segmentation on the full section using the StarDist [3] package. Optionally, the user can provide a set of coordinates to generate a crop of the image together with the masks located within that region. The user also provides the path to the Space Ranger binned outputs directory for a given bin size (e.g. 2 µm), and the script outputs a .csv file that assigns every barcode located within a segmentation mask to the corresponding nucleus.
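The barcode-to-nucleus assignment can be sketched as a pixel lookup: each 2 µm bin centre is placed on the StarDist label image (0 = background, k > 0 = nucleus k) and takes the label it lands on. This is a minimal sketch under assumptions, not the code in NucleiSegmentation.py: `map_bins_to_nuclei` and the `barcode`/`nucleus_id` column names are hypothetical, and the bin centres are assumed to be in full-resolution pixel coordinates of the same H&E image (for example, the `pxl_row_in_fullres`/`pxl_col_in_fullres` columns of Space Ranger's `tissue_positions.parquet`).

```python
import numpy as np
import pandas as pd


def map_bins_to_nuclei(labels, barcodes, rows, cols):
    """Assign each bin to the nucleus whose mask contains its centre.

    labels : 2D integer label image (0 = background, k > 0 = nucleus k)
    rows, cols : bin-centre pixel coordinates in the same image
    Returns a DataFrame with one row per bin that falls inside a nucleus.
    """
    r = np.rint(np.asarray(rows, dtype=float)).astype(int)
    c = np.rint(np.asarray(cols, dtype=float)).astype(int)
    # Ignore bins whose centre lies outside the image.
    inside = (r >= 0) & (r < labels.shape[0]) & (c >= 0) & (c < labels.shape[1])
    nucleus = np.zeros(len(r), dtype=labels.dtype)
    nucleus[inside] = labels[r[inside], c[inside]]
    mapping = pd.DataFrame({"barcode": np.asarray(barcodes), "nucleus_id": nucleus})
    # Keep only bins that land on a segmented nucleus.
    return mapping[mapping["nucleus_id"] > 0].reset_index(drop=True)
```

Looking up the centre pixel is the simplest rule; a bin that straddles a nucleus boundary is assigned by wherever its centre falls.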
The script can be called as follows:
python ./HumanColonCancer_VisiumHD/Methods/NucleiSegmentation.py -i ./PATH_TO_HE_image -r1 rowmin -r2 rowmax -c1 colmin -c2 colmax -s ./PATH_TO_SR_outs/binned_outputs/square_002um/ -o Output_directory
More details on the required inputs:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-i','--image', type=str, help='Path to H&E image')
parser.add_argument('-r1','--rmin', type=int, help='Row min for zoom-in')
parser.add_argument('-r2','--rmax', type=int, help='Row max for zoom-in')
parser.add_argument('-c1','--cmin', type=int, help='Column min for zoom-in')
parser.add_argument('-c2','--cmax', type=int, help='Column max for zoom-in')
parser.add_argument('-s','--srdir', type=str, help='Path to Space Ranger outs at a given bin size')
parser.add_argument('-o','--out', type=str, help='Directory where to save outputs')
args = parser.parse_args()
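When -r1/-r2/-c1/-c2 are given, the zoom-in amounts to slicing the image and keeping the nuclei that fall inside the box. The sketch below assumes polygon vertices are stored as an (n_nuclei, 2, n_rays) array of (row, column) coordinates, the layout StarDist uses for its `coord` output, and treats the box as half-open, [rmin, rmax) × [cmin, cmax). `crop_region` is a hypothetical helper, not a function from NucleiSegmentation.py.

```python
import numpy as np


def crop_region(image, coords, rmin, rmax, cmin, cmax):
    """Crop an image to rows [rmin, rmax) and columns [cmin, cmax) and keep
    only the nuclei whose polygon vertices all lie inside that box.

    coords : array of shape (n_nuclei, 2, n_rays) holding (row, col) vertices
    Returns the cropped image, the indices of kept nuclei, and their vertices
    shifted into the cropped image's coordinate frame.
    """
    crop = image[rmin:rmax, cmin:cmax]
    rows, cols = coords[:, 0, :], coords[:, 1, :]
    inside = ((rows >= rmin) & (rows < rmax) &
              (cols >= cmin) & (cols < cmax)).all(axis=1)
    kept = np.flatnonzero(inside)
    shifted = coords[kept].astype(float)  # fancy indexing returns a copy
    shifted[:, 0, :] -= rmin
    shifted[:, 1, :] -= cmin
    return crop, kept, shifted
```

Shifting the vertices by (rmin, cmin) keeps the polygons aligned with the cropped image, which is what a viewer such as QuPath needs when both are opened together.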
Outputs:
- Nuclei_Barcode_Map.csv : csv file with the relationship between nuclei and barcodes for the full section
- labels_FullSection.pckl : labels of the identified nuclei for the full section
- polys_FullSection.pckl : coordinates of the identified polygons for the full section
- img_rois_Stardist_Subset.zip : if coordinates are given, segmented nuclei for the selected region; can be visualized with QuPath [4]
- img_Stardist.tif : if coordinates are given, tif file with the selected zoom-in region; can be visualized with QuPath [4]
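One way to use Nuclei_Barcode_Map.csv downstream is to sum the 2 µm bin counts that fall within each nucleus, giving a nucleus-by-gene matrix. The sketch below illustrates that idea on a small dense table; `aggregate_counts_by_nucleus` and the `barcode`/`nucleus_id` column names are hypothetical (check the header of the actual CSV), and a full section would need a sparse matrix rather than a dense DataFrame.

```python
import pandas as pd


def aggregate_counts_by_nucleus(counts, barcodes, genes, nuclei_map):
    """Sum bin-level UMI counts per nucleus.

    counts : (n_bins, n_genes) array-like of UMI counts
    barcodes, genes : row and column labels of `counts`
    nuclei_map : DataFrame with (hypothetical) columns "barcode" and "nucleus_id"
    """
    bins = pd.DataFrame(counts, index=pd.Index(barcodes, name="barcode"), columns=genes)
    # Inner join drops bins that are not inside any nucleus.
    mapped = bins.join(nuclei_map.set_index("barcode")["nucleus_id"], how="inner")
    return mapped.groupby("nucleus_id").sum()
```

For example, `aggregate_counts_by_nucleus(counts, barcodes, genes, pd.read_csv("Nuclei_Barcode_Map.csv"))`, once the column names have been matched to the real file.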
yml
File to create a conda environment with all the required dependencies for the NucleiSegmentation.py script.
To create the environment using the provided file:
conda env create --name NucleiSeg --file=./HumanColonCancer_VisiumHD/Methods/environment_nucleisegmentation.yml
To activate the environment:
conda activate NucleiSeg
The MetaData folder contains files with the associated metadata used in the manuscript.
The SingleCell_MetaData.csv.gz file contains the following columns:
- Barcode : cell barcode
- Patient : Patient of origin
- BC : Probe barcode to identify sample of origin
- QCFilter : Binary column denoting if a cell was kept or removed during QC
- Level1 : Level 1 cell type annotation
- Level2 : Level 2 cell type annotation
- UMAP1 : UMAP dimension 1 coordinates
- UMAP2 : UMAP dimension 2 coordinates
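As a hedged example of working with these columns in Python, the sketch below keeps the cells that passed QC and counts Level 1 cell types per patient. `qc_cell_type_counts` is a hypothetical helper, and the value of QCFilter that marks a kept cell is an assumption (True/1 by default; pass a different `kept_value` if the file uses another encoding).

```python
import pandas as pd


def qc_cell_type_counts(meta, kept_value=True):
    """Count QC-passing cells per Patient and Level1 cell type.

    kept_value : value of QCFilter that marks a kept cell (assumed True/1).
    """
    kept = meta[meta["QCFilter"] == kept_value]
    # Rows: patients; columns: Level 1 cell types; missing combinations -> 0.
    return kept.groupby(["Patient", "Level1"]).size().unstack(fill_value=0)
```

For example, `qc_cell_type_counts(pd.read_csv("SingleCell_MetaData.csv.gz"))`; pandas infers gzip compression from the `.gz` extension.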
The parquet files (e.g. P1CRC_Metadata.parquet) can be opened in R using the following code:
library(arrow)
Data<-read_parquet("~/HumanColonCancer_VisiumHD/MetaData/P1CRC_Metadata.parquet")
These parquet files contain the following columns:
- barcode : 8um bin barcode
- tissue : Binary column denoting if the bin is under tissue or not
- X : Spatial X coordinate
- Y : Spatial Y coordinate
- DeconvolutionClass : Deconvolution class for the bin (singlet, doublet, doublet_certain, doublet_uncertain or reject)
- DeconvolutionLabel1 : First cell type predicted for the bin
- DeconvolutionLabel2 : Second cell type predicted for the bin (not valid for reject or doublet_uncertain)
- Periphery : Indicates whether the bin is in the tumor, in the 50 µm tumor periphery, or in the rest of the tissue
- UnsupervisedL1 : Merged unsupervised clustering annotation (Level 1)
- UnsupervisedL2 : Merged unsupervised clustering annotation (Level 2)
- MacrophageSubtype : Subtype of macrophage (SELENOP+ or SPP1+) in the tumor periphery
- GobletSubcluster : Goblet subcluster used in Figure 5
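One way to compute a label like the Periphery column is a distance threshold: a non-tumor bin belongs to the periphery if its nearest tumor bin is at most 50 µm away. The sketch below is a hypothetical illustration of that rule, not the code used for the manuscript; `label_periphery`, the "Tumor"/"Periphery"/"Rest" strings and the boolean tumor mask are assumptions, and X/Y must first be converted to micrometres if they are stored in pixels. A SciPy k-d tree keeps the nearest-neighbour queries fast for the number of bins in a full section.

```python
import numpy as np
from scipy.spatial import cKDTree


def label_periphery(xy_um, is_tumor, width_um=50.0):
    """Label bins as "Tumor", "Periphery" (non-tumor bins within width_um of
    the nearest tumor bin) or "Rest".

    xy_um : (n_bins, 2) bin coordinates in micrometres
    is_tumor : boolean array marking tumor bins
    """
    xy = np.asarray(xy_um, dtype=float)
    tumor = np.asarray(is_tumor, dtype=bool)
    labels = np.full(len(xy), "Rest", dtype=object)
    labels[tumor] = "Tumor"
    if not tumor.any() or tumor.all():
        return labels
    other = np.flatnonzero(~tumor)
    # Distance from every non-tumor bin to its nearest tumor bin.
    dist, _ = cKDTree(xy[tumor]).query(xy[other], k=1)
    labels[other[dist <= width_um]] = "Periphery"
    return labels
```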
The Figures folder contains all the scripts to create the figures used in the manuscript. Most of the scripts within this folder require outputs generated from the Methods section.
All Figures scripts require the same set of R packages:
library(Seurat)
library(scattermore)
library(tidyverse)
library(data.table)
library(wesanderson)
library(patchwork)
library(RColorBrewer)
library(furrr)
library(paletteer)
library(arrow)
library(pheatmap)
library(distances)
library(rhdf5)
library(glue)
library(Matrix)
library(ggpubr)
library(ggeasy)
Each file begins with a data.frame that serves as a template for generating the outputs for the different sections. We use P1CRC as the template, but it can be replaced with any other section.
SampleData<-data.frame(Patient = "PatientCRC1", # Name of the Sample
PathSR="~/VisiumHD/PatientCRC1/outs/", # Path to space ranger outs folder
PathDeconvolution="~/Outputs/Deconvolution/PatientCRC1_Deconvolution_HD.rds") #spaceXR Deconvolution results
[1] Hao, Yuhan, et al. "Dictionary learning for integrative, multimodal and scalable single-cell analysis." Nature Biotechnology 42.2 (2024): 293-304. Paper
[2] Cable, Dylan M., et al. "Robust decomposition of cell type mixtures in spatial transcriptomics." Nature Biotechnology 40.4 (2022): 517-526. Paper
[3] Weigert, Martin, and Uwe Schmidt. "Nuclei instance segmentation and classification in histopathology images with StarDist." 2022 IEEE International Symposium on Biomedical Imaging Challenges (ISBIC). IEEE, 2022. Paper
[4] Bankhead, Peter, et al. "QuPath: Open source software for digital pathology image analysis." Scientific Reports 7.1 (2017): 1-7. Paper
Here we list the git tag and link to the preprint initially submitted to bioRxiv