scrna_metabolic

Metabolic landscape analysis for single-cell RNA-seq data

An abstract from https://github.com/LocasaleLab/Single-Cell-Metabolic-Landscape

Reference

Xiao, Zhengtao, Ziwei Dai, and Jason W. Locasale. "Metabolic landscape of the tumor microenvironment at single cell resolution." Nature communications 10.1 (2019): 1-12.

Run the pipeline

Run from CLI:

pipen run scrna_metabolic metabolic_landscape [options]

Serve as part of a pipeline:

from biopipen.namespace.scrna_metabolic import build_processes

MetabolicInputs = build_processes(<options>)

# MetabolicInputs is a process
# You can specify it's dependents so that the whole metabolic landscape pipeline
# works as a part of another pipeline

Inputs

  • metafile: A metafile indicating the metadata or the rds file with seruat object if config.pipeline.scrna_metabolic.clustered is True. For a meta file, Two columns are required: Sample and RNAData Sample should be the first column with unique identifiers for the samples and RNAData indicates where the expression matrices are.

    Currently only 10X data is supported

  • gmtfile: The GMT file with the metabolic pathways

  • config: The configuration file, string in TOML format or a python dictionary as config for the analysis (based on envs.config_fmt) They keys include:

    • grouping: How do we group the cells groupby - The column used to group by if it exists mutaters - Add new columns to the metadata to group by They are passed to sobj@meta.data |> mutate(...)
    • subsetting: How do we subset the data. The imputation will be done in each subset separately groupby - The column used to subset if it exists alias - The alias of the subset working as a prefix to subset names mutaters - Add new columns to the metadata to subset by
    • design: What kind of comparisons are we doing? It should be the values of subsetting groupbys

A step-by-step example

Prepare the seurat object

Using the data from:

Yost KE, Satpathy AT, Wells DK, Qi Y et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat Med 2019 Aug;25(8):1251-1259. PMID: 31359002

library(Seurat)

# Download data (tcell rds and metadata) from
# https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123813

count_file <- "test_data/scrna_metabolic/GSE123813_bcc_scRNA_counts.txt.gz"
meta_file <- "test_data/scrna_metabolic/GSE123813_bcc_all_metadata.txt.gz"

counts <- read.table(count_file, header = TRUE, row.names = 1, sep = "\t", check.names = F)
metadata <- read.table(meta_file, header = TRUE, row.names = 1, sep = "\t", check.names = F)
# Subset 1000 cells for jusb demo purpose
counts = counts[, sample(1:ncol(counts), 1000)]
metadata = metadata[colnames(counts),]
# Create seurat object
seurat_obj = CreateSeuratObject(counts=counts)
seurat_obj@meta.data = cbind(
    seurat_obj@meta.data,
    metadata[rownames(seurat_obj@meta.data),]
)
seurat_obj = NormalizeData(seurat_obj)
all.genes <- rownames(seurat_obj)
seurat_obj <- ScaleData(seurat_obj, features = all.genes)
seurat_obj <- FindVariableFeatures(object = seurat_obj)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))

# Output exceeds the size limit. Open the full output data in a text editor

# Warning message:
# "Feature names cannot have underscores ('_'), replacing with dashes ('-')"
# Centering and scaling data matrix

# PC_ 1
# Positive:  CALD1, EMP2, SPARC, CAV1, EMP1, ACTN1, KRT17, BGN, DSP, RAB13
# 	   RND3, S100A16, KRT14, S100A14, S100A2, CD9, FHL2, IER3, GJB2, TRIM29
# 	   TSC22D1, KRT15, SFN, FSTL1, DDR1, IGFBP7, JUP, PTRF, KRT5, FOSL1
# Negative:  CD74, CRIP1, RARRES3, RGS1, LAT, CD69, NKG7, HLA-DPA1, TNFRSF4, CD27
# 	   GZMA, HLA-DRB1, CXCR6, CTSW, GPR183, LDLRAD4, ICOS, HIST1H4C, GZMK, AC092580.4
# 	   SLC9A3R1, CCR7, S100A4, HLA-DPB1, HMGB2, GBP5, SELL, CXCR3, LAG3, HLA-DRB5
# PC_ 2
# Positive:  DSP, DSC3, SFN, SERPINB5, KRT17, S100A14, KRT15, KRT16, IRF6, TRIM29
# 	   KRT14, LGALS7B, JUP, PERP, TACSTD2, GJB2, KRT5, KRT6B, DDR1, S100A2
# 	   PKP1, CDH3, KRT6A, FXYD3, GJB3, MPZL2, CXADR, DSC2, DSG3, GRHL3
# Negative:  COL1A2, COL3A1, LUM, COL6A1, MXRA8, FN1, CTSK, COL1A1, COL6A3, DCN
# 	   MMP2, COL6A2, PRRX1, FKBP10, TNFAIP6, FAP, PCOLCE, PDGFRB, NNMT, AEBP1
# 	   C1S, CCDC80, SFRP2, RCN3, PDPN, SERPINF1, COL5A2, CTHRC1, COL5A1, COL12A1
# PC_ 3
# Positive:  LYZ, TYROBP, SPI1, FCER1G, KYNU, C15orf48, BCL2A1, HLA-DRA, CD68, AIF1
# 	   CST3, CTSZ, SERPINA1, CSF2RA, HLA-DRB5, SLC7A11, LST1, MS4A7, ALDH2, FAM49A
# 	   IFI30, HLA-DRB1, GPR157, PLEK, HLA-DPB1, IL1B, CD86, CCDC88A, HLA-DPA1, RNF144B
# Negative:  MT2A, MT1X, BGN, MT1E, COL6A2, COL1A2, COL6A1, COL5A2, MXRA8, DCN
# 	   COL12A1, COL3A1, COL1A1, LUM, MFAP2, PCOLCE, MT1F, MMP2, COL5A1, COL6A3
# 	   PRRX1, C1S, AEBP1, CTSK, FAP, LAT, PDGFRB, RARRES3, THY1, EFEMP2
# 	   GJB3, LYPD3, GRHL3, PVRL4, KRT16, TACSTD2, GJB5, SERPINB13, MPZL2, KRT23
# Negative:  UBE2C, GTSE1, BIRC5, RRM2, CCNA2, TYMS, TOP2A, TK1, DLGAP5, MKI67
# 	   PKMYT1, CENPA, KIFC1, CDCA5, UHRF1, ASF1B, AURKB, FAM111B, TROAP, CKAP2L
# 	   HJURP, ESCO2, FOXM1, CDK1, ZWINT, E2F2, CLSPN, HIST1H1B, CDT1, MCM10
# By default, the pipeline assumes the cells are not clustered
# and it will do the clustering using the scrna.SeuratClustering process
#
# If you want to do the clustering yourself, remember to add following
#
# [pipeline.scrna_metabolic]
# clustered = false
#
# to your `.biopipen.toml` either in your home directory or current directory.

seurat_obj <- FindNeighbors(seurat_obj, dims = 1:10)
seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)
head(Idents(seurat_obj))
# Computing nearest neighbor graph

# Computing SNN

# Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

# Number of nodes: 1000
# Number of edges: 30631

# Running Louvain algorithm...
# Maximum modularity in 10 random starts: 0.8795
# Number of communities: 9
# Elapsed time: 0 seconds

# bcc.su009.post.tcell_CCATTCGCAATCTACG 0
# bcc.su004.pre.tcell_ACAGCTACACTTCTGC  0
# bcc.su006.pre.tcell_CGTGTAAAGTGACTCT  1
# bcc.su001.post.tcell_CCTTTCTGTACCGTTA 2
# bcc.su007.pre.tcell_ATTTCTGAGAAGGACA  1
# bcc.su009.pre.tcell_AGACGTTTCCTGCAGG8 8
# Levels:
# '0''1''2''3''4''5''6''7''8'
# Check the meta.data
head(seurat_obj@meta.data)
# 	orig.ident	nCount_RNA	nFeature_RNA	patient	treatment	sort	cluster	UMAP1	UMAP2	RNA_snn_res.0.5	seurat_clusters
# <fct>	<dbl>	<int>	<chr>	<chr>	<chr>	<chr>	<dbl>	<dbl>	<fct>	<fct>
# bcc.su009.post.tcell_CCATTCGCAATCTACG	bcc.su009.post.tcell	4242	1726	su009	post	CD45+ CD3+	CD8_mem_T_cells	-9.1816435	0.7484789	0	0
# bcc.su004.pre.tcell_ACAGCTACACTTCTGC	bcc.su004.pre.tcell	3159	1466	su004	pre	CD45+ CD3+	CD8_ex_T_cells	4.1067562	3.3754938	0	0
# bcc.su006.pre.tcell_CGTGTAAAGTGACTCT	bcc.su006.pre.tcell	3279	1401	su006	pre	CD45+ CD3+	Tregs	0.4816457	11.9388428	1	1
# bcc.su001.post.tcell_CCTTTCTGTACCGTTA	bcc.su001.post.tcell	5057	2218	su001	post	CD45+ CD3+	CD8_ex_T_cells	2.3045983	7.4856248	2	2
# bcc.su007.pre.tcell_ATTTCTGAGAAGGACA	bcc.su007.pre.tcell	3701	1413	su007	pre	CD45+ CD3+	CD8_mem_T_cells	-1.9617293	5.5546365	1	1
# bcc.su009.pre.tcell_AGACGTTTCCTGCAGG	bcc.su009.pre.tcell	3891	1593	su009	pre	CD45+ CD3+	Tcell_prolif	5.2243228	-1.0460106	8	8
# save seurat object
saveRDS(seurat_obj, "test_data/scrna_metabolic/seurat_obj.rds")

Prepare the pathway file

A set of collected metabolic pathways can be found here:

https://github.com/LocasaleLab/Single-Cell-Metabolic-Landscape/blob/master/Data/KEGG_metabolism.gmt

Download and save it to test_data/scrna_metabolic/KEGG_metabolism.gmt

Prepare the configuration file

Save at test_data/scrna_metabolic/config.toml:

# Set input data
[MetabolicInputs.in]
metafile = ["test_data/scrna_metabolic/seurat_obj.rds"]
gmtfile = ["test_data/scrna_metabolic/KEGG_metabolism.gmt"]

[[MetabolicInputs.in.config]]
# optional, used to identify the subset objects from the same case
name = "case1"

[MetabolicInputs.in.config.grouping]
groupby = "Idents"

[MetabolicInputs.in.config.subsetting]
alias = "Timepoint"
groupby = "timepoint"

[MetabolicInputs.in.config.subsetting.mutaters]
timepoint = "if_else(patient != 'su001', NA_character_, treatment)"

[MetabolicInputs.in.config.design]
# we also want to do some intra-subset comparisons
post_vs_pre = ["post", "pre"]

Run the pipeline

# Make sure `pipeline.scrna_metabolic.clustered` is `false` in
# `./.biopipen.toml` or `~/.biopipen.toml`
pipen run scrna_metabolic metabolic_landscape --config test_data/scrna_metabolic/config.toml

Check out the results/reports

The results can be found at ./metabolic-landscape_results/, and reports can be found at ./metabolic-landscape_results/REPORTS. To check out the reports, open ./metabolic-landscape_results/REPORTS/index.html in your browser.

There are 4 parts of the results:

  • MetabolicPathwayActivity:

    The pathway activities for groups (defined by grouping in the configration) for each subset (defined by subsetting)

  • MetabolicPathwayHeterogeneity:

    Pathway heterogeneity for groups for each subset

  • MetabolicFeatures:

    The pathway enrichment analysis in detail for each group against the rest of the groups in each subset (inter-subset).

  • MetabolicFeaturesIntraSubsets:

    The pathway enrichment analysis in detail by the designs (defined by design in the configration) for each group. The designs are basically comparisons between subsets, and that'ss why this is called intra-subsets