scrna_metabolic
Metabolic landscape analysis for single-cell RNA-seq data
An abstract from https://github.com/LocasaleLab/Single-Cell-Metabolic-Landscape
Reference
Xiao, Zhengtao, Ziwei Dai, and Jason W. Locasale. "Metabolic landscape of the tumor microenvironment at single cell resolution." Nature communications 10.1 (2019): 1-12.
Run the pipeline
Run from CLI:
pipen run scrna_metabolic_landscape ScrnaMetabolicLandscape [options]
Serve as part of a pipeline:
from biopipen.ns.scrna_metabolic_landscape import ScrnaMetabolicLandscape
pipeline = ScrnaMetabolicLandscape(<options>)
# You can specify dependencies so that the whole metabolic landscape pipeline
# works as a part of another pipeline
Inputs
-
metafile
: Either a metafile or an rds file of a Seurat object. If it is a metafile, it should have two columns:Sample
andRNAData
.Sample
should be the first column with unique identifiers for the samples andRNAData
indicates where the barcodes, genes, expression matrices are. The data will be loaded and an unsupervised clustering will be done. Currently only 10X data is supported. If it is an rds file, the seurat object will be used directly. -
gmtfile
: The GMT file with the metabolic pathways. The gene names should match the gene names in the gene list in RNAData or the Seurat object. You can also provide a URL to the GMT file. -
group_by
: Group the data by the given column in the metadata. For example,cluster
. -
subset_by
: (Optional) Subset the data by the given column in the metadata. For example,Response
.NA
values will be removed in this column. If None, the data will not be subsetted. -
mutaters
: (Optional) Add new columns to the metadata for grouping/subsetting. They are passed tosobj@meta.data |> mutate(...)
. For example,{"timepoint": "if_else(treatment == 'control', 'pre', 'post')"}
will add a new columntimepoint
to the metadata with values ofpre
andpost
based on thetreatment
column. -
cases
: (Optional) Multiple cases for the analysis. If you have multiple different grouping/subsetting scenarios, you can specify them here. Each case can have its ownsubset_by
,group_by
, and analysis parameters.
Advanced Configuration
Multiple Cases
You can define multiple analysis cases with different grouping/subsetting strategies:
[ScrnaMetabolicLandscape]
metafile = "test_data/scrna_metabolic/seurat_obj.rds"
gmtfile = "test_data/scrna_metabolic/KEGG_metabolism.gmt"
[ScrnaMetabolicLandscape.MetabolicPathwayActivity.envs.cases]
"By Treatment" = { group_by = "seurat_clusters", subset_by = "treatment" }
"By Response" = { group_by = "seurat_clusters", subset_by = "response" }
Custom Plotting
Each process supports customizable plots:
[ScrnaMetabolicLandscape.MetabolicPathwayActivity.envs.plots]
"Custom Heatmap" = {
plot_type = "heatmap",
show_row_names = true,
devpars = { width = 1200, height = 800, res = 150 }
}
"Custom Violin" = {
plot_type = "violin",
add_box = true,
devpars = { width = 1000, height = 600 }
}
[ScrnaMetabolicLandscape.MetabolicFeatures.envs.plots]
"Top 5 Summary" = {
plot_type = "summary",
top_term = 5,
level = "subset"
}
A step-by-step example
Prepare the seurat object
Using the data from:
Yost KE, Satpathy AT, Wells DK, Qi Y et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat Med 2019 Aug;25(8):1251-1259. PMID: 31359002
library(Seurat)
# Download data (tcell rds and metadata) from
# https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123813
count_file <- "test_data/scrna_metabolic/GSE123813_bcc_scRNA_counts.txt.gz"
meta_file <- "test_data/scrna_metabolic/GSE123813_bcc_all_metadata.txt.gz"
counts <- read.table(count_file, header = TRUE, row.names = 1, sep = "\t", check.names = F)
metadata <- read.table(meta_file, header = TRUE, row.names = 1, sep = "\t", check.names = F)
# Subset 1000 cells for just demo purpose
counts = counts[, sample(1:ncol(counts), 1000)]
metadata = metadata[colnames(counts),]
# Create seurat object
seurat_obj = CreateSeuratObject(counts=counts)
seurat_obj@meta.data = cbind(
seurat_obj@meta.data,
metadata[rownames(seurat_obj@meta.data),]
)
seurat_obj = NormalizeData(seurat_obj)
all.genes <- rownames(seurat_obj)
seurat_obj <- ScaleData(seurat_obj, features = all.genes)
seurat_obj <- FindVariableFeatures(object = seurat_obj)
seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj))
# Output exceeds the size limit. Open the full output data in a text editor
# Warning message:
# "Feature names cannot have underscores ('_'), replacing with dashes ('-')"
# Centering and scaling data matrix
# PC_ 1
# Positive: CALD1, EMP2, SPARC, CAV1, EMP1, ACTN1, KRT17, BGN, DSP, RAB13
# RND3, S100A16, KRT14, S100A14, S100A2, CD9, FHL2, IER3, GJB2, TRIM29
# TSC22D1, KRT15, SFN, FSTL1, DDR1, IGFBP7, JUP, PTRF, KRT5, FOSL1
# Negative: CD74, CRIP1, RARRES3, RGS1, LAT, CD69, NKG7, HLA-DPA1, TNFRSF4, CD27
# GZMA, HLA-DRB1, CXCR6, CTSW, GPR183, LDLRAD4, ICOS, HIST1H4C, GZMK, AC092580.4
# SLC9A3R1, CCR7, S100A4, HLA-DPB1, HMGB2, GBP5, SELL, CXCR3, LAG3, HLA-DRB5
# PC_ 2
# Positive: DSP, DSC3, SFN, SERPINB5, KRT17, S100A14, KRT15, KRT16, IRF6, TRIM29
# KRT14, LGALS7B, JUP, PERP, TACSTD2, GJB2, KRT5, KRT6B, DDR1, S100A2
# PKP1, CDH3, KRT6A, FXYD3, GJB3, MPZL2, CXADR, DSC2, DSG3, GRHL3
# Negative: COL1A2, COL3A1, LUM, COL6A1, MXRA8, FN1, CTSK, COL1A1, COL6A3, DCN
# MMP2, COL6A2, PRRX1, FKBP10, TNFAIP6, FAP, PCOLCE, PDGFRB, NNMT, AEBP1
# C1S, CCDC80, SFRP2, RCN3, PDPN, SERPINF1, COL5A2, CTHRC1, COL5A1, COL12A1
# PC_ 3
# Positive: LYZ, TYROBP, SPI1, FCER1G, KYNU, C15orf48, BCL2A1, HLA-DRA, CD68, AIF1
# CST3, CTSZ, SERPINA1, CSF2RA, HLA-DRB5, SLC7A11, LST1, MS4A7, ALDH2, FAM49A
# IFI30, HLA-DRB1, GPR157, PLEK, HLA-DPB1, IL1B, CD86, CCDC88A, HLA-DPA1, RNF144B
# Negative: MT2A, MT1X, BGN, MT1E, COL6A2, COL1A2, COL6A1, COL5A2, MXRA8, DCN
# COL12A1, COL3A1, COL1A1, LUM, MFAP2, PCOLCE, MT1F, MMP2, COL5A1, COL6A3
# PRRX1, C1S, AEBP1, CTSK, FAP, LAT, PDGFRB, RARRES3, THY1, EFEMP2
# GJB3, LYPD3, GRHL3, PVRL4, KRT16, TACSTD2, GJB5, SERPINB13, MPZL2, KRT23
# Negative: UBE2C, GTSE1, BIRC5, RRM2, CCNA2, TYMS, TOP2A, TK1, DLGAP5, MKI67
# PKMYT1, CENPA, KIFC1, CDCA5, UHRF1, ASF1B, AURKB, FAM111B, TROAP, CKAP2L
# HJURP, ESCO2, FOXM1, CDK1, ZWINT, E2F2, CLSPN, HIST1H1B, CDT1, MCM10
# By default, the pipeline will do the clustering using the SeuratClustering process
# If you want to do the clustering yourself, you can set `is_seurat = true`
# when running the pipeline, which will skip the clustering step.
seurat_obj <- FindNeighbors(seurat_obj, dims = 1:10)
seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)
head(Idents(seurat_obj))
# Computing nearest neighbor graph
# Computing SNN
# Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
# Number of nodes: 1000
# Number of edges: 30631
# Running Louvain algorithm...
# Maximum modularity in 10 random starts: 0.8795
# Number of communities: 9
# Elapsed time: 0 seconds
# bcc.su009.post.tcell_CCATTCGCAATCTACG 0
# bcc.su004.pre.tcell_ACAGCTACACTTCTGC 0
# bcc.su006.pre.tcell_CGTGTAAAGTGACTCT 1
# bcc.su001.post.tcell_CCTTTCTGTACCGTTA 2
# bcc.su007.pre.tcell_ATTTCTGAGAAGGACA 1
# bcc.su009.pre.tcell_AGACGTTTCCTGCAGG8 8
# Levels:
# '0''1''2''3''4''5''6''7''8'
# Check the meta.data
head(seurat_obj@meta.data)
# orig.ident nCount_RNA nFeature_RNA patient treatment sort cluster UMAP1 UMAP2 RNA_snn_res.0.5 seurat_clusters
# <fct> <dbl> <int> <chr> <chr> <chr> <chr> <dbl> <dbl> <fct> <fct>
# bcc.su009.post.tcell_CCATTCGCAATCTACG bcc.su009.post.tcell 4242 1726 su009 post CD45+ CD3+ CD8_mem_T_cells -9.1816435 0.7484789 0 0
# bcc.su004.pre.tcell_ACAGCTACACTTCTGC bcc.su004.pre.tcell 3159 1466 su004 pre CD45+ CD3+ CD8_ex_T_cells 4.1067562 3.3754938 0 0
# bcc.su006.pre.tcell_CGTGTAAAGTGACTCT bcc.su006.pre.tcell 3279 1401 su006 pre CD45+ CD3+ Tregs 0.4816457 11.9388428 1 1
# bcc.su001.post.tcell_CCTTTCTGTACCGTTA bcc.su001.post.tcell 5057 2218 su001 post CD45+ CD3+ CD8_ex_T_cells 2.3045983 7.4856248 2 2
# bcc.su007.pre.tcell_ATTTCTGAGAAGGACA bcc.su007.pre.tcell 3701 1413 su007 pre CD45+ CD3+ CD8_mem_T_cells -1.9617293 5.5546365 1 1
# bcc.su009.pre.tcell_AGACGTTTCCTGCAGG bcc.su009.pre.tcell 3891 1593 su009 pre CD45+ CD3+ Tcell_prolif 5.2243228 -1.0460106 8 8
# save seurat object
saveRDS(seurat_obj, "test_data/scrna_metabolic/seurat_obj.rds")
Prepare the pathway file
A set of collected metabolic pathways can be found here:
https://github.com/LocasaleLab/Single-Cell-Metabolic-Landscape/blob/master/Data/KEGG_metabolism.gmt
Download and save it to test_data/scrna_metabolic/KEGG_metabolism.gmt
Prepare the configuration file
Save at test_data/scrna_metabolic/config.toml
:
# Pipeline configuration
[ScrnaMetabolicLandscape]
metafile = "test_data/scrna_metabolic/seurat_obj.rds"
gmtfile = "test_data/scrna_metabolic/KEGG_metabolism.gmt"
group_by = "seurat_clusters"
subset_by = "timepoint"
[ScrnaMetabolicLandscape.mutaters]
timepoint = "if_else(patient != 'su001', NA_character_, treatment)"
# Optional: Configure individual processes
[ScrnaMetabolicLandscape.MetabolicPathwayActivity.envs]
ntimes = 1000
[ScrnaMetabolicLandscape.MetabolicFeatures.envs.plots]
"Summary Plot" = { plot_type = "summary", top_term = 5 }
Run the pipeline
pipen run scrna_metabolic_landscape ScrnaMetabolicLandscape --config test_data/scrna_metabolic/config.toml
Check out the results/reports
The results can be found at ./ScrnaMetabolicLandscape_results/
, and reports can be found at ./ScrnaMetabolicLandscape_results/REPORTS
. To check out the reports, open ./ScrnaMetabolicLandscape_results/REPORTS/index.html
in your browser.
There are 3 parts of the results:
-
MetabolicPathwayActivity
:The pathway activities for groups (defined by
group_by
in the configuration) for each subset (defined bysubset_by
) -
MetabolicPathwayHeterogeneity
:Pathway heterogeneity for groups for each subset
-
MetabolicFeatures
:The pathway enrichment analysis in detail for each group against the rest of the groups in each subset.