Introduction
The pipeline architecture
immunopipe is built upon pipen. It is recommended to read the pipen docs first to get a better understanding of the pipeline.
Here, we highlight a few concepts that help when using the pipeline as a user.
A process is a unit of work in the pipeline. immunopipe includes a set of processes. Some of them are reused from biopipen and some are written specifically for immunopipe.
The input of a process is typically a pandas DataFrame, which serves as the channel passing data between processes. The rows of the data frame are distributed to the jobs of the process, and the columns are spread across the input variables of each job. See more illustration here. In our case, most processes are single-job processes. Apart from the start processes, the input of a process is the output of other process(es), so users don't need to worry about process inputs in the configuration.
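The row-to-job and column-to-variable mapping described above can be sketched in a few lines. This is only an illustration, not pipen's implementation: a list of tuples stands in for the pandas DataFrame to keep the sketch dependency-free, and the names `distribute`, `infile` and `label` are hypothetical.

```python
# Plain-Python stand-in for a channel: a table with named columns.
# (In pipen the channel is a pandas DataFrame; tuples are used here only
# to avoid extra dependencies.)
columns = ["infile", "label"]  # hypothetical input variable names
rows = [
    ("sample1.txt", "control"),
    ("sample2.txt", "treated"),
]


def distribute(columns, rows):
    """Return one dict of input variables per job (one job per row)."""
    return [dict(zip(columns, row)) for row in rows]


jobs = distribute(columns, rows)
for index, job_inputs in enumerate(jobs):
    print(f"job {index}: {job_inputs}")
```

A single-job process corresponds to a table with exactly one row, which is the common case in immunopipe.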
The envs of a process are the most important part of immunopipe for a user to configure. They define the environment variables of the process, which are shared by all of its jobs.
Attention
These environment variables are not the same as the environment variables of the system. They are just variables that are used in the process across its jobs.
See individual process pages for more details about the envs of each process.
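As a rough mental model, the envs of a process behave like one ordinary mapping that every job of that process reads. The sketch below assumes nothing about pipen's internals; `run_job` and the option names `ncores` and `resolution` are hypothetical placeholders, and the real option names are listed on each process page.

```python
# Hypothetical envs for an imaginary process. These are plain values,
# not operating-system environment variables.
envs = {"ncores": 4, "resolution": 0.8}


def run_job(job_index, envs):
    """Pretend to run one job; every job reads the same envs mapping."""
    return f"job {job_index} uses ncores={envs['ncores']}"


# Three jobs of the same process all see identical settings.
messages = [run_job(index, envs) for index in range(3)]
for message in messages:
    print(message)
```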
Analyses and processes

As shown in the figure above, immunopipe includes a set of processes for scRNA-seq and scTCR-/scBCR-seq data analysis. The processes are grouped into categories below:
Data input and QC
- SampleInfo: Read sample information from a CSV file and list the sample information in the report.
- ScRepLoading: Load the VDJ data into ScRepertoire objects.
- SeuratPreparing: Read the data into Seurat objects and perform QC.
T cell selection
- SeuratClusteringOfAllCells: Perform clustering on all cells if non-T cells are present in the data.
- ClusterMarkersOfAllCells: Find markers for each cluster of all the cells and perform enrichment analysis.
- TopExpressingGenesOfAllCells: Find top expressing genes for each cluster of all the cells and perform enrichment analysis.
- TOrBCellSelection: Select T cells from all cells.
Clustering of T cells
- SeuratClustering: Perform clustering on all or T cells selected above.
- SeuratMap2Ref: Map the cells to a reference dataset.
- CellTypeAnnotation: Annotate cell types for each T-cell cluster.
- SeuratSubClustering: Perform sub-clustering on subsets of cells.
- ClusterMarkers: Find markers for each T-cell cluster and perform enrichment analysis.
- TopExpressingGenes: Find top expressing genes for each T-cell cluster and perform enrichment analysis.
- ModuleScoreCalculator: Calculate module scores or cell cycle scores for each cell.
Clonotype refinement
- TCRClustering: Perform clustering on TCR clones based on CDR3 amino acid sequences.
- TESSA: Perform integrative analyses using Tessa.
Integration of scRNA-seq and scTCR-/scBCR-seq data
- ScRepCombiningExpression: Combine the VDJ data with the expression data (into a Seurat object).
Downstream analyses
- SeuratClusterStats: Investigate statistics for each T-cell cluster (e.g. the number of cells in each cluster, the number of cells in each sample for each cluster, feature/gene expression visualization, dimension reduction plots, etc.). It's also possible to perform stats on TCR/BCR clones/clusters for each T-cell cluster.
- ClonalStats: Investigate statistics for clones.
- MarkersFinder: Find markers (differentially expressed genes) for any two groups, including clones or clone groups.
- PseudoBulkDEG: Perform pseudo-bulk differential expression analysis.
- CDR3AAPhyschem: Investigate the physicochemical properties of CDR3 amino acid sequences of one cell type over another (e.g. Treg vs Tconv).
- ScFGSEA: Perform GSEA analysis for comparisons between two groups of cells, for example, between two cell types, clone groups, TCR/BCR clusters or clinical groups.
- CellCellCommunication: Perform cell-cell communication analysis.
- CellCellCommunicationPlots: Generate plots for cell-cell communication analysis.
Metabolic landscape analyses
- ScrnaMetabolicLandscape: A group of the following processes to perform metabolic landscape analyses.
    - MetabolicInput: Prepare the input files for metabolic landscape analyses.
    - MetabolicExprImpution: Impute the dropout values in the expression matrix.
    - MetabolicPathwayActivity: Investigate the metabolic pathways of the cells in different groups and subsets.
    - MetabolicPathwayHeterogeneity: Show metabolic pathways enriched in genes with the highest contribution to the metabolic heterogeneities.
    - MetabolicFeatures: Perform gene set enrichment analysis against the metabolic pathways for groups in different subsets.
Routes of the pipeline
immunopipe is designed to be flexible. It can be used in different ways. Here we list some common routes of the pipeline:
Both scRNA-seq and scTCR-/scBCR-seq data available
To enable this route, you need to:
- Tell the pipeline that scTCR-/scBCR-seq data is available by adding a column named TCRData/BCRData in the sample information file.
- Put the path of the sample information file in the configuration file [SampleInfo.in.infile], instead of passing it as a command line argument (--SampleInfo.in.infile).
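The first requirement can be checked on your own sample information file before running the pipeline. The sketch below is a standalone helper, not part of immunopipe: `has_vdj_data` is a hypothetical function, the `Sample` and `RNAData` columns and all paths are made-up placeholders, and the comma delimiter is an assumption that can be changed through the `delimiter` argument.

```python
import csv
import io

# Hypothetical sample information file; sample names and paths are made up.
SAMPLE_INFO = """Sample,TCRData,RNAData
S1,data/S1/tcr,data/S1/rna
S2,data/S2/tcr,data/S2/rna
"""

# Column names mentioned in the text above.
VDJ_COLUMNS = {"TCRData", "BCRData"}


def has_vdj_data(text, delimiter=","):
    """Return True if the header contains a TCRData or BCRData column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return bool(VDJ_COLUMNS & set(reader.fieldnames or []))


print(has_vdj_data(SAMPLE_INFO))
```

For a file on disk, read its contents first and pass the text to `has_vdj_data`.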
Unsupervised clustering [SeuratClustering] on selected T cells is the default setting. If you want to perform supervised clustering, you need to add [SeuratMap2Ref] in the configuration file with necessary parameters. If so, SeuratClustering will be replaced by SeuratMap2Ref in the pipeline.
If you need to select T/B cells from all available cells for later analyses, you need to add [TOrBCellSelection] in the configuration file. If so, the processes annotated with a label such as "For selected cells" will be added to the pipeline.
This is the most common route of the pipeline:

The optional processes are enabled only when the corresponding sections are added in the configuration file. For example, if you want to add module scores (e.g. cell activation score) to the Seurat object, you need to add [ModuleScoreCalculator] in the configuration file.
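The rules in this section amount to a lookup on which sections appear in the configuration file. The following sketch restates them as code for clarity. It is not how immunopipe builds the pipeline internally: `select_route` and the keys of the returned dictionary are hypothetical, and only the section names come from the text above.

```python
def select_route(config_sections):
    """Summarize the choices described above from configured section names."""
    sections = set(config_sections)
    # Supervised mapping replaces unsupervised clustering when configured.
    if "SeuratMap2Ref" in sections:
        clustering = "SeuratMap2Ref"
    else:
        clustering = "SeuratClustering"
    return {
        "clustering": clustering,
        # Optional processes are enabled only when their sections are present.
        "select_cells": "TOrBCellSelection" in sections,
        "module_scores": "ModuleScoreCalculator" in sections,
    }


default_route = select_route(["SampleInfo"])
custom_route = select_route(["SampleInfo", "SeuratMap2Ref", "TOrBCellSelection"])
print(default_route)
print(custom_route)
```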
Only scRNA-seq data available
When you have only scRNA-seq data, simply omit the TCRData/BCRData column from the sample information file. The pipeline will automatically skip the processes related to scTCR-/scBCR-seq data analysis.
Attention
You need to specify the sample information file in the configuration file [SampleInfo.in.infile] to enable this route. Passing the sample information file as a command line argument (--SampleInfo.in.infile) does not trigger this route.
Unsupervised clustering [SeuratClustering] on selected T cells is the default setting. If you want to perform supervised clustering, you need to add [SeuratMap2Ref] in the configuration file with necessary parameters. If so, SeuratClustering will be replaced by SeuratMap2Ref in the pipeline.
