Preparing the input data

Single-cell RNA-seq (scRNA-seq) data

Currently only 10X Genomics data are supported.

For each sample, you need to provide a directory containing the files generated by CellRanger. Specifically, the directory must be readable by Seurat::Read10X(). For example, it should contain matrix.mtx, barcodes.tsv and features.tsv. These files can also be gzipped.
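Before running the pipeline, it can help to confirm that each sample directory has the layout described above. The following is a minimal sketch, not part of the pipeline: the helper name `missing_10x_files` is hypothetical, and it only checks that the three files named above are present (plain or gzipped), not that their contents are valid.

```python
from pathlib import Path

# File names taken from the text above; each may also be gzipped.
REQUIRED_10X_FILES = ("matrix.mtx", "barcodes.tsv", "features.tsv")


def missing_10x_files(sample_dir):
    """Return the required 10X files that are absent from sample_dir.

    A file counts as present if it exists either plain or with a .gz suffix.
    """
    sample_dir = Path(sample_dir)
    missing = []
    for name in REQUIRED_10X_FILES:
        plain = sample_dir / name
        gzipped = sample_dir / (name + ".gz")
        if not (plain.is_file() or gzipped.is_file()):
            missing.append(name)
    return missing
```

An empty list means all three files were found, for example `missing_10x_files("/path/to/sample1")` with a path of your own.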

You can also use the h5 file generated by CellRanger.

Single-cell TCR-seq (scTCR-seq) data

The scTCR-seq data is optional for the pipeline. However, the scRNA-seq data is required.

The scTCR-seq data, if available, should be paired with the scRNA-seq data. Theoretically, any data that can be loaded by immunarch::repLoad() should work. However, the pipeline has only been tested with data generated by CellRanger. Specifically, the directory should contain a filtered_contig_annotations.csv file, or at least an all_contig_annotations.csv file.
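As an illustration of the file requirement above, here is a small sketch that looks for a contig annotation file in a TCR directory. The function name `find_contig_annotations` is hypothetical, and the lookup order (filtered first, then all contigs) is an assumption based on the wording above, not a description of how the pipeline itself resolves the file.

```python
from pathlib import Path

# Assumed preference order: filtered contigs first, then all contigs.
CONTIG_FILES = ("filtered_contig_annotations.csv", "all_contig_annotations.csv")


def find_contig_annotations(tcr_dir):
    """Return the path of the first contig annotation file found, or None."""
    tcr_dir = Path(tcr_dir)
    for name in CONTIG_FILES:
        candidate = tcr_dir / name
        if candidate.is_file():
            return candidate
    return None
```

A return value of `None` means neither file is present in the directory.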

Metadata

A metadata file is required as input for the pipeline. It should be a TAB-delimited file with the following columns (Sample and RNAData are required):

  • Sample: A unique id for each sample
  • RNAData: The directory or h5 file for single-cell RNA data for this sample, as described above.
  • TCRData (optional): The directory for single-cell TCR data for this sample as described above.
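The column layout above can be checked with a few lines of Python before the file is handed to the pipeline. This is only a sketch: `check_metadata` is a hypothetical helper, it treats Sample and RNAData as the required columns (as the list above suggests), and the sample ids, paths and the extra `Group` column in the example are made up.

```python
import csv
import io


def check_metadata(text):
    """Validate TAB-delimited metadata text and return its rows as dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = reader.fieldnames or []
    for required in ("Sample", "RNAData"):
        if required not in columns:
            raise ValueError(f"missing required column: {required}")
    rows = list(reader)
    # Sample ids must be unique across the file.
    samples = [row["Sample"] for row in rows]
    duplicates = sorted({s for s in samples if samples.count(s) > 1})
    if duplicates:
        raise ValueError(f"duplicated sample ids: {duplicates}")
    return rows


# Hypothetical example content; the paths and the Group column are made up.
example = (
    "Sample\tRNAData\tTCRData\tGroup\n"
    "S1\t/data/S1/rna\t/data/S1/tcr\tcontrol\n"
    "S2\t/data/S2/rna.h5\t/data/S2/tcr\ttreated\n"
)
rows = check_metadata(example)
print([row["Sample"] for row in rows])
```

To check a real file, read it first, for example `check_metadata(open("metadata.txt").read())` with your own file name.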

When TCRData is not provided, the pipeline will skip the processes related to scTCR-seq data (see Routes of the pipeline for more details). You can also add other columns to the metadata file. The columns will be added to both:

  • meta data of the object loaded by immunarch::repLoad() (i.e. data$meta)
  • meta data of the Seurat object loaded by Seurat::Read10X() or Seurat::Read10X_h5() (i.e. srtobj@meta.data)

This file should be provided to the SampleInfo process. See SampleInfo for more details.

An example metadata file can be found here.

You can also use SampleInfo with envs.save_mutated = true and/or IntegratingTCR to add columns to metadata by configuration. These columns are persisted for downstream analysis. The difference is that SampleInfo can only use the columns that are already in the metadata file, while IntegratingTCR can use the columns that are generated by the pipeline (e.g. TCR clone information).

Other optional files

Genes/Features to visualize for Seurat object

If you have a set of genes/features of interest, you can provide a file with those genes, one gene per line, to SeuratClusterStats.envs.exprs.features.

Note

The genes should exist in the RNA-seq data (i.e. features.tsv or the h5 file from CellRanger).

See SeuratClusterStats for more details.
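One way to catch genes that are absent from the RNA-seq data is to compare the gene list against features.tsv ahead of time. The sketch below is illustrative only: both function names are hypothetical, and it assumes the gene name sits in the second TAB-separated column of features.tsv (falling back to the first column for single-column rows). It does not cover h5 input.

```python
import csv
import gzip
from pathlib import Path


def read_feature_names(features_path):
    """Collect gene names from a features.tsv or features.tsv.gz file.

    Assumption: the gene name is in the second TAB-separated column.
    """
    features_path = Path(features_path)
    opener = gzip.open if features_path.suffix == ".gz" else open
    with opener(features_path, "rt", newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        return {row[1] if len(row) > 1 else row[0] for row in rows if row}


def missing_genes(gene_file, features_path):
    """Return genes listed in gene_file (one per line) not found in features."""
    known = read_feature_names(features_path)
    with open(gene_file) as handle:
        wanted = [line.strip() for line in handle if line.strip()]
    return [gene for gene in wanted if gene not in known]
```

An empty result from `missing_genes` means every gene in the list was found. Matching is exact and case-sensitive.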

Pathways for Gene Set Enrichment Analysis (GSEA)

If you want to perform GSEA, you need to provide a file containing the pathways. The file should be in the GMT format. You can provide the file to ScFGSEA.envs.gmtfile. Similarly, the genes in the pathways should exist in the features.tsv file, written in the same format.
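For readers unfamiliar with the GMT format: each line holds a pathway name, a description, and then the member genes, all separated by TABs. The sketch below parses such text and reports genes that are not in a given set of feature names. `parse_gmt` and `genes_not_in` are hypothetical helpers, and the pathway and gene names in the example are made up for illustration.

```python
def parse_gmt(text):
    """Parse GMT text into a dict mapping pathway name to its gene list."""
    pathways = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any gene
        name, _description, *genes = fields
        pathways[name] = [gene for gene in genes if gene]
    return pathways


def genes_not_in(pathways, known_features):
    """Return, per pathway, the genes that are missing from known_features."""
    report = {}
    for name, genes in pathways.items():
        absent = [gene for gene in genes if gene not in known_features]
        if absent:
            report[name] = absent
    return report


# Made-up example: one pathway with three genes, one of them unknown.
example = "EXAMPLE_PATHWAY\thypothetical description\tCD3D\tCD8A\tFOO\n"
pathways = parse_gmt(example)
print(genes_not_in(pathways, {"CD3D", "CD8A"}))
```

The set of known features could come from the second column of features.tsv, under the same assumption about gene naming as for the gene list.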

See ScFGSEA for more details.

You can also find an example here: https://github.com/pwwang/immunopipe-example/blob/master/data/MSigDB_Hallmark_v7.5.1.gmt

Cell type database for cell type annotation by sctype or hitype

If you want to perform cell type annotation with sctype or hitype, you need to provide a cell type database file. The file should be fed to CellTypeAnnotation.envs.sctype_db if you are using sctype, or to CellTypeAnnotation.envs.hitype_db if you are using hitype. Again, the markers in the database should exist in the features.tsv file or the h5 file, written in the same format.

See CellTypeAnnotation for more details.

Examples can be found here: ScTypeDB_short.xlsx and ScTypeDB_full.xlsx.

Model for cell type annotation by celltypist

If you want to perform cell type annotation with celltypist, you need to provide a model file. The model file should be fed to CellTypeAnnotation.envs.celltypist_args.model. Information about the available models can be found here. Download the one you want to use and provide the path to the file.

Metabolic pathway for Metabolic Landscape Analysis

Similarly, if you want to perform metabolic landscape analysis, you need to provide a file containing the metabolic pathways. The file should be in the GMT format. You can provide the file to ScrnaMetabolicLandscape.envs.gmtfile. This file can also be used for GSEA. A pathway file for KEGG metabolism is provided here.

See ScrnaMetabolicLandscape for more details.

Reference for Seurat mapping if you want to perform supervised clustering

If you want to perform supervised clustering, you need to provide a reference for SeuratMap2Ref. The reference should be a Seurat object saved as an RDS or h5seurat file. You can provide the reference to SeuratMap2Ref.envs.ref.

See SeuratMap2Ref for more details.