Preparing the input data¶
Single-cell RNA-seq (scRNA-seq) data¶
Currently only 10X Genomics data are supported.
For each sample, you need to provide a directory containing the files generated by CellRanger. Specifically, the directory should be readable by Seurat::Read10X(). For example, the directory should contain matrix.mtx, barcodes.tsv and features.tsv.
These files can also be gzipped.
You can also use the h5 file generated by CellRanger.
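As a quick pre-flight check, the file requirements above can be sketched in Python. This is only an illustration: the helper names missing_10x_files and looks_like_rna_input are hypothetical and not part of the pipeline, and the check looks at file names only, so it does not guarantee that Seurat can actually read the data.

```python
from pathlib import Path

# File names listed in the text above; each may also be gzipped.
REQUIRED_10X_FILES = ("matrix.mtx", "barcodes.tsv", "features.tsv")


def missing_10x_files(rna_dir):
    """Return the required 10x files that are absent from rna_dir.

    A file counts as present if either the plain or the .gz version exists.
    """
    rna_dir = Path(rna_dir)
    missing = []
    for name in REQUIRED_10X_FILES:
        plain = rna_dir / name
        gzipped = rna_dir / (name + ".gz")
        if not plain.exists() and not gzipped.exists():
            missing.append(name)
    return missing


def looks_like_rna_input(path):
    """Heuristic (an assumption): a complete 10x directory or a file ending in .h5."""
    path = Path(path)
    if path.is_dir():
        return not missing_10x_files(path)
    return path.is_file() and path.suffix == ".h5"
```

Running such a check on every sample before starting the pipeline can catch a wrong path early, instead of failing when the data is loaded.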
Single-cell TCR-seq (scTCR-seq) data¶
The scTCR-seq data is optional for the pipeline. However, the scRNA-seq data is required.
The scTCR-seq data, if available, should be paired with the scRNA-seq data. Theoretically, as long as the data can be loaded by immunarch::repLoad(), it should be fine. However, the pipeline has only been tested with data generated by CellRanger. Specifically, the directory should contain the filtered_contig_annotations.csv file, or at least the all_contig_annotations.csv file.
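The file requirement above can be checked with a small Python sketch. The helper name find_contig_file is hypothetical, and the order of preference (filtered file first, unfiltered file as a fallback) is an assumption drawn from the wording above, not a statement about how the pipeline itself chooses.

```python
from pathlib import Path

# Assumed order of preference: the filtered file first,
# the unfiltered file as a fallback.
CONTIG_FILES = ("filtered_contig_annotations.csv", "all_contig_annotations.csv")


def find_contig_file(tcr_dir):
    """Return the first contig annotation file found in tcr_dir, or None."""
    tcr_dir = Path(tcr_dir)
    for name in CONTIG_FILES:
        candidate = tcr_dir / name
        if candidate.is_file():
            return candidate
    return None
```

A return value of None means that neither file is present in the directory.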
Metadata¶
A metadata file is required as an input file for the pipeline. It should be a TAB-delimited file with the following columns (Sample and RNAData are required):
- Sample: A unique id for each sample.
- RNAData: The directory or h5 file for single-cell RNA data for this sample, as described above.
- TCRData (optional): The directory for single-cell TCR data for this sample, as described above.
When TCRData is not provided, the pipeline will skip the processes related to scTCR-seq data (see Routes of the pipeline for more details).
You can also add other columns to the metadata file. The columns will be added to both:
- the metadata of the object loaded by immunarch::repLoad() (i.e. data$meta)
- the metadata of the Seurat object loaded by Seurat::Read10X() or Seurat::Read10X_h5() (i.e. srtobj@meta.data)
This file should be provided to the SampleInfo process. See SampleInfo for more details.
An example metadata file can be found here.
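If you prefer to generate or check the metadata file programmatically, the layout described above can be sketched in Python with the standard csv module. The helper names write_metadata and check_metadata are hypothetical, and the checks cover only what this page states (TAB delimiter, required columns, unique sample ids).

```python
import csv

REQUIRED_COLUMNS = ("Sample", "RNAData")  # TCRData is optional


def write_metadata(path, rows, columns):
    """Write rows (a list of dicts) as a TAB-delimited file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


def check_metadata(path):
    """Read a TAB-delimited metadata file and return a list of problems found."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = reader.fieldnames or []
        rows = list(reader)

    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in columns]
    if "Sample" in columns:
        samples = [row["Sample"] for row in rows]
        duplicates = sorted({s for s in samples if samples.count(s) > 1})
        problems += [f"duplicated sample id: {s}" for s in duplicates]
    return problems
```

An empty list from check_metadata means the file passed these basic checks. It does not verify that the paths in RNAData or TCRData exist.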
You can also use SampleInfo with envs.save_mutated = true and/or IntegratingTCR to add columns to the metadata by configuration. These columns are persisted for downstream analysis. The difference is that SampleInfo can only use the columns that are already in the metadata file, while IntegratingTCR can use the columns that are generated by the pipeline (e.g. TCR clone information).
Other optional files¶
Genes/Features to visualize for Seurat object¶
If you have a set of genes/features of interest, you can provide a file with those genes, one gene per line, to SeuratClusterStats.envs.exprs.features for:
- Ridge plots using Seurat::RidgePlot(),
- Violin plots using Seurat::VlnPlot(),
- Feature plots using Seurat::FeaturePlot(),
- Dot plots using Seurat::DotPlot(), and
- Heatmaps using Seurat::DoHeatmap().
Note
The genes should exist in the RNA-seq data (i.e. features.tsv or the h5 file from CellRanger).
See SeuratClusterStats for more details.
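To confirm that the genes in your list exist in the data, a comparison against features.tsv can be sketched in Python as below. The helper names are hypothetical. The sketch assumes that the gene symbols sit in the second column of features.tsv, and it does a plain string comparison, so any renaming applied when the data is loaded (for example, making duplicated names unique) is not accounted for.

```python
import gzip


def open_text(path):
    """Open a plain or gzipped text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_feature_names(features_path, column=1):
    """Collect one column of features.tsv (0-based; 1 is assumed to hold gene symbols)."""
    with open_text(features_path) as handle:
        return {
            line.rstrip("\n").split("\t")[column]
            for line in handle
            if line.strip()
        }


def missing_genes(gene_list_path, features_path):
    """Return genes from the list (one per line) that are not in features.tsv."""
    known = read_feature_names(features_path)
    with open_text(gene_list_path) as handle:
        genes = [line.strip() for line in handle if line.strip()]
    return [gene for gene in genes if gene not in known]
```

An empty result means every gene in the list was found. Any gene that is returned is worth checking for spelling or naming-convention differences.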
Pathways for Gene Set Enrichment Analysis (GSEA)¶
If you want to perform GSEA, you need to provide a file containing the pathways. The file should be in the GMT format. You can provide the file to ScFGSEA.envs.gmtfile. As with the gene list above, the genes should exist (be in the same format) in the features.tsv file.
See ScFGSEA for more details.
You can also find an example here: https://github.com/pwwang/immunopipe-example/blob/master/data/MSigDB_Hallmark_v7.5.1.gmt
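For reference, each line of a GMT file holds a pathway name, a description, and then the member genes, all separated by TABs. A minimal Python reader for that layout might look like the sketch below. The function name read_gmt is hypothetical and this is not how the pipeline itself loads the file.

```python
def read_gmt(path):
    """Parse a GMT file into {pathway name: [genes]}.

    Each line holds a pathway name, a description, then the genes,
    all separated by TABs.
    """
    pathways = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or incomplete lines
            name, _description, *genes = fields
            # Drop empty entries left by trailing TABs.
            pathways[name] = [gene for gene in genes if gene]
    return pathways
```

Reading the file this way makes it easy to compare the pathway genes against the gene names in your data before running the analysis.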
Cell type database for cell type annotation by sctype or hitype¶
If you want to perform cell type annotation using sctype or hitype, you need to provide a file containing the cell type database. The database file should be provided to CellTypeAnnotation.envs.sctype_db if you are using sctype, or to CellTypeAnnotation.envs.hitype_db if you are using hitype. Again, the markers in the database should exist (be in the same format) in the features.tsv file or the h5 file.
See CellTypeAnnotation for more details.
Examples can be found here: ScTypeDB_short.xlsx and ScTypeDB_full.xlsx.
Model for cell type annotation by celltypist¶
If you want to perform cell type annotation by celltypist, you need to provide a model file. The model file should be provided to CellTypeAnnotation.envs.celltypist_args.model. Information about the available models can be found here. Download the one you want to use and provide the path to the file.
Metabolic pathway for Metabolic Landscape Analysis¶
Similarly, if you want to perform metabolic landscape analysis, you need to provide a file containing the metabolic pathways. The file should be in the GMT format. You can provide the file to ScrnaMetabolicLandscape.envs.gmtfile. This file can also be used for GSEA. A pathway file for KEGG metabolism is provided here.
See ScrnaMetabolicLandscape for more details.
Reference for Seurat mapping if you want to perform supervised clustering¶
If you want to perform supervised clustering, you need to provide a reference for SeuratMap2Ref. The reference should be a Seurat object saved in an RDS or h5seurat file. You can provide the reference to SeuratMap2Ref.envs.ref.
See SeuratMap2Ref for more details.