CellTypeAnnotation

Annotate the T cell clusters.

Annotate the cell clusters. Currently, five ways are supported:

  1. Pass the cell type annotation directly
  2. Use ScType
  3. Use scCATCH
  4. Use hitype
  5. Use celltypist (see celltypist_args below)

The annotated cell types will replace the original seurat_clusters column in the metadata, so that the downstream processes will use the annotated cell types.

The old seurat_clusters column will be renamed to seurat_clusters_id.

If you are using ScType, scCATCH, or hitype, a text file mapping the old seurat_clusters to the new cell types will be generated and saved as cluster2celltype.tsv under <workdir>/<pipeline_name>/CellTypeAnnotation/0/output/.

The <workdir> is typically ./.pipen and the <pipeline_name> is Immunopipe by default.

Note

When supervised clustering (SeuratMap2Ref) is used, this process will be skipped.

Note

When cell types are annotated, the old seurat_clusters column will be renamed to seurat_clusters_id, and the new seurat_clusters column will be added.

Environment Variables

  • tool (choice): Default: direct.
    The tool to use for cell type annotation.
  • sctype_tissue: The tissue to use for sctype.
    Available tissues are listed in the first column (tissueType) of sctype_db.
    If not specified, all rows in sctype_db will be used.
  • sctype_db: The database to use for sctype.
    Check examples at https://github.com/IanevskiAleksandr/sc-type/blob/master/ScTypeDB_full.xlsx
  • hitype_tissue: The tissue to use for hitype.
    Available tissues are listed in the first column (tissueType) of hitype_db.
    If not specified, all rows in hitype_db will be used.
  • hitype_db: The database to use for hitype.
    Compatible with sctype_db.
    See also https://pwwang.github.io/hitype/articles/prepare-gene-sets.html
    You can also use built-in databases, including hitypedb_short, hitypedb_full, and hitypedb_pbmc3k.
  • cell_types (list): Default: [].
    The cell types to use for direct annotation.
    You can use "-" or "" as a placeholder for clusters whose original identities (seurat_clusters) you want to keep.
    If the length of cell_types is shorter than the number of clusters, the remaining clusters will be kept as the original cell types.
    You can also use NA to remove the clusters from downstream analysis. This only works when envs.newcol is not specified.

    Note

    If tool is direct and cell_types is not specified or an empty list, the original cell types will be kept and nothing will be changed.

  • sccatch_args (ns): The arguments for scCATCH::findmarkergene() if tool is sccatch.

    • species: The species of the cells.
    • cancer: Default: Normal.
      If the sample is from cancer tissue, then the cancer type may be defined.
    • tissue: The tissue origin of the cells. Must be defined.
    • marker: The marker genes for cell type identification.
    • if_use_custom_marker (flag): Default: False.
      Whether to use custom marker genes. If True, no species, cancer, and tissue are needed.
    • <more>: Other arguments for scCATCH::findmarkergene().
      You can pass an RDS file to sccatch_args.marker to serve as custom markers. If so, if_use_custom_marker will be set to TRUE automatically.
  • celltypist_args (ns): The arguments for celltypist::celltypist() if tool is celltypist.
    • model: The path to model file.
    • python: Default: python.
      The python path where celltypist is installed.
    • majority_voting: Default: True.
      When true, it refines cell identities within local subclusters after an over-clustering approach at the cost of increased runtime.
    • over_clustering (type=auto): Default: seurat_clusters.
      The column name in metadata to use as clusters for majority voting.
      Set to False to disable over-clustering.
    • assay: When converting a Seurat object to AnnData, the assay to use.
      If input is h5seurat, this defaults to RNA.
      If input is Seurat object in RDS, this defaults to the default assay.
  • merge (flag): Default: False.
    Whether to merge the clusters with the same cell types.
    Otherwise, a suffix will be added to the cell types (i.e., .1, .2, etc.).
  • newcol: The new column name to store the cell types.
    If not specified, the seurat_clusters column will be overwritten.
    If specified, the original seurat_clusters column will be kept and Idents will be kept as the original seurat_clusters.
  • outtype (choice): Default: input.
    The output file type. Currently only works for celltypist.
    An RDS file will be generated for other tools.
    • input: Use the same file type as the input.
    • rds: Use RDS file.
    • h5seurat: Use h5seurat file.
    • h5ad: Use AnnData file.
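
As a rough sketch of how the nested celltypist_args options and outtype described above might be combined in a configuration file: the model path below is a hypothetical placeholder, and the remaining values simply restate the documented defaults or choices.

```toml
[CellTypeAnnotation.envs]
tool = "celltypist"
# outtype currently only takes effect for celltypist; request an AnnData file.
outtype = "h5ad"

[CellTypeAnnotation.envs.celltypist_args]
# Hypothetical path: replace it with the location of your own model file.
model = "/path/to/celltypist_model.pkl"
# Documented defaults, written out explicitly for clarity.
majority_voting = true
over_clustering = "seurat_clusters"
```

Leaving outtype at its default (input) would instead write the output in the same file type as the input.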

Examples

[CellTypeAnnotation.envs]
tool = "direct"
cell_types = ["CellType1", "CellType2", "-", "CellType4"]

The cell types will be assigned as:

0 -> CellType1
1 -> CellType2
2 -> 2
3 -> CellType4
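
A second sketch, assuming an object with more than three clusters, illustrates a cell_types list that is shorter than the number of clusters together with merge. The cell type names are placeholders, as in the example above.

```toml
[CellTypeAnnotation.envs]
tool = "direct"
# Clusters 0 and 1 share a label and cluster 2 gets its own label.
# Clusters beyond the end of the list keep their original identities.
cell_types = ["CellType1", "CellType1", "CellType3"]
# Merge clusters that share a label instead of adding suffixes.
merge = true
```

With merge left at its default (False), clusters 0 and 1 would remain separate and their shared label would be distinguished by suffixes.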

Metadata

When envs.tool is direct and envs.cell_types is empty, the metadata of the Seurat object will be kept as is.

When envs.newcol is specified, the original seurat_clusters column will be kept as is, and the annotated cell types will be saved in the new column.
Otherwise, the original seurat_clusters column will be replaced by the annotated cell types and the original seurat_clusters column will be saved as seurat_clusters_id.
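
The following sketch shows one way to keep seurat_clusters untouched while storing the annotations in a separate column. The column name cell_type is a hypothetical choice, and the tool value "hitype" is assumed to be the spelling of that choice; hitypedb_pbmc3k is one of the built-in databases listed under hitype_db.

```toml
[CellTypeAnnotation.envs]
# Assumed spelling of the hitype choice for the tool option.
tool = "hitype"
# Built-in database; hitype_tissue is omitted, so all rows are used.
hitype_db = "hitypedb_pbmc3k"
# Hypothetical column name; seurat_clusters and Idents stay as they were.
newcol = "cell_type"
```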

[Figure: CellTypeAnnotation-metadata]