CellTypeAnnotation

Annotate all or selected T/B cell clusters.

Annotate the cell clusters. Currently, five ways are supported:

  1. Pass the cell type annotation directly
  2. Use ScType
  3. Use scCATCH
  4. Use hitype
  5. Use celltypist

The annotated cell types will replace the original identity column in the metadata, so that the downstream processes will use the annotated cell types.

Note

When cell types are annotated, the original identity column (e.g. seurat_clusters) will be renamed to envs.backup_col (e.g. seurat_clusters_id), and the new identity column will be added.

If you are using ScType, scCATCH, or hitype, a text file containing the mapping from the original identities to the new cell types will be generated and saved as cluster2celltype.tsv under <workdir>/<pipeline_name>/CellTypeAnnotation/0/output/.

The <workdir> is typically ./.pipen and the <pipeline_name> is Immunopipe by default, so the default location is ./.pipen/Immunopipe/CellTypeAnnotation/0/output/cluster2celltype.tsv.

Note

If other annotation processes, such as SeuratClustering or SeuratMap2Ref, are enabled in the same run, you may want to store the annotated cell types in a different column using envs.newcol, so that the results from different annotation processes don't overwrite each other.
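
A minimal configuration sketch for this case, keeping the annotations from this process in their own column. The column name cell_type_direct is a hypothetical choice for illustration, not a name defined by the pipeline:

```toml
[CellTypeAnnotation.envs]
tool = "direct"
cell_types = ["CellType1", "CellType2"]
# Hypothetical column name, chosen only for this example
newcol = "cell_type_direct"
```

With newcol set, the original identity column is kept and envs.backup_col is ignored, as described under Environment Variables.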

Input

  • sobjfile: The single-cell object in RDS/qs/qs2/h5ad format.

Output

  • outfile: Default: {{in.sobjfile | stem}}.annotated.{{- ext0(in.sobjfile) if envs.outtype == 'input' else envs.outtype -}}.
    The RDS/qs/qs2/h5ad file of the Seurat object with cell types annotated.
    A text file containing the mapping from the old identity to the new cell types will be generated and saved to cluster2celltype.tsv under the job output directory.
    Note that if envs.ident is specified, the output Seurat object will have the identity set to the specified column in metadata.

Environment Variables

  • tool (choice): Default: direct.
    The tool to use for cell type annotation.
  • sctype_tissue: The tissue to use for sctype.
    Available tissues are the values in the first column (tissueType) of sctype_db.
    If not specified, all rows in sctype_db will be used.
  • sctype_db: The database to use for sctype.
    Check examples at https://github.com/IanevskiAleksandr/sc-type/blob/master/ScTypeDB_full.xlsx
  • ident: The column name in metadata to use as the clusters.
    If not specified, the identity column will be used when input is rds/qs/qs2 (supposing we have a Seurat object).
    If input data is h5ad, this is required to run cluster-based annotation tools.
    For celltypist, this is a shortcut to set over_clustering in celltypist_args.
  • backup_col: Default: seurat_clusters_id.
    The backup column name to store the original identities.
    If not specified, the original identity column will not be stored.
    If envs.newcol is specified, this will be ignored.
  • hitype_tissue: The tissue to use for hitype.
    Available tissues are the values in the first column (tissueType) of hitype_db.
    If not specified, all rows in hitype_db will be used.
  • hitype_db: The database to use for hitype.
    Compatible with sctype_db.
    See also https://pwwang.github.io/hitype/articles/prepare-gene-sets.html
    You can also use built-in databases, including hitypedb_short, hitypedb_full, and hitypedb_pbmc3k.
  • cell_types (list): Default: [].
    The cell types to use for direct annotation.
    You can use "-" or "" as a placeholder for clusters whose original cell types you want to keep.
    If cell_types is shorter than the number of clusters, the remaining clusters keep their original cell types.
    You can also use NA to remove the clusters from downstream analysis. This only works when envs.newcol is not specified.

    Note

    If tool is direct and cell_types is not specified or is an empty list, the original cell types will be kept and nothing will be changed.

  • more_cell_types (type=json): The additional cell type annotations to add to the metadata.
    The keys are the new column names and the values are the cell types lists.
    The cell type lists work the same as cell_types above.
    This is useful when you want to keep multiple annotations of cell types.

  • sccatch_args (ns): The arguments for scCATCH::findmarkergene() if tool is sccatch.

    • species: The species of the cells.
    • cancer: Default: Normal.
      If the sample is from cancer tissue, the cancer type may be specified.
    • tissue: The tissue origin of the cells. Must be defined.
    • marker: The marker genes for cell type identification.
    • if_use_custom_marker (flag): Default: False.
      Whether to use custom marker genes. If True, no species, cancer, and tissue are needed.
    • <more>: Other arguments for scCATCH::findmarkergene().
      You can pass an RDS file to sccatch_args.marker to work as custom marker. If so, if_use_custom_marker will be set to TRUE automatically.
  • celltypist_args (ns): The arguments for celltypist::celltypist() if tool is celltypist.
    • model: The path to model file.
    • python: Default: python.
      The python path where celltypist is installed.
    • majority_voting: Default: True.
      When true, it refines cell identities within local subclusters after an over-clustering approach at the cost of increased runtime.
    • over_clustering (type=auto): The column name in metadata to use as clusters for majority voting.
      Set to False to disable over-clustering.
      When in.sobjfile is rds/qs/qs2 (supposing we have a Seurat object), the default identity is used.
      Otherwise, it defaults to False.
    • assay: When converting a Seurat object to AnnData, the assay to use.
      If input is h5seurat, this defaults to RNA.
      If input is Seurat object in RDS, this defaults to the default assay.
  • merge (flag): Default: False.
    Whether to merge the clusters with the same cell types.
    Otherwise, a suffix will be added to the cell types (e.g. .1, .2).
  • newcol: The new column name to store the cell types.
    If not specified, the identity column will be overwritten.
    If specified, the original identity column will be kept and Idents will be kept as the original identity.
  • outtype (choice): Default: input.
    The output file type. Currently only works for celltypist.
    An RDS file will be generated for other tools.
    • input: Use the same file type as the input.
    • rds: Use RDS file.
    • qs: Use qs2 file.
    • qs2: Use qs2 file.
    • h5ad: Use AnnData file.
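
The namespace (ns) options above can be sketched as nested tables in the configuration. The following celltypist setup is an illustration only: it assumes the usual TOML sub-table layout for ns options, and the model path is a hypothetical placeholder rather than a file shipped with the pipeline:

```toml
[CellTypeAnnotation.envs]
tool = "celltypist"
# outtype currently only takes effect for celltypist
outtype = "h5ad"

[CellTypeAnnotation.envs.celltypist_args]
# Hypothetical path; point this at your own model file
model = "/path/to/celltypist_model.pkl"
majority_voting = true
# Metadata column used as clusters for majority voting
over_clustering = "seurat_clusters"
```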

Examples

[CellTypeAnnotation.envs]
tool = "direct"
cell_types = ["CellType1", "CellType2", "-", "CellType4"]

The cell types will be assigned as:

0 -> CellType1
1 -> CellType2
2 -> 2
3 -> CellType4
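
A further sketch, this time for a marker-database tool. It assumes the tool choice is spelled hitype (the text above only spells out direct, sccatch, and celltypist) and uses one of the built-in databases named under hitype_db:

```toml
[CellTypeAnnotation.envs]
# Assumed spelling of the choice value
tool = "hitype"
# One of the built-in databases listed under hitype_db
hitype_db = "hitypedb_pbmc3k"
# Merge clusters that receive the same cell type instead of adding suffixes
merge = true
```

Since hitype_tissue is left unset here, all rows in the database are used.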

Metadata

When envs.tool is direct and envs.cell_types is empty, the metadata of the Seurat object will be kept as is.

When envs.newcol is specified, the original identity column (e.g. seurat_clusters) will be kept as is, and the annotated cell types will be saved in the new column.
Otherwise, the original identity column will be replaced by the annotated cell types and the original identity column will be saved at envs.backup_col (e.g. seurat_clusters_id).
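
The second case can be sketched as a configuration like the one below. It simply restates the default backup_col for clarity; the cell type names are placeholders:

```toml
[CellTypeAnnotation.envs]
tool = "direct"
cell_types = ["CellType1", "CellType2"]
# newcol is left unset, so the identity column is overwritten
# and the original identities are backed up to this column
backup_col = "seurat_clusters_id"
```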

[Figure: CellTypeAnnotation-metadata]