Prepare gene sets for hitype
gs_prepare.Rd
Arguments
- path_to_db_file
A data frame of markers, or the path to the marker gene database file. The file should be a tab-delimited text file or an Excel file with the following columns:
  - `tissueType`: The tissue type of the cell type to be annotated. This column is required only if `tissue_type` is specified.
  - `cellName`: The name of the cell type to be annotated.
  - `nextLevels`: Possible next levels of the cell type to be annotated. Introduced by `hitype`, so that we can work with hierarchical cell names. Levels are separated by `;`. Cell names at each level are separated by `,`. An exclamation mark (`!`) at the beginning of a level means that the cell names listed at this level are excluded; a bare `!` excludes every cell name at this level. If fewer levels are specified than there are subsequent levels, all cell names at the remaining levels are possible next levels. See the Examples section.
  - `geneSymbolmore1`: The gene symbols of the marker genes that are expected to be expressed in the cell type to be annotated. The genes can be suffixed with one or more `+`. More `+` means a higher expression level. For example, `CD3E++` means the gene `CD3E` is expected to be highly expressed in the cell type.
  - `geneSymbolmore2`: The gene symbols of the marker genes that are expected not to be expressed in the cell type to be annotated.
  - `level`: The levels of the cell names. Introduced by `hitype`, so that we can work with hierarchical cell names. Different levels of `cellName`s are predicted separately. For example, if we have `CD4` as level 1 and `Naive` as level 2, then our prediction for a cell type could be `CD4 Naive`. The levels should start from 1 and be consecutive.
- tissue_type
The tissue type of the cell type to be annotated. This requires the `tissueType` column in the marker gene database file. If `tissue_type` is specified, then only the cell types in the specified tissue type will be used for annotation. If `tissue_type` is not specified, then all cell types in the marker gene database file will be used for annotation.
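The column layout and the `tissue_type` behaviour described above can be sketched in base R. This is only an illustration under stated assumptions: the rows are made-up placeholders, the genes within a cell are assumed to be comma-separated (the description does not state the separator), and `filter_by_tissue()` and `parse_gene()` are hypothetical helpers, not functions exported by `hitype`.

```r
# Illustrative marker table with the documented columns.
# All rows are placeholders, not taken from a real marker database.
markers <- data.frame(
  tissueType      = c("Immune system", "Immune system", "Immune system", "Liver"),
  cellName        = c("CD4", "Naive", "Memory", "Hepatocyte"),
  nextLevels      = c("Naive,Memory", "", "", ""),
  geneSymbolmore1 = c("CD3E++,CD4+", "CCR7,SELL", "IL7R", "ALB++"),
  geneSymbolmore2 = c("CD8A", "", "", ""),
  level           = c(1, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented `tissue_type` behaviour:
# NULL keeps every cell type, otherwise only the matching tissue is kept.
filter_by_tissue <- function(db, tissue_type = NULL) {
  if (is.null(tissue_type)) {
    return(db)
  }
  stopifnot("tissueType" %in% colnames(db))
  db[db$tissueType %in% tissue_type, , drop = FALSE]
}

# Hypothetical helper: split an entry such as "CD3E++" into the gene
# symbol and the number of trailing "+" characters.
parse_gene <- function(entry) {
  symbol <- sub("\\++$", "", entry)
  list(symbol = symbol, plus = nchar(entry) - nchar(symbol))
}

immune <- filter_by_tissue(markers, "Immune system")

# Assumption: genes inside one cell are separated by commas.
genes <- strsplit(immune$geneSymbolmore1[1], ",", fixed = TRUE)[[1]]
first <- parse_gene(genes[1])

# With the hitype package installed, the data frame could then be passed on
# (left commented out so the sketch runs without the package):
# gs <- hitype::gs_prepare(markers, tissue_type = "Immune system")
```

Keeping the filter and the suffix parsing as separate small helpers makes it easy to check a marker table by hand before handing it to `gs_prepare`.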
Examples
# nextLevels example
# If we have the following cell types:
# level  cellName       nextLevels
# 1      CD4            Naive,Memory
# 2      Naive
# 2      Memory
# 3      Activated
# 3      Proliferating
# Then possible final cell names are:
# CD4 Naive Activated
# CD4 Naive Proliferating
# CD4 Memory Activated
# CD4 Memory Proliferating
#
# If the `nextLevels` of CD4 is `!Naive`, then possible final cell names are:
# CD4 Memory Activated
# CD4 Memory Proliferating
#
# If the `nextLevels` of CD4 is `!`, then possible final cell names are:
# CD4 Activated
# CD4 Proliferating
#
# If the `nextLevels` of CD4 is `!Naive;!`, then possible final cell names
# are:
# CD4 Memory
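
The expansion rules illustrated above can be reproduced with a short base R sketch. It encodes one reading of the examples (a level listed without `!` keeps only those names, `!name` drops the named cells, a bare `!` skips the level, and levels without a spec allow every name) and is not how `hitype` itself implements this. `expand_next_levels()` and its arguments are hypothetical names introduced for illustration.

```r
# Hypothetical helper, not part of hitype: expand one top-level cell name
# into the final names permitted by its `nextLevels` string.
#   parent       - cell name at level 1, e.g. "CD4"
#   next_levels  - its `nextLevels` value, e.g. "!Naive;!"
#   lower_levels - list of character vectors, one per subsequent level
expand_next_levels <- function(parent, next_levels, lower_levels) {
  # One spec per subsequent level; levels without a spec stay unrestricted.
  specs <- strsplit(next_levels, ";", fixed = TRUE)[[1]]
  result <- parent
  for (i in seq_along(lower_levels)) {
    spec <- if (i <= length(specs)) trimws(specs[i]) else ""
    candidates <- lower_levels[[i]]
    if (identical(spec, "!")) {
      # Bare "!": no cell name from this level is appended.
      allowed <- character(0)
    } else if (startsWith(spec, "!")) {
      # "!A,B": every name at this level except the listed ones.
      excluded <- trimws(strsplit(substring(spec, 2), ",", fixed = TRUE)[[1]])
      allowed <- setdiff(candidates, excluded)
    } else if (nzchar(spec)) {
      # "A,B": only the listed names.
      listed <- trimws(strsplit(spec, ",", fixed = TRUE)[[1]])
      allowed <- intersect(candidates, listed)
    } else {
      # No spec for this level: all names are possible.
      allowed <- candidates
    }
    if (length(allowed) > 0) {
      # Append each allowed name to each name built so far.
      result <- paste(rep(result, each = length(allowed)), allowed)
    }
  }
  result
}

lower <- list(c("Naive", "Memory"), c("Activated", "Proliferating"))

all_names  <- expand_next_levels("CD4", "Naive,Memory", lower)
no_naive   <- expand_next_levels("CD4", "!Naive", lower)
skip_two   <- expand_next_levels("CD4", "!", lower)
skip_three <- expand_next_levels("CD4", "!Naive;!", lower)
```

Here `all_names` holds the four combinations from the first case, `no_naive` the two `CD4 Memory ...` names, `skip_two` the names without a level 2 part, and `skip_three` only `CD4 Memory`.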