SeuratClusterStats

Statistics of the clustering.

Including the number/fraction of cells in each cluster, the gene expression values and dimension reduction plots. It's also possible to perform stats on TCR clones/clusters or other metadata for each T-cell cluster.

Input

srtobj: The Seurat object loaded by SeuratClustering

Output

outdir: Default: {{in.srtobj | stem}}.cluster_stats.
The output directory

Environment Variables

mutaters (type=json): Default: {}.
The mutaters used to mutate the metadata so that the cells can be subset.
The mutaters will be applied in the order specified.
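As an illustration, a mutater could derive a metadata column that later cases filter on. This is only a sketch: it assumes each key names the new column and each value is an R expression evaluated on the metadata (in the manner of dplyr::mutate()), and the column names Response and Responder are hypothetical.

```toml
[SeuratClusterStats.envs.mutaters]
# Hypothetical columns: `Response` is assumed to already exist in the metadata;
# `Responder` is the new column derived from it.
Responder = "if_else(Response == 'CR', 'Yes', 'No')"
```

Under these assumptions, a case could then restrict itself to those cells with subset = "Responder == 'Yes'".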
clustrees_defaults (ns): The parameters for the clustree plots.
- devpars (ns): The device parameters for the clustree plot.
  - res (type=int): Default: 100.
    The resolution of the plots.
  - height (type=int): Default: 1000.
    The height of the plots.
  - width (type=int): Default: 800.
    The width of the plots.
- prefix: Default: _auto.
  A string indicating the columns containing clustering information.
  The trailing dot is not necessary and will be added automatically.
  When _auto, clustrees will be plotted when there is FindClusters or FindClusters.* in the obj@commands.
  The latter is generated by SeuratSubClustering.
  This will be ignored when envs.clustrees is specified.
- <more>: Other arguments passed to clustree::clustree().
  See https://rdrr.io/cran/clustree/man/clustree.html
clustrees (type=json): Default: {}.
The cases for clustree plots.
Keys are the names of the plots and values are the dicts inherited from envs.clustrees_defaults, except prefix.
There is no default case for clustrees.
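Since there is no default case, a clustree plot has to be requested explicitly. A minimal sketch, assuming the clustering columns in the metadata share the prefix seurat_clusters (for example seurat_clusters.0.2, seurat_clusters.0.4); both the plot name and the prefix are hypothetical:

```toml
[SeuratClusterStats.envs.clustrees."Clustering tree"]
# Hypothetical prefix; the trailing dot is added automatically
prefix = "seurat_clusters"
# Device parameters override those inherited from clustrees_defaults
devpars = { res = 100, height = 1000, width = 800 }
```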
hists_defaults (ns): The default parameters for histograms.
This will plot histograms for the number of cells along x.
For example, you can plot the number of cells along cell activity score.
- x: The column name in metadata to plot as the x-axis.
  The NA values will be removed.
  It could be either numeric or factor/character.
- x_order (list): Default: [].
  The order of the x-axis, only works for factor/character x.
  You can also use it to subset x (showing only a subset of the values of x).
- cells_by: A column name in metadata to group the cells.
  The NA values will be removed. It should be a factor/character.
  If not specified, all cells will be used.
- cells_order (list): Default: [].
  The order of the cell groups for the plots.
  It should be a list of strings. You can also use cells_orderby and cells_n to determine the order.
- cells_orderby: An expression passed to dplyr::arrange() to order the cell groups.
- cells_n: Default: 10.
  The number of cell groups to show.
  Ignored if cells_order is specified.
- ncol (type=int): Default: 2.
  The number of columns for the plots, split by cells_by.
- subset: An expression to subset the cells, will be passed to dplyr::filter().
- each: Whether to plot each group separately.
- bins: Default: 30.
  The number of bins to use, only works for numeric x.
- plus (list): Default: [].
  The extra elements to add to the ggplot object.
- devpars (ns): The device parameters for the plots.
  - res (type=int): Default: 100.
    The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
hists (type=json): Default: {}.
The cases for histograms.
Keys are the names of the plots and values are the dicts inherited from envs.hists_defaults.
There is no default case.
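A histogram case might look like the following sketch, which assumes a numeric ActivationScore column exists in the metadata; the plot name and the column are illustrative only:

```toml
[SeuratClusterStats.envs.hists."Cells along activation score"]
# Assumes a numeric `ActivationScore` column in the metadata (hypothetical)
x = "ActivationScore"
# One panel per cluster, laid out in 3 columns
cells_by = "seurat_clusters"
ncol = 3
# Only applies because x is numeric
bins = 20
```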
stats_defaults (ns): The default parameters for stats.
This is to do some basic statistics on the clusters. For more comprehensive analysis, see RadarPlots and CellsDistribution.
The parameters from the cases can overwrite the default parameters.
- frac (choice): Default: none.
  How to calculate the fraction of cells.
  - group: calculate the fraction in each group.
    The total fraction of the cells of idents in each group will be 1.
    When group-by is not specified, it will be the same as all.
  - ident: calculate the fraction in each ident.
    The total fraction of the cells of groups in each ident will be 1.
    Only works when group-by is specified.
  - cluster: alias of ident.
  - all: calculate the fraction against all cells.
  - none: do not calculate the fraction, use the number of cells instead.
- pie (flag): Default: False.
  Also output a pie chart?
- circos (flag): Default: False.
  Also output a circos plot?
- table (flag): Default: False.
  Whether to output a table (in tab-delimited format) and include it in the report.
- transpose (flag): Default: False.
  Whether to transpose the cluster and group, that is, using group as the x-axis and cluster to fill the plot.
  For circos plots, when transposed, the arrows will be drawn from the idents (by ident) to the groups (by group-by).
  Only works when group-by is specified.
- position (choice): Default: auto.
  The position of the bars. Does not work for pie and circos plots.
  - stack: Use position_stack().
  - fill: Use position_fill().
  - dodge: Use position_dodge().
  - auto: Use stack when there are more than 5 groups, otherwise use dodge.
- ident: Default: seurat_clusters.
  The column name in metadata to use as the identity.
- group-by: The column name in metadata to group the cells.
  NOT supported for pie charts.
- split-by: The column name in metadata to split the cells into different plots.
  NOT supported for circos plots.
- subset: An expression to subset the cells, will be passed to dplyr::filter() on metadata.
- circos_labels_rot (flag): Default: False.
  Whether to rotate the labels in the circos plot.
  In case the labels are too long.
- circos_devpars (ns): The device parameters for the circos plots.
  - res (type=int): Default: 100.
    The resolution of the plots.
  - height (type=int): Default: 600.
    The height of the plots.
  - width (type=int): Default: 600.
    The width of the plots.
- pie_devpars (ns): The device parameters for the pie charts.
  - res (type=int): Default: 100.
    The resolution of the plots.
  - height (type=int): Default: 600.
    The height of the plots.
  - width (type=int): Default: 800.
    The width of the plots.
- devpars (ns): The device parameters for the plots.
  - res (type=int): Default: 100.
    The resolution of the plots.
  - height (type=int): Default: 600.
    The height of the plots.
  - width (type=int): Default: 800.
    The width of the plots.
stats (type=json): Default: {'Number of cells in each cluster': Diot({'pie': True}), 'Number of cells in each cluster by Sample': Diot({'group-by': 'Sample', 'table': True, 'frac': 'group'})}.
The number/fraction of cells to plot.
Keys are the names of the plots and values are the dicts inherited from envs.stats_defaults.
Here are some examples:
```
{
    "nCells_All": {},
    "nCells_Sample": {"group-by": "Sample"},
    "fracCells_Sample": {"frac": "group", "group-by": "Sample"}
}
```
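To make the frac choices concrete, suppose (hypothetically) that sample S1 has 30 cells in cluster 0 and 70 in cluster 1, and sample S2 has 50 cells in each, 200 cells in total. With frac set to group, cluster 0 accounts for 30/100 = 0.3 of S1 and 50/100 = 0.5 of S2. With ident, S1 contributes 30/80 = 0.375 of cluster 0. With all, the S1 cells in cluster 0 are 30/200 = 0.15 of all cells. A sketch of the matching cases in TOML, assuming a Sample column in the metadata (the case names are made up):

```toml
[SeuratClusterStats.envs.stats]
# Fractions sum to 1 within each Sample
fracCells_perSample = { frac = "group", group-by = "Sample" }
# Fractions sum to 1 within each cluster
fracCells_perCluster = { frac = "ident", group-by = "Sample" }
# Fractions are relative to all cells
fracCells_all = { frac = "all", group-by = "Sample" }
```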
ngenes_defaults (ns): The default parameters for ngenes.
The default parameters to plot the number of genes expressed in each cell.
- ident: Default: seurat_clusters.
  The column name in metadata to use as the identity.
- group-by: The column name in metadata to group the cells.
  Dodge position will be used to separate the groups.
- split-by: The column name in metadata to split the cells into different plots.
- subset: An expression to subset the cells, will be passed to tidyseurat::filter().
- devpars (ns): The device parameters for the plots.
  - res (type=int): Default: 100.
    The resolution of the plots.
  - height (type=int): Default: 800.
    The height of the plots.
  - width (type=int): Default: 1000.
    The width of the plots.
ngenes (type=json): Default: {'Number of genes expressed in each cluster': Diot({})}.
The number of genes expressed in each cell.
Keys are the names of the plots and values are the dicts inherited from envs.ngenes_defaults.
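For instance, an additional case could break the counts down by sample. This sketch assumes a Sample column in the metadata; the plot name is arbitrary:

```toml
[SeuratClusterStats.envs.ngenes."Number of genes expressed in each cluster by Sample"]
# Groups are separated with dodge position within each cluster
group-by = "Sample"
# Widen the plot a little to make room for the extra groups
devpars = { res = 100, height = 800, width = 1200 }
```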
features_defaults (ns): The default parameters for features.
- features: The features to plot.
  It can be either a string with comma-separated features, a list of features, a file path with file:// prefix with features (one per line), or an integer to use the top N features from VariableFeatures(srtobj).
- ident: Default: seurat_clusters.
  The column name in metadata to use as the identity.
  If it is from subclustering (reduction sub_umap_<ident> exists), the reduction will be used.
- cluster_orderby (type=auto): The order of the clusters to show on the plot.
  An expression passed to dplyr::summarise() on the grouped data frame (by seurat_clusters).
  The summary stat will be passed to dplyr::arrange() to order the clusters. It's applied on the whole meta.data before grouping and subsetting.
  For example, you can order the clusters by the activation score of the cluster: desc(mean(ActivationScore, na.rm = TRUE)), suppose you have a column ActivationScore in the metadata.
  You may also specify the literal order of the clusters by a list of strings.
- subset: An expression to subset the cells, will be passed to tidyseurat::filter().
- devpars (ns): The device parameters for the plots. Does not work for table.
  - res (type=int): Default: 100.
    The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- plus: The extra elements to add to the ggplot object. Does not work for table.
- group-by: Group cells in different ways (for example, orig.ident). Works for ridge, vln, and dot.
  It also works for feature as shape.by being passed to Seurat::FeaturePlot.
- split-by: The column name in metadata to split the cells into different plots.
  It works for vln, feature, and dot.
- assay: The assay to use.
- layer: The layer to use.
- reduction: The reduction to use. Only works for feature.
- section: The section to put the plot in the report.
  If not specified, the case title will be used.
- ncol (type=int): Default: 2.
  The number of columns for the plots.
- kind (choice): The kind of the plot or table.
  - ridge: Use Seurat::RidgePlot.
  - ridgeplot: Same as ridge.
  - vln: Use Seurat::VlnPlot.
  - vlnplot: Same as vln.
  - violin: Same as vln.
  - violinplot: Same as vln.
  - feature: Use Seurat::FeaturePlot.
  - featureplot: Same as feature.
  - dot: Use Seurat::DotPlot.
  - dotplot: Same as dot.
  - bar: Bar plot on an aggregated feature.
    The features must be a single feature, which will be either an existing feature or an expression passed to dplyr::summarise() (grouped by ident) on the existing features to create a new feature.
  - barplot: Same as bar.
  - heatmap: Use Seurat::DoHeatmap.
  - avgheatmap: Plot the average expression of the features in each cluster as a heatmap.
  - table: The table for the features, only gene expressions are supported.
    (supported keys: ident, subset, and features).
features (type=json): Default: {}.
The plots for features, including gene expressions and columns from metadata.
Keys are the titles of the cases and values are the dicts inherited from envs.features_defaults. A case can also take other parameters from the Seurat function used by its kind. Note that for argument names containing ., you should use - instead (for example, pt-size for pt.size).
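The different forms accepted by features can be sketched as follows. The case titles, the ActivationScore column, and the file path are all hypothetical:

```toml
[SeuratClusterStats.envs.features."Top variable features"]
# An integer selects the top N variable features
features = 10
kind = "dot"
# Order clusters by a metadata column (`ActivationScore` is hypothetical)
cluster_orderby = "desc(mean(ActivationScore, na.rm = TRUE))"

[SeuratClusterStats.envs.features."Marker genes from file"]
# Hypothetical path; the file lists one feature per line
features = "file:///path/to/markers.txt"
kind = "avgheatmap"
```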
dimplots_defaults (ns): The default parameters for dimplots.
- ident: Default: seurat_clusters.
  The identity to use.
  If it is from subclustering (reduction sub_umap_<ident> exists), this reduction will be used if reduction is set to dim or auto.
- group-by: How the points are colored. Same as ident if not specified.
- na_group: The group name for NA values, use None to ignore NA values.
- split-by: The column name in metadata to split the cells into different plots.
- shape-by: The column name in metadata to use as the shape.
- subset: An expression to subset the cells, will be passed to tidyseurat::filter().
- devpars (ns): The device parameters for the plots.
  - res (type=int): Default: 100.
    The resolution of the plots.
  - height (type=int): Default: 800.
    The height of the plots.
  - width (type=int): Default: 1000.
    The width of the plots.
- reduction (choice): Default: dim.
  Which dimensionality reduction to use.
  - dim: Use Seurat::DimPlot.
    First searches for umap, then tsne, then pca.
    If ident is from subclustering, sub_umap_<ident> will be used.
  - auto: Same as dim.
  - umap: Use Seurat::UMAPPlot.
  - tsne: Use Seurat::TSNEPlot.
  - pca: Use Seurat::PCAPlot.
- <more>: See https://satijalab.org/seurat/reference/dimplot
dimplots (type=json): Default: {'Dimensional reduction plot': Diot({'label': True, 'label-box': True, 'repel': True}), 'TCR presence': Diot({'ident': 'TCR_Presence', 'order': 'TCR_absent', 'cols': ['#FF000066', 'gray']})}.
The dimensional reduction plots.
Keys are the titles of the plots and values are the dicts inherited from envs.dimplots_defaults. It can also have other parameters from Seurat::DimPlot.
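An extra case could, for example, force a specific reduction and split the cells by sample. This is a sketch assuming a Sample column in the metadata and a umap reduction in the object; the title is arbitrary:

```toml
[SeuratClusterStats.envs.dimplots."UMAP split by Sample"]
# Use Seurat::UMAPPlot instead of the automatic choice
reduction = "umap"
# One panel per sample (assumes a `Sample` column in the metadata)
split-by = "Sample"
# Passed through to the Seurat plotting function
label = true
```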

Examples

Number of cells in each cluster

[SeuratClusterStats.envs.stats]
# suppose you have nothing set in `envs.stats_defaults`
# otherwise, the settings will be inherited here
nCells_All = { }

[Plot: nCells_All]

Number of cells in each cluster by groups

[SeuratClusterStats.envs.stats]
nCells_Sample = { group-by = "Sample" }

[Plot: nCells_Sample]

Violin plots for the gene expressions

[SeuratClusterStats.envs.features_defaults]
# Default genes, inherited by every case below
features = "CD4,CD8A"

[SeuratClusterStats.envs.features]
# Remove the dots in the violin plots
vlnplots = { pt-size = 0, kind = "vln" }
# Don't use the default genes
vlnplots_1 = { features = ["FOXP3", "IL2RA"], pt-size = 0, kind = "vln" }

[Plots: vlnplots, vlnplots_1]

Dimension reduction plot with labels

[SeuratClusterStats.envs.dimplots.Idents]
label = true
label-box = true
repel = true

[Plot: dimplots]