SeuratClusterStats

Statistics of the clustering.

This includes the number/fraction of cells in each cluster, gene expression values, and dimension reduction plots. It is also possible to compute statistics on TCR clones/clusters or other metadata for each T-cell cluster.

Input

  • srtobj: The Seurat object loaded by SeuratClustering

Output

  • outdir: Default: {{in.srtobj | stem}}.cluster_stats.
    The output directory.
    Different types of plots will be saved in different subdirectories.
    For example, clustree plots will be saved in the clustrees subdirectory.
    For each case in envs.clustrees, both the png and pdf files will be saved.

Environment Variables

  • mutaters (type=json): Default: {}.
    The mutaters to mutate the metadata to subset the cells.
    The mutaters will be applied in the order specified.
    You can also use the clone selectors to select the TCR clones/clusters.
    See https://pwwang.github.io/scplotter/reference/clone_selectors.html.
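    As a rough sketch, assuming each mutater is a named R expression evaluated against the cell metadata, a derived column could be defined and then used by a later case. The column names Source and Tissue are hypothetical and only serve as an illustration:

    ```toml
    [SeuratClusterStats.envs.mutaters]
    # Hypothetical: assumes the metadata has a "Source" column
    Tissue = "if_else(Source == 'Tumor', 'Tumor', 'Other')"

    [SeuratClusterStats.envs.stats."Number of cells in each cluster by Tissue"]
    # Use the column created by the mutater above
    group_by = "Tissue"
    ```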
  • cache (type=auto): Default: /tmp.
    Whether to cache the plots.
    Currently only plots for features are supported, since creating those plots can be time-consuming.
    If True, the plots will be cached in the job output directory, which is not cleaned up when the job is rerun.
  • clustrees_defaults (ns): The parameters for the clustree plots.
    • devpars (ns): The device parameters for the clustree plot.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): The height of the plots.
      • width (type=int): The width of the plots.
    • more_formats (type=list): Default: [].
      The formats to save the plots other than png.
    • save_code (flag): Default: False.
      Whether to save the code to reproduce the plot.
    • prefix (type=auto): Default: True.
      A string indicating the columns containing clustering information.
      The trailing dot is not necessary and will be added automatically.
      When TRUE, clustrees will be plotted when there is FindClusters or FindClusters.* in the obj@commands.
      The latter is generated by SeuratSubClustering.
      This will be ignored when envs.clustrees is specified (the prefix of each case must be specified separately).
    • <more>: Other arguments passed to scplotter::ClustreePlot.
      See https://pwwang.github.io/scplotter/reference/ClustreePlot.html
  • clustrees (type=json): Default: {}.
    The cases for clustree plots.
    Keys are the names of the plots and values are dicts inherited from envs.clustrees_defaults, except prefix.
    There is no default case for clustrees.
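    A minimal sketch of a single clustree case is shown here. The prefix value is an assumption: it presumes the metadata holds clustering columns such as seurat_clusters.0.2 and seurat_clusters.0.4, so adjust it to the columns actually present in your object:

    ```toml
    [SeuratClusterStats.envs.clustrees."Clustree of seurat_clusters"]
    # Hypothetical prefix; the trailing dot is added automatically
    prefix = "seurat_clusters"
    # Device parameters for this case (values are illustrative)
    devpars = { res = 100, width = 800, height = 1000 }
    save_code = true
    ```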
  • stats_defaults (ns): The default parameters for stats.
    This is to do some basic statistics on the clusters/cells. For more comprehensive analysis, see https://pwwang.github.io/scplotter/reference/CellStatPlot.html.
    The parameters from the cases can overwrite the default parameters.
    • subset: An expression to subset the cells, will be passed to tidyseurat::filter().
    • devpars (ns): The device parameters for the plots.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): The height of the plots.
      • width (type=int): The width of the plots.
    • descr: The description of the plot, showing in the report.
    • more_formats (type=list): Default: [].
      The formats to save the plots other than png.
    • save_code (flag): Default: False.
      Whether to save the code to reproduce the plot.
    • save_data (flag): Default: False.
      Whether to save the data used to generate the plot.
    • <more>: Other arguments passed to scplotter::CellStatPlot.
      See https://pwwang.github.io/scplotter/reference/CellStatPlot.html.
  • stats (type=json): Default: {'Number of cells in each cluster (Bar Chart)': {'plot_type': 'bar', 'x_text_angle': 90}, 'Number of cells in each cluster by Sample (Bar Chart)': {'plot_type': 'bar', 'group_by': 'Sample', 'x_text_angle': 90}}.
    The number/fraction of cells to plot.
    Keys are the names of the plots and values are dicts inherited from envs.stats_defaults.
    Here are some examples:

    {
        "nCells_All": {},
        "nCells_Sample": {"group_by": "Sample"},
        "fracCells_Sample": {"scale_y": True, "group_by": "Sample", "plot_type": "pie"},
    }
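    In a TOML configuration file, the same cases could be written roughly as follows. This is a sketch that mirrors the section header style used in the Examples below:

    ```toml
    [SeuratClusterStats.envs.stats]
    # Each key is a plot name; each value inherits from envs.stats_defaults
    nCells_All = { }
    nCells_Sample = { group_by = "Sample" }
    fracCells_Sample = { scale_y = true, group_by = "Sample", plot_type = "pie" }
    ```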
    
  • ngenes_defaults (ns): The default parameters for ngenes.
    The default parameters to plot the number of genes expressed in each cell.

    • more_formats (type=list): Default: [].
      The formats to save the plots other than png.
    • subset: An expression to subset the cells, will be passed to tidyseurat::filter().
    • devpars (ns): The device parameters for the plots.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): Default: 800.
        The height of the plots.
      • width (type=int): Default: 1000.
        The width of the plots.
  • ngenes (type=json): Default: {'Number of genes expressed in each cluster': {}}.
    The number of genes expressed in each cell.
    Keys are the names of the plots and values are dicts inherited from envs.ngenes_defaults.
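    As an illustration, the sketch below sets a shared default and adds one case restricted to a subset of cells. The filter on nCount_RNA is an assumption: it presumes that column exists in the metadata, and the threshold is arbitrary:

    ```toml
    [SeuratClusterStats.envs.ngenes_defaults]
    # Also save a pdf next to the png for every ngenes case
    more_formats = ["pdf"]

    [SeuratClusterStats.envs.ngenes."Number of genes expressed in each cluster (nCount_RNA > 500)"]
    # Hypothetical filter; assumes an nCount_RNA metadata column
    subset = "nCount_RNA > 500"
    devpars = { res = 100, height = 800, width = 1000 }
    ```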
  • features_defaults (ns): The default parameters for features.
    • features (type=auto): The features to plot.
      It can be either a string with comma-separated features, a list of features, a file path with the file:// prefix listing features (one per line), or an integer to use the top N features from VariableFeatures(srtobj).
      It can also be a dict with the keys as the feature group names and the values as the features, which is used for heatmap to group the features.
    • order_by (type=auto): The order of the clusters to show on the plot.
      An expression passed to dplyr::arrange() on the grouped meta data frame (by ident).
      For example, you can order the clusters by their activation score with desc(mean(ActivationScore, na.rm = TRUE)), assuming the metadata has a column named ActivationScore.
      You may also specify the literal order of the clusters by a list of strings (at least two).
    • subset: An expression to subset the cells, will be passed to tidyseurat::filter().
    • devpars (ns): The device parameters for the plots. Does not work for table.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): The height of the plots.
      • width (type=int): The width of the plots.
    • descr: The description of the plot, showing in the report.
    • more_formats (type=list): Default: [].
      The formats to save the plots other than png.
    • save_code (flag): Default: False.
      Whether to save the code to reproduce the plot.
    • save_data (flag): Default: False.
      Whether to save the data used to generate the plot.
    • <more>: Other arguments passed to scplotter::FeatureStatPlot.
      See https://pwwang.github.io/scplotter/reference/FeatureStatPlot.html
  • features (type=json): Default: {}.
    The plots for features, including gene expression and columns from metadata.
    Keys are the titles of the cases and values are dicts inherited from envs.features_defaults.
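    The different forms accepted by features can be sketched as below. The gene names, the file path, and the ActivationScore column are illustrative, and plot_type = "heatmap" is an assumed value that should be checked against the FeatureStatPlot reference:

    ```toml
    [SeuratClusterStats.envs.features_defaults]
    # An integer selects the top N variable features
    features = 10

    [SeuratClusterStats.envs.features."Marker genes"]
    # A list of features overrides the default for this case
    features = ["CD3E", "CD4", "CD8A"]
    # Assumes an ActivationScore column in the metadata
    order_by = "desc(mean(ActivationScore, na.rm = TRUE))"

    [SeuratClusterStats.envs.features."Genes from file"]
    # Hypothetical path; one feature per line
    features = "file:///path/to/genes.txt"

    [SeuratClusterStats.envs.features."Grouped genes"]
    # Assumed plot type; the dict groups features on the heatmap
    plot_type = "heatmap"
    features = { "T cell" = ["CD3E", "CD4"], Cytotoxic = ["GZMB", "PRF1"] }
    ```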
  • dimplots_defaults (ns): The default parameters for dimplots.
    • group_by: Default: seurat_clusters.
      The identity to use.
      If it is from subclustering (reduction sub_umap_<ident> exists), this reduction will be used if reduction is set to dim or auto.
    • split_by: The column name in metadata to split the cells into different plots.
    • subset: An expression to subset the cells, will be passed to tidyseurat::filter().
    • devpars (ns): The device parameters for the plots.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): The height of the plots.
      • width (type=int): The width of the plots.
    • reduction (choice): Default: dim.
      Which dimensionality reduction to use.
      • dim: Use Seurat::DimPlot.
        First searches for umap, then tsne, then pca.
        If ident is from subclustering, sub_umap_<ident> will be used.
      • auto: Same as dim
      • umap: Use Seurat::UMAPPlot.
      • tsne: Use Seurat::TSNEPlot.
      • pca: Use Seurat::PCAPlot.
    • <more>: See https://pwwang.github.io/scplotter/reference/CellDimPlot.html
  • dimplots (type=json): Default: {'Dimensional reduction plot': {'label': True}, 'VDJ Presence': {'group_by': 'VDJ_Presence'}}.
    The dimensional reduction plots.
    Keys are the titles of the plots and values are dicts inherited from envs.dimplots_defaults. A case can also carry other parameters from scplotter::CellDimPlot.
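    A short sketch of a default plus one case follows. It assumes the metadata has a Sample column, as the default stats cases do, and the device sizes are illustrative:

    ```toml
    [SeuratClusterStats.envs.dimplots_defaults]
    # Force UMAP instead of the dim/auto search order
    reduction = "umap"

    [SeuratClusterStats.envs.dimplots."Clusters by Sample"]
    group_by = "seurat_clusters"
    # One panel per value of the (assumed) Sample column
    split_by = "Sample"
    label = true
    devpars = { res = 100, width = 1200, height = 800 }
    ```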

Examples

Number of cells in each cluster

[SeuratClusterStats.envs.stats]
# suppose you have nothing set in `envs.stats_defaults`
# otherwise, the settings will be inherited here
nCells_All = { }

[Figure: nCells_All]

Number of cells in each cluster by groups

[SeuratClusterStats.envs.stats]
nCells_Sample = { group_by = "Sample" }

[Figure: nCells_Sample]

Violin plots for the gene expressions

[SeuratClusterStats.envs.features_defaults]
# Default genes inherited by every case below
features = "CD4,CD8A"

[SeuratClusterStats.envs.features]
# Remove the dots in the violin plots
vlnplots = { pt-size = 0, kind = "vln" }
# Don't use the default genes
vlnplots_1 = { features = ["FOXP3", "IL2RA"], pt-size = 0, kind = "vln" }

[Figures: vlnplots, vlnplots_1]

Dimension reduction plot with labels

[SeuratClusterStats.envs.dimplots.Idents]
label = true

[Figure: dimplots]