PseudoBulkDEG

Pseudo-bulk differential gene expression analysis

This process performs differential gene expression analysis on pseudo-bulk data aggregated from the single-cell data, rather than at the single-cell level.
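To illustrate what aggregating single-cell data into pseudo-bulk profiles means, here is a minimal sketch in plain Python. It is not the process's implementation (the process works on a Seurat object); the function name pseudo_bulk, the toy genes, and the Sample and Group metadata columns are all assumptions made for the example. Raw counts of all cells that share the same values in the aggregation columns are summed into one profile.

```python
from collections import defaultdict


def pseudo_bulk(counts, metadata, aggregate_by):
    """Sum per-cell raw counts into one profile per combination of
    `aggregate_by` values (hypothetical helper, for illustration only).

    counts:       {cell_id: {gene: count}}
    metadata:     {cell_id: {column: value}}
    aggregate_by: list of metadata column names
    """
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        # Cells with identical values in the aggregation columns share a key.
        key = tuple(metadata[cell][col] for col in aggregate_by)
        for gene, n in gene_counts.items():
            bulk[key][gene] += n
    return {key: dict(genes) for key, genes in bulk.items()}


# Toy data: three cells from two samples (all names are made up).
counts = {
    "c1": {"CD3E": 2, "MS4A1": 0},
    "c2": {"CD3E": 1, "MS4A1": 3},
    "c3": {"CD3E": 0, "MS4A1": 5},
}
metadata = {
    "c1": {"Sample": "S1", "Group": "treated"},
    "c2": {"Sample": "S1", "Group": "treated"},
    "c3": {"Sample": "S2", "Group": "control"},
}

bulk = pseudo_bulk(counts, metadata, ["Sample", "Group"])
```

The differential expression test is then run on these per-sample profiles instead of on individual cells.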

Input

  • sobjfile: The Seurat object file in RDS or qs/qs2 format.

Output

  • outdir: Default: {{in.sobjfile | stem}}.pseudobulk_deg.
    The output directory containing the results of the differential gene expression analysis.

Environment Variables

  • mutaters (type=json): Default: {}.
    Mutaters to mutate the metadata of the Seurat object. Keys are the new column names and values are the expressions used to compute them. These new columns can be used to define your cases.
    You can also use the clone selectors to select the TCR clones/clusters.
    See https://pwwang.github.io/scplotter/reference/clone_selectors.html.
  • each: The column name in metadata to separate the cells into different cases.
    When specified, the case will be expanded to multiple cases, one for each value in the column.
  • subset: An expression, given as a string, to subset the cells.
  • aggregate_by: The column names in metadata to aggregate the cells.
  • layer: Default: counts.
    The layer to pull the data from for aggregation.
  • assay: Default: RNA.
    The assay to pull the data from for aggregation.
  • error (flag): Default: False.
    Error out if no/not enough markers are found or no pathways are enriched.
    If False, empty results will be returned.
  • group_by: The column name in metadata to group the cells.
  • ident_1: The first identity to compare.
  • ident_2: The second identity to compare.
    If not specified, the rest of the identities will be compared with ident_1.
  • paired_by: The column name in metadata to mark the paired samples.
    For example, subject. If specified, a paired test will be performed.
  • dbs (list): Default: ['KEGG_2021_Human', 'MSigDB_Hallmark_2020'].
    The databases to use for enrichment analysis.
    The databases are passed to biopipen.utils::Enrichr() to do the enrichment analysis.
    See https://maayanlab.cloud/Enrichr/#libraries for the available libraries.
  • sigmarkers: Default: p_val_adj < 0.05.
    An expression passed to dplyr::filter() to filter the significant markers for enrichment analysis.
    If tool = 'DESeq2', the variables that can be used for filtering are: baseMean, log2FC, lfcSE, stat, p_val, p_val_adj.
    If tool = 'edgeR', the variables that can be used for filtering are: logCPM, log2FC, LR, p_val, p_val_adj.
  • enrich_style (choice): Default: enrichr.
    The style of the enrichment analysis.
    • enrichr: Use enrichr-style for the enrichment analysis.
    • clusterProfiler: Use clusterProfiler-style for the enrichment analysis.
  • allmarker_plots_defaults (ns): Default options for the plots for all markers when ident_1 is not specified.
    • plot_type: The type of the plot.
      See https://pwwang.github.io/scplotter/reference/FeatureStatPlot.html.
      Available types are violin, box, bar, ridge, dim, heatmap and dot.
    • more_formats (type=list): Default: [].
      The extra formats to save the plot in.
    • save_code (flag): Default: False.
      Whether to save the code to generate the plot.
    • devpars (ns): The device parameters for the plots.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): The height of the plots.
      • width (type=int): The width of the plots.
    • order_by: Default: desc(abs(log2FC)).
      An expression to order the markers, passed to dplyr::arrange().
    • genes: Default: 10.
      The number of top genes to show or an expression passed to dplyr::filter() to filter the genes.
    • <more>: Other arguments passed to scplotter::FeatureStatPlot().
  • allmarker_plots (type=json): Default: {}.
    All marker plot cases.
    The keys are the names of the cases and the values are the dicts inherited from allmarker_plots_defaults.
  • allenrich_plots_defaults (ns): Default options for the plots to generate for the enrichment analysis.
    • plot_type: Default: heatmap.
      The type of the plot.
    • devpars (ns): The device parameters for the plots.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): The height of the plots.
      • width (type=int): The width of the plots.
    • <more>: See https://pwwang.github.io/scplotter/reference/EnrichmentPlot.html.
  • allenrich_plots (type=json): Default: {}.
    Cases of the plots to generate for the enrichment analysis.
    The keys are the names of the cases and the values are the dicts inherited from allenrich_plots_defaults.
    The cases under envs.cases can inherit these options.
  • marker_plots_defaults (ns): Default options for the plots to generate for the markers.
    • plot_type: The type of the plot.
      See https://pwwang.github.io/scplotter/reference/FeatureStatPlot.html.
      Available types are violin, box, bar, ridge, dim, heatmap and dot.
      There are two additional types available - volcano_pct and volcano_log2fc.
    • more_formats (type=list): Default: [].
      The extra formats to save the plot in.
    • save_code (flag): Default: False.
      Whether to save the code to generate the plot.
    • devpars (ns): The device parameters for the plots.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): The height of the plots.
      • width (type=int): The width of the plots.
    • order_by: Default: desc(abs(log2FC)).
      An expression to order the markers, passed to dplyr::arrange().
    • genes: Default: 10.
      The number of top genes to show or an expression passed to dplyr::filter() to filter the genes.
    • <more>: Other arguments passed to scplotter::FeatureStatPlot().
      If plot_type is volcano_pct or volcano_log2fc, they will be passed to scplotter::VolcanoPlot().
  • marker_plots (type=json): Default: {'Volcano Plot': Diot({'plot_type': 'volcano'})}.
    Cases of the plots to generate for the markers.
    The keys are the names of the cases and the values are the dicts inherited from marker_plots_defaults.
    The cases under envs.cases can inherit these options.
  • enrich_plots_defaults (ns): Default options for the plots to generate for the enrichment analysis.
  • enrich_plots (type=json): Default: {'Bar Plot': Diot({'plot_type': 'bar', 'ncol': 1, 'top_term': 10})}.
    Cases of the plots to generate for the enrichment analysis.
    The keys are the names of the cases and the values are the dicts inherited from enrich_plots_defaults.
    The cases under envs.cases can inherit these options.
  • overlaps_defaults (ns): Default options for investigating the overlapping of significant markers between different cases or comparisons.
    This requires either that ident_1 is left empty, so that the case is expanded to multiple comparisons, or that each is specified, so that the case is expanded to multiple cases.
    • sigmarkers: The expression to filter the significant markers for each case.
      If not provided, envs.sigmarkers will be used.
    • plot_type (choice): Default: venn.
      The type of the plot to generate for the overlaps.
      • venn: Use plotthis::VennDiagram().
      • upset: Use plotthis::UpsetPlot().
    • more_formats (type=list): Default: [].
      The extra formats to save the plot in.
    • save_code (flag): Default: False.
      Whether to save the code to generate the plot.
    • devpars (ns): The device parameters for the plots.
      • res (type=int): Default: 100.
        The resolution of the plots.
      • height (type=int): The height of the plots.
      • width (type=int): The width of the plots.
    • <more>: More arguments passed to plotthis::VennDiagram() (https://pwwang.github.io/plotthis/reference/venndiagram1.html) or plotthis::UpsetPlot() (https://pwwang.github.io/plotthis/reference/upsetplot1.html).
  • overlaps (type=json): Default: {}.
    Cases for investigating the overlapping of significant markers between different cases or comparisons.
    The keys are the names of the cases and the values are the dicts inherited from overlaps_defaults.
    There are two situations that we can perform overlaps:
    1. If ident_1 is not specified, the overlaps can be performed between different comparisons.
    2. If each is specified, the overlaps can be performed between different cases, where in each case, ident_1 must be specified.
  • tool (choice): Default: DESeq2.
    The method to use for the differential expression analysis.
    • DESeq2: Use DESeq2 for the analysis.
    • edgeR: Use edgeR for the analysis.
  • plots_defaults (ns): The default parameters for the plots.
  • plots (type=json): The parameters for the plots.
    The keys are the names of the plots and the values are the parameters for the plots. The parameters will override the defaults in plots_defaults.
    If not specified, no plots will be generated.
  • cases (type=json): Default: {}.
    The cases for the analysis.
    The keys are the names of the cases and the values are the arguments for the analysis. The arguments include the ones inherited from envs.
    If no cases are specified, a default case will be added with the name DEG Analysis and the default values specified above.
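
The inheritance between envs and envs.cases described above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the process's code: resolve_cases is a hypothetical helper, the merge is shallow (nested options such as marker_plots_defaults are not merged key by key here), every top-level option other than cases is assumed to be inheritable, and the Group and Sample column names are made up.

```python
def resolve_cases(envs):
    """Return {case_name: settings}, layering each case over the
    top-level defaults (hypothetical helper, shallow merge only)."""
    # Assumption: everything except `cases` itself acts as a default.
    defaults = {k: v for k, v in envs.items() if k != "cases"}
    # With no cases given, fall back to a single default case.
    cases = envs.get("cases") or {"DEG Analysis": {}}
    # Values set in a case override the inherited defaults.
    return {name: {**defaults, **overrides} for name, overrides in cases.items()}


envs = {
    "tool": "DESeq2",
    "group_by": "Group",         # hypothetical metadata column
    "aggregate_by": ["Sample"],  # hypothetical metadata column
    "sigmarkers": "p_val_adj < 0.05",
    "cases": {
        "Treated vs control": {"ident_1": "treated", "ident_2": "control"},
        "Treated vs control (edgeR)": {
            "ident_1": "treated",
            "ident_2": "control",
            "tool": "edgeR",     # overrides the top-level default
        },
    },
}

resolved = resolve_cases(envs)
default_only = resolve_cases({"tool": "DESeq2", "cases": {}})
```

In this sketch the first case keeps tool = DESeq2 from the top level, the second switches to edgeR, and an empty cases mapping yields a single case named DEG Analysis that carries only the defaults.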