biopipen.ns.scrna

module

biopipen.ns.scrna

</>

Tools to analyze single-cell RNA

Classes

SeuratLoading (Proc) — Seurat - Loading data</>
SeuratPreparing (Proc) — Load, prepare and apply QC to data, using Seurat</>
SeuratClustering (Proc) — Determine the clusters of cells without reference using Seurat FindClustersprocedure. </>
SeuratSubClustering (Proc) — Find clusters of a subset of cells.</>
SeuratClusterStats (Proc) — Statistics of the clustering.</>
ModuleScoreCalculator (Proc) — Calculate the module scores for each cell</>
CellsDistribution (Proc) — Distribution of cells (i.e. in a TCR clone) from different groupsfor each cluster </>
SeuratMetadataMutater (Proc) — Mutate the metadata of the seurat object</>
DimPlots (Proc) — Seurat - Dimensional reduction plots</>
MarkersFinder (Proc) — Find markers between different groups of cells</>
TopExpressingGenes (Proc) — Find the top expressing genes in each cluster</>
ExprImputation (Proc) — This process imputes the dropout values in scRNA-seq data.</>
SCImpute (Proc) — Impute the dropout values in scRNA-seq data.</>
SeuratFilter (Proc) — Filtering cells from a seurat object</>
SeuratSubset (Proc) — Subset a seurat object into multiple seruat objects</>
SeuratSplit (Proc) — Split a seurat object into multiple seruat objects</>
Subset10X (Proc) — Subset 10X data, mostly used for testing</>
SeuratTo10X (Proc) — Write a Seurat object to 10X format</>
ScFGSEA (Proc) — Gene set enrichment analysis for cells in different groups using fgsea</>
CellTypeAnnotation (Proc) — Annotate the cell clusters. Currently, four ways are supported:</>
SeuratMap2Ref (Proc) — Map the seurat object to reference</>
RadarPlots (Proc) — Radar plots for cell proportion in different clusters.</>
MetaMarkers (Proc) — Find markers between three or more groups of cells, using one-way ANOVAor Kruskal-Wallis test. </>
Seurat2AnnData (Proc) — Convert seurat object to AnnData</>
AnnData2Seurat (Proc) — Convert AnnData to seurat object</>
ScSimulation (Proc) — Simulate single-cell data using splatter.</>
CellCellCommunication (Proc) — Cell-cell communication inference</>
CellCellCommunicationPlots (Proc) — Visualization for cell-cell communication inference.</>
ScVelo (Proc) — Velocity analysis for single-cell RNA-seq data</>
Slingshot (Proc) — Trajectory inference using Slingshot</>
LoomTo10X (Proc) — Convert Loom file to 10X format</>

class

`biopipen.ns.scrna.SeuratLoading(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Seurat - Loading data

Deprecated, should be superseded by SeuratPreparing

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

metafile —
The metadata of the samplesA tab-delimited file Two columns are required:
- - Sample to specify the sample names.
- - RNAData to assign the path of the data to the samples
  The path will be read by Read10X() from Seurat

Output

rdsfile — The RDS file with a list of Seurat object

Envs

qc — The QC filter for each sample.This will be passed to subset(obj, subset=<qc>). For example nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratPreparing(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Load, prepare and apply QC to data, using Seurat

This process will -

- Prepare the seurat object
- Apply QC to the data
- Integrate the data from different samples

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratClustering(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Determine the clusters of cells without reference using Seurat FindClustersprocedure.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object loaded by SeuratPreparing

Output

outfile — The seurat object with cluster information at seurat_clusters.

Envs

FindClusters (ns) —
Arguments for FindClusters().object is specified internally, and - in the key will be replaced with .. The cluster labels will be saved in seurat_clusters and prefixed with "c". The first cluster will be "c1", instead of "c0".
- - resolution (type=auto): The resolution of the clustering. You can have multiple resolutions as a list or as a string separated by comma.
  Ranges are also supported, for example: 0.1:0.5:0.1 will generate 0.1, 0.2, 0.3, 0.4, 0.5. The step can be omitted, defaulting to 0.1.
  The results will be saved in seurat_clusters_<resolution>.
  The final resolution will be used to define the clusters at seurat_clusters.
- - : See https://satijalab.org/seurat/reference/findclusters
FindNeighbors (ns) —
Arguments for FindNeighbors().object is specified internally, and - in the key will be replaced with ..
- - reduction: The reduction to use.
  If not provided, sobj@misc$integrated_new_reduction will be used.
- - : See https://satijalab.org/seurat/reference/findneighbors
RunPCA (ns) — Arguments for RunPCA().
RunUMAP (ns) —
Arguments for RunUMAP().object is specified internally, and - in the key will be replaced with .. dims=N will be expanded to dims=1:N; The maximal value of N will be the minimum of N and the number of columns - 1 for each sample.
- - dims (type=int): The number of PCs to use
- - reduction: The reduction to use for UMAP.
  If not provided, sobj@misc$integrated_new_reduction will be used.
- - : See https://satijalab.org/seurat/reference/runumap
cache (type=auto) — Whether to cache the information at different steps.If True, the seurat object will be cached in the job output directory, which will be not cleaned up when job is rerunning. The cached seurat object will be saved as <signature>.<kind>.RDS file, where <signature> is the signature determined by the input and envs of the process. See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details also about reproducibility issues. To not use the cached seurat object, you can either set cache to False or delete the cached file at <signature>.RDS in the cache directory.
ncores (type=int;order=-100) — Number of cores to use.Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures. See also: https://satijalab.org/seurat/articles/future_vignette.html

Requires

r-dplyr —
- check: {{proc.lang}} <(echo "library(dplyr)")
r-seurat —
- check: {{proc.lang}} <(echo "library(Seurat)")
r-tidyr —
- check: {{proc.lang}} <(echo "library(tidyr)")

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratSubClustering(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Find clusters of a subset of cells.

It's unlike [Seurat::FindSubCluster], which only finds subclusters of a single cluster. Instead, it will perform the whole clustering procedure on the subset of cells. One can use metadata to specify the subset of cells to perform clustering on.

For the subset of cells, the reductions will be re-performed on the subset of cells, and then the clustering will be performed on the subset of cells. The reduction will be saved in sobj@reduction$sub_umap_<casename> of the original object and the clustering will be saved in the metadata of the original object using the casename as the column name.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object

Output

rdsfile — The seurat object with the subclustering information.

Envs

FindClusters (ns) —
Arguments for FindClusters().object is specified internally, and - in the key will be replaced with .. The cluster labels will be prefixed with "s". The first cluster will be "s1", instead of "s0".
- - resolution (type=auto): The resolution of the clustering. You can have multiple resolutions as a list or as a string separated by comma.
  Ranges are also supported, for example: 0.1:0.5:0.1 will generate 0.1, 0.2, 0.3, 0.4, 0.5. The step can be omitted, defaulting to 0.1.
  The results will be saved in <casename>_<resolution>.
  The final resolution will be used to define the clusters at <casename>.
- - : See https://satijalab.org/seurat/reference/findclusters
FindNeighbors (ns) —
Arguments for FindNeighbors().object is specified internally, and - in the key will be replaced with ..
- - reduction: The reduction to use.
  If not provided, sobj@misc$integrated_new_reduction will be used.
- - : See https://satijalab.org/seurat/reference/findneighbors
RunUMAP (ns) —
Arguments for RunUMAP().object is specified internally as the subset object, and - in the key will be replaced with .. dims=N will be expanded to dims=1:N; The maximal value of N will be the minimum of N and the number of columns - 1 for each sample.
- - dims (type=int): The number of PCs to use
- - reduction: The reduction to use for UMAP.
  If not provided, sobj@misc$integrated_new_reduction will be used.
- - : See https://satijalab.org/seurat/reference/runumap
cache (type=auto) — Whether to cache the information at different steps.If True, the seurat object will be cached in the job output directory, which will be not cleaned up when job is rerunning. The cached seurat object will be saved as <signature>.<kind>.RDS file, where <signature> is the signature determined by the input and envs of the process. See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details also about reproducibility issues. To not use the cached seurat object, you can either set cache to False or delete the cached file at <signature>.RDS in the cache directory.
cases (type=json) — The cases to perform subclustering.Keys are the names of the cases and values are the dicts inherited from envs except mutaters and cache. If empty, a case with name subcluster will be created with default parameters.
mutaters (type=json) — The mutaters to mutate the metadata to subset the cells.The mutaters will be applied in the order specified.
ncores (type=int;order=-100) — Number of cores to use.Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures.
subset — An expression to subset the cells, will be passed totidyseurat::filter().

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratClusterStats(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Statistics of the clustering.

Including the number/fraction of cells in each cluster, the gene expression values and dimension reduction plots. It's also possible to perform stats on TCR clones/clusters or other metadata for each T-cell cluster.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Examples

Number of cells in each cluster

[SeuratClusterStats.envs.stats]
# suppose you have nothing set in `envs.stats_defaults`
# otherwise, the settings will be inherited here
nCells_All = { }

nCells_All {: width="80%" }

Number of cells in each cluster by groups

[SeuratClusterStats.envs.stats]
nCells_Sample = { group-by = "Sample" }

nCells_Sample {: width="80%" }

Violin plots for the gene expressions

[SeuratClusterStats.envs.features]
features = "CD4,CD8A"
# Remove the dots in the violin plots
vlnplots = { pt-size = 0, kind = "vln" }
# Don't use the default genes
vlnplots_1 = { features = ["FOXP3", "IL2RA"], pt-size = 0, kind = "vln" }

vlnplots {: width="80%" } vlnplots_1 {: width="80%" }

Dimension reduction plot with labels

[SeuratClusterStats.envs.dimplots.Idents]
label = true
label-box = true
repel = true

dimplots {: width="80%" }

Input

srtobj — The seurat object loaded by SeuratClustering

Output

outdir — The output directory.Different types of plots will be saved in different subdirectories. For example, clustree plots will be saved in clustrees subdirectory. For each case in envs.clustrees, both the png and pdf files will be saved.

Envs

cache (type=auto) — Whether to cache the plots.Currently only plots for features are supported, since creating the those plots can be time consuming. If True, the plots will be cached in the job output directory, which will be not cleaned up when job is rerunning.
clustrees (type=json) — The cases for clustree plots.Keys are the names of the plots and values are the dicts inherited from env.clustrees_defaults except prefix. There is no default case for clustrees.
clustrees_defaults (ns) —
The parameters for the clustree plots.
- - devpars (ns): The device parameters for the clustree plot.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- - more_formats (list): The formats to save the plots other than png.
- - save_code (flag): Whether to save the code to reproduce the plot.
- - prefix (type=auto): string indicating columns containing clustering information.
  The trailing dot is not necessary and will be added automatically.
  When TRUE, clustrees will be plotted when there is FindClusters or
  FindClusters.* in the obj@commands.
  The latter is generated by SeuratSubClustering.
  This will be ignored when envs.clustrees is specified
  (the prefix of each case must be specified separately).
- - : Other arguments passed to scplotter::ClustreePlot.
  See https://pwwang.github.io/scplotter/reference/ClustreePlot.html
dimplots (type=json) — The dimensional reduction plots.Keys are the titles of the plots and values are the dicts inherited from env.dimplots_defaults. It can also have other parameters from scplotter::CellDimPlot.
dimplots_defaults (ns) —
The default parameters for dimplots.
- - group_by: The identity to use.
  If it is from subclustering (reduction sub_umap_<ident> exists), this reduction will be used if reduction
  is set to dim or auto.
- - split_by: The column name in metadata to split the cells into different plots.
- - subset: An expression to subset the cells, will be passed to tidyrseurat::filter().
- - devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- - reduction (choice): Which dimensionality reduction to use.
  - dim: Use Seurat::DimPlot.
  First searches for umap, then tsne, then pca.
  If ident is from subclustering, sub_umap_<ident> will be used.
  - auto: Same as dim
  - umap: Use Seurat::UMAPPlot.
  - tsne: Use Seurat::TSNEPlot.
  - pca: Use Seurat::PCAPlot.
- - : See https://pwwang.github.io/scplotter/reference/CellDimPlot.html
features (type=json) — The plots for features, include gene expressions, and columns from metadata.Keys are the titles of the cases and values are the dicts inherited from env.features_defaults.
features_defaults (ns) —
The default parameters for features.
- - features: The features to plot.
  It can be either a string with comma separated features, a list of features, a file path with file:// prefix with features
  (one per line), or an integer to use the top N features from VariantFeatures(srtobj).
- - order_by (type=auto): The order of the clusters to show on the plot.
  An expression passed to dplyr::arrange() on the grouped meta data frame (by ident).
  For example, you can order the clusters by the activation score of
  the cluster: desc(mean(ActivationScore, na.rm = TRUE)), suppose you have a column
  ActivationScore in the metadata.
  You may also specify the literal order of the clusters by a list of strings (at least two).
- - subset: An expression to subset the cells, will be passed to tidyrseurat::filter().
- - devpars (ns): The device parameters for the plots. Does not work for table.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- - descr: The description of the plot, showing in the report.
- - more_formats (list): The formats to save the plots other than png.
- - save_code (flag): Whether to save the code to reproduce the plot.
- - save_data (flag): Whether to save the data used to generate the plot.
- - : Other arguments passed to scplotter::FeatureStatPlot.
  See https://pwwang.github.io/scplotter/reference/FeatureStatPlot.html
mutaters (type=json) — The mutaters to mutate the metadata to subset the cells.The mutaters will be applied in the order specified.
ngenes (type=json) — The number of genes expressed in each cell.Keys are the names of the plots and values are the dicts inherited from env.ngenes_defaults.
ngenes_defaults (ns) —
The default parameters for ngenes.The default parameters to plot the number of genes expressed in each cell.
- - more_formats (list): The formats to save the plots other than png.
- - subset: An expression to subset the cells, will be passed to tidyrseurat::filter().
- - devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
stats (type=json) — The number/fraction of cells to plot.Keys are the names of the plots and values are the dicts inherited from env.stats_defaults. Here are some examples -

{ "nCells_All": {}, "nCells_Sample": {"group_by": "Sample"}, "fracCells_Sample": {"scale_y": True, "group_by": "Sample", plot_type = "pie"}, }
stats_defaults (ns) —
The default parameters for stats.This is to do some basic statistics on the clusters/cells. For more comprehensive analysis, see https://pwwang.github.io/scplotter/reference/CellStatPlot.html. The parameters from the cases can overwrite the default parameters.
- - subset: An expression to subset the cells, will be passed to tidyrseurat::filter().
- - devpars (ns): The device parameters for the clustree plot.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- - more_formats (list): The formats to save the plots other than png.
- - save_code (flag): Whether to save the code to reproduce the plot.
- - save_data (flag): Whether to save the data used to generate the plot.
- - : Other arguments passed to scplotter::CellStatPlot.
  See https://pwwang.github.io/scplotter/reference/CellStatPlot.html.

Requires

r-seurat —
- check: {{proc.lang}} -e "library(Seurat)"

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.ModuleScoreCalculator(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Calculate the module scores for each cell

The module scores are calculated by Seurat::AddModuleScore() or Seurat::CellCycleScoring() for cell cycle scores.

The module scores are calculated as the average expression levels of each program on single cell level, subtracted by the aggregated expression of control feature sets. All analyzed features are binned based on averaged expression, and the control features are randomly selected from each bin.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object loaded by SeuratClustering

Output

rdsfile — The seurat object with module scores added to the metadata.

Envs

defaults (ns) —
The default parameters for modules.
- - features: The features to calculate the scores. Multiple features
  should be separated by comma.
  You can also specify cc.genes or cc.genes.updated.2019 to
  use the cell cycle genes to calculate cell cycle scores.
  If so, three columns will be added to the metadata, including
  S.Score, G2M.Score and Phase.
  Only one type of cell cycle scores can be calculated at a time.
- - nbin (type=int): Number of bins of aggregate expression levels
  for all analyzed features.
- - ctrl (type=int): Number of control features selected from
  the same bin per analyzed feature.
- - k (flag): Use feature clusters returned from DoKMeans.
- - assay: The assay to use.
- - seed (type=int): Set a random seed.
- - search (flag): Search for symbol synonyms for features in
  features that don't match features in object?
- - keep (flag): Keep the scores for each feature?
  Only works for non-cell cycle scores.
- - agg (choice): The aggregation function to use.
  Only works for non-cell cycle scores.
  - mean: The mean of the expression levels
  - median: The median of the expression levels
  - sum: The sum of the expression levels
  - max: The max of the expression levels
  - min: The min of the expression levels
  - var: The variance of the expression levels
  - sd: The standard deviation of the expression levels
modules (type=json) — The modules to calculate the scores.Keys are the names of the expression programs and values are the dicts inherited from env.defaults. Here are some examples -

{ "CellCycle": {"features": "cc.genes.updated.2019"}, "Exhaustion": {"features": "HAVCR2,ENTPD1,LAYN,LAG3"}, "Activation": {"features": "IFNG"}, "Proliferation": {"features": "STMN1,TUBB"} }

For CellCycle, the columns S.Score, G2M.Score and Phase will be added to the metadata. S.Score and G2M.Score are the cell cycle scores for each cell, and Phase is the cell cycle phase for each cell.
You can also add Diffusion Components (DC) to the modules

{"DC": {"features": 2, "kind": "diffmap"}} will perform diffusion map as a reduction and add the first 2 components as DC_1 and DC_2 to the metadata. diffmap is a shortcut for diffusion_map. Other key-value pairs will pass to destiny::DiffusionMap(). You can later plot the diffusion map by using reduction = "DC" in env.dimplots in SeuratClusterStats. This requires SingleCellExperiment and destiny R packages.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.CellsDistribution(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Distribution of cells (i.e. in a TCR clone) from different groupsfor each cluster

This generates a set of pie charts with proportion of cells in each cluster Rows are the cells identities (i.e. TCR clones or TCR clusters), columns are groups (i.e. clinic groups).

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Examples

[CellsDistribution.envs.mutaters]# Add Patient1_Tumor_Expanded column with CDR3.aa that
# expands in Tumor of patient 1
Patient1_Tumor_Expanded = '''
  expanded(., region, "Tumor", subset = patient == "Lung1", uniq = FALSE)
'''

[CellsDistribution.envs.cases.Patient1_Tumor_Expanded]
cells_by = "Patient1_Tumor_Expanded"
cells_orderby = "desc(CloneSize)"
group_by = "region"
group_order = [ "Tumor", "Normal" ]

CellsDistribution_example

Input

srtobj — The seurat object in RDS format

Output

outdir — The output directory.The results for each case will be saved in a subdirectory.

Envs

cases (type=json;order=99) — If you have multiple cases, you can specify them here.Keys are the names of the cases and values are the options above except mutaters. If some options are not specified, the options in envs will be used. If no cases are specified, a default case will be used with case name DEFAULT.
cells_by — The column name in metadata to group the cells for the rows of the plot.If your cell groups have overlapping cells, you can also use multiple columns, separated by comma (,). These columns will be concatenated to form the cell groups. For the overlapping cells, they will be counted multiple times for different groups. So make sure the cell group names in different columns are unique.
cells_n (type=int) — The max number of groups to show for each cell group identity (row).Ignored if cells_order is specified.
cells_order (list) — The order of the cells (rows) to show on the plot
cells_orderby —
An expression passed to dplyr::arrange() to order the cells (rows) of the plot.Only works when cells-order is not specified. The data frame passed to dplyr::arrange() is grouped by cells_by before ordering. You can have multiple expressions separated by semicolon (;). The expessions will be parsed by rlang::parse_exprs(). 4 extra columns were added to the metadata for ordering the rows in the plot:
- * CloneSize: The size (number of cells) of clones (identified by cells_by)
- * CloneGroupSize: The clone size in each group (identified by group_by)
- * CloneClusterSize: The clone size in each cluster (identified by seurat_clusters)
- * CloneGroupClusterSize: The clone size in each group and cluster (identified by group_by and seurat_clusters)
cluster_orderby — The order of the clusters to show on the plot.An expression passed to dplyr::summarise() on the grouped data frame (by seurat_clusters). The summary stat will be passed to dplyr::arrange() to order the clusters. It's applied on the whole meta.data before grouping and subsetting. For example, you can order the clusters by the activation score of the cluster: desc(mean(ActivationScore, na.rm = TRUE)), suppose you have a column ActivationScore in the metadata.
descr — The description of the case, will be shown in the report.
devpars (ns) —
The device parameters for the plots of pie charts.
- - res (type=int): The resolution of the plots
- - height (type=int): The height of the plots
- - width (type=int): The width of the plots
each — The column name in metadata to separate the cells into different plots.
group_by — The column name in metadata to group the cells for the columns of the plot.
group_order (list) — The order of the groups (columns) to show on the plot
hm_devpars (ns) —
The device parameters for the heatmaps.
- - res (type=int): The resolution of the heatmaps.
- - height (type=int): The height of the heatmaps.
- - width (type=int): The width of the heatmaps.
mutaters (type=json) —
The mutaters to mutate the metadataKeys are the names of the mutaters and values are the R expressions passed by dplyr::mutate() to mutate the metadata. There are also also 4 helper functions, expanded, collapsed, emerged and vanished, which can be used to identify the expanded/collpased/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use {"Patient1_Tumor_Collapsed_Clones": "expanded(., Source, 'Tumor', subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Tumor_Collapsed_Clones with the collapsed clones in the tumor sample (compared to the normal sample) of patient 1. The values in this columns for other clones will be NA. Those functions take following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * group.by: The column name in metadata to group the cells.
- * idents: The first group or both groups of cells to compare (value in group.by column). If only the first group is given, the rest of the cells (with non-NA in group.by column) will be used as the second group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * order: The expression passed to dplyr::arrange() to order intermediate dataframe and get the ids in order accordingly.
  The intermediate dataframe includes the following columns:
  * <id>: The ids of clones (i.e. CDR3.aa).
  * <each>: The values in each column.
  * ident_1: The size of clones in the first group.
  * ident_2: The size of clones in the second group.
  * .diff: The difference between the sizes of clones in the first and second groups.
  * .sum: The sum of the sizes of clones in the first and second groups.
  * .predicate: Showing whether the clone is expanded/collapsed/emerged/vanished.
- * include_emerged: Whether to include the emerged group for expanded (only works for expanded). Default is FALSE.
- * include_vanished: Whether to include the vanished group for collapsed (only works for collapsed). Default is FALSE.
You can also use top() to get the top clones (i.e. the clones with the largest size) in each group.For example, you can use {"Patient1_Top10_Clones": "top(subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Top10_Clones. The values in this columns for other clones will be NA. This function takes following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * n: The number of top clones to return. Default is 10.
  If n < 1, it will be treated as the percentage of the size of the group.
  Specify 0 to get all clones.
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * with_ties: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is FALSE.
overlap (list) — Plot the overlap of cell groups (values of cells_by) in different casesunder the same section. The section must have at least 2 cases, each case should have a single cells_by column.
prefix_each (flag) — Whether to prefix the each column name to thevalue as the case/section name.
section — The section to show in the report. This allows different cases to be put in the same section in report.Only works when each is not specified. The section is used to collect cases and put the results under the same directory and the same section in report. When each for a case is specified, the section will be ignored and case name will be used as section. The cases will be the expanded values in each column. When prefix_each is True, the column name specified by each will be prefixed to each value as directory name and expanded case name.
subset — An expression to subset the cells, will be passed to dplyr::filter() on metadata.This will be applied prior to each.

Requires

r-dplyr —
- check: {{proc.lang}} -e "library(dplyr)"
r-seurat —
- check: {{proc.lang}} -e "library(Seurat)"
r-tidyr —
- check: {{proc.lang}} -e "library(tidyr)"

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratMetadataMutater(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Mutate the metadata of the seurat object

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

metafile — Additional metadataA tab-delimited file with columns as meta columns and rows as cells.
srtobj — The seurat object loaded by SeuratPreparing

Output

outfile — The seurat object with the additional metadata

Envs

mutaters (type=json) —
The mutaters to mutate the metadata.The key-value pairs will be passed the dplyr::mutate() to mutate the metadata. There are also also 4 helper functions, expanded, collapsed, emerged and vanished, which can be used to identify the expanded/collpased/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use {"Patient1_Tumor_Collapsed_Clones": "expanded(., Source, 'Tumor', subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Tumor_Collapsed_Clones with the collapsed clones in the tumor sample (compared to the normal sample) of patient 1. The values in this columns for other clones will be NA. Those functions take following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * group.by: The column name in metadata to group the cells.
- * idents: The first group or both groups of cells to compare (value in group.by column). If only the first group is given, the rest of the cells (with non-NA in group.by column) will be used as the second group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * order: The expression passed to dplyr::arrange() to order intermediate dataframe and get the ids in order accordingly.
  The intermediate dataframe includes the following columns:
  * <id>: The ids of clones (i.e. CDR3.aa).
  * <each>: The values in each column.
  * ident_1: The size of clones in the first group.
  * ident_2: The size of clones in the second group.
  * .diff: The difference between the sizes of clones in the first and second groups.
  * .sum: The sum of the sizes of clones in the first and second groups.
  * .predicate: Showing whether the clone is expanded/collapsed/emerged/vanished.
- * include_emerged: Whether to include the emerged group for expanded (only works for expanded). Default is FALSE.
- * include_vanished: Whether to include the vanished group for collapsed (only works for collapsed). Default is FALSE.
You can also use top() to get the top clones (i.e. the clones with the largest size) in each group.For example, you can use {"Patient1_Top10_Clones": "top(subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Top10_Clones. The values in this columns for other clones will be NA. This function takes following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * n: The number of top clones to return. Default is 10.
  If n < 1, it will be treated as the percentage of the size of the group.
  Specify 0 to get all clones.
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * with_ties: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is FALSE.

Requires

r-dplyr —
- check: {{proc.lang}} <(echo "library(dplyr)")
r-seurat —
- check: {{proc.lang}} <(echo "library(Seurat)")
r-tibble —
- check: {{proc.lang}} <(echo "library(tibble)")

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.DimPlots(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Seurat - Dimensional reduction plots

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

configfile — A toml configuration file with "cases"If this is given, envs.cases will be overriden
name — The name of the job, used in report
srtobj — The seruat object in RDS format

Output

outdir — The output directory

Envs

cases — The cases for the dim plotsKeys are the names and values are the arguments to Seurat::Dimplots

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.MarkersFinder(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Find markers between different groups of cells

When only group-by is specified as "seurat_clusters" in envs.cases, the markers will be found for all the clusters.

You can also find the differentially expressed genes between any two groups of cells by setting group-by to a different column name in metadata. Follow envs.cases for more details.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object loaded by SeuratPreparingIf you have your Seurat object prepared by yourself, you can also use it here, but you should make sure that the object has been processed by PrepSCTFindMarkers if data is not normalized using SCTransform.

Output

outdir — The output directory for the markers and plots

Envs

allmarker_plots (type=json) — All marker plot cases.The keys are the names of the cases and the values are the dicts inherited from allmarker_plots_defaults.
allmarker_plots_defaults (ns) —
Default options for the plots for all markers when ident-1 is not specified.
- - plot_type: The type of the plot.
  See https://pwwang.github.io/scplotter/reference/FeatureStatPlot.html.
  Available types are violin, box, bar, ridge, dim, heatmap and dot.
- - more_formats (list): The extra formats to save the plot in.
- - save_code (flag): Whether to save the code to generate the plot.
- - devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- - order_by: an expression to order the markers, passed by dplyr::arrange().
- - genes: The number of top genes to show or an expression passed to dplyr::filter() to filter the genes.
- - : Other arguments passed to scplotter::FeatureStatPlot().
assay — The assay to use.
cache (type=auto) — Where to cache the results.If True, cache to outdir of the job. If False, don't cache. Otherwise, specify the directory to cache to.
cases (type=json) — If you have multiple cases for marker discovery, you can specify themhere. The keys are the names of the cases and the values are the above options. If some options are not specified, the default values specified above (under envs) will be used. If no cases are specified, the default case will be added with the default values under envs with the name Marker Discovery.
dbs (list) — The dbs to do enrichment analysis for significantmarkers See below for all libraries. https://maayanlab.cloud/Enrichr/#libraries
each — The column name in metadata to separate the cells into differentcases. When this is specified, the case will be expanded for each value of the column in metadata. For example, when you have envs.cases."Cluster Markers".each = "Sample", then the case will be expanded as envs.cases."Cluster Markers - Sample1", envs.cases."Cluster Markers - Sample2", etc. You can specify allmarker_plots and overlaps to plot the markers for all cases in the same plot and plot the overlaps of the markers between different cases by values in this column.
enrich_plots (type=json) — Cases of the plots to generate for the enrichment analysis.The keys are the names of the cases and the values are the dicts inherited from enrich_plots_defaults. The cases under envs.cases can inherit this options.
enrich_plots_defaults (ns) —
Default options for the plots to generate for the enrichment analysis.
- - plot_type: The type of the plot.
  See https://pwwang.github.io/scplotter/reference/EnrichmentPlot.html.
  Available types are bar, dot, lollipop, network, enrichmap and wordcloud.
- - more_formats (list): The extra formats to save the plot in.
- - save_code (flag): Whether to save the code to generate the plot.
- - devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- - : See https://pwwang.github.io/scplotter/reference/EnrichmentPlot.htmll.
enrich_style (choice) —
The style of the enrichment analysis.The enrichment analysis will be done by EnrichIt() from enrichit. Two styles are available:
- - enrichr: enrichr style enrichment analysis (fisher's exact test will be used).
- - clusterprofiler: clusterProfiler style enrichment analysis (hypergeometric test will be used).
- - clusterProfiler: alias for clusterprofiler
error (flag) — Error out if no/not enough markers are found or no pathways are enriched.If False, empty results will be returned.
group-by — The column name in metadata to group the cells.If only group-by is specified, and ident-1 and ident-2 are not specified, markers will be found for all groups in this column in the manner of "group vs rest" comparison. NA group will be ignored. If None, Seurat::Idents(srtobj) will be used, which is usually "seurat_clusters" after unsupervised clustering.
ident-1 — The first group of cells to compareWhen this is empty, the comparisons will be expanded to each group v.s. the rest of the cells in group-by.
ident-2 — The second group of cells to compareIf not provided, the rest of the cells are used for ident-2.
marker_plots (type=json) — Cases of the plots to generate for the markers.Plot cases. The keys are the names of the cases and the values are the dicts inherited from marker_plots_defaults. The cases under envs.cases can inherit this options.
marker_plots_defaults (ns) —
Default options for the plots to generate for the markers.
- - plot_type: The type of the plot.
  See https://pwwang.github.io/scplotter/reference/FeatureStatPlot.html.
  Available types are violin, box, bar, ridge, dim, heatmap and dot.
  There are two additional types available - volcano_pct and volcano_log2fc.
- - more_formats (list): The extra formats to save the plot in.
- - save_code (flag): Whether to save the code to generate the plot.
- - devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- - order_by: an expression to order the markers, passed by dplyr::arrange().
- - genes: The number of top genes to show or an expression passed to dplyr::filter() to filter the genes.
- - : Other arguments passed to scplotter::FeatureStatPlot().
  If plot_type is volcano_pct or volcano_log2fc, they will be passed to
  scplotter::VolcanoPlot().
mutaters (type=json) —
The mutaters to mutate the metadataThere are also also 4 helper functions, expanded, collapsed, emerged and vanished, which can be used to identify the expanded/collpased/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use {"Patient1_Tumor_Collapsed_Clones": "expanded(., Source, 'Tumor', subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Tumor_Collapsed_Clones with the collapsed clones in the tumor sample (compared to the normal sample) of patient 1. The values in this columns for other clones will be NA. Those functions take following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * group.by: The column name in metadata to group the cells.
- * idents: The first group or both groups of cells to compare (value in group.by column). If only the first group is given, the rest of the cells (with non-NA in group.by column) will be used as the second group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * order: The expression passed to dplyr::arrange() to order intermediate dataframe and get the ids in order accordingly.
  The intermediate dataframe includes the following columns:
  * <id>: The ids of clones (i.e. CDR3.aa).
  * <each>: The values in each column.
  * ident_1: The size of clones in the first group.
  * ident_2: The size of clones in the second group.
  * .diff: The difference between the sizes of clones in the first and second groups.
  * .sum: The sum of the sizes of clones in the first and second groups.
  * .predicate: Showing whether the clone is expanded/collapsed/emerged/vanished.
- * include_emerged: Whether to include the emerged group for expanded (only works for expanded). Default is FALSE.
- * include_vanished: Whether to include the vanished group for collapsed (only works for collapsed). Default is FALSE.
You can also use top() to get the top clones (i.e. the clones with the largest size) in each group.For example, you can use {"Patient1_Top10_Clones": "top(subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Top10_Clones. The values in this columns for other clones will be NA. This function takes following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * n: The number of top clones to return. Default is 10.
  If n < 1, it will be treated as the percentage of the size of the group.
  Specify 0 to get all clones.
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * with_ties: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is FALSE.
ncores (type=int) —
Number of cores to use for parallel computing for some Seurat procedures.
- * Used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures.
- * See also: https://satijalab.org/seurat/articles/future_vignette.html
overlaps (type=json) — Cases for investigating the overlapping of significant markers between different cases or comparisons.The keys are the names of the cases and the values are the dicts inherited from overlaps_defaults. There are two situations that we can perform overlaps: 1. If ident-1 is not specified, the overlaps can be performed between different comparisons. 2. If each is specified, the overlaps can be performed between different cases, where in each case, ident-1 must be specified.
overlaps_defaults (ns) —
Default options for investigating the overlapping of significant markers between different cases or comparisons.This means either ident-1 should be empty, so that they can be expanded to multiple comparisons.
- - sigmarkers: The expression to filter the significant markers for each case.
  If not provided, envs.sigmarkers will be used.
- - plot_type (choice): The type of the plot to generate for the overlaps.
  - venn: Use plotthis::VennDiagram().
  - upset: Use plotthis::UpsetPlot().
- - more_formats (list): The extra formats to save the plot in.
- - save_code (flag): Whether to save the code to generate the plot.
- - devpars (ns): The device parameters for the plots.
  - res (type=int): The resolution of the plots.
  - height (type=int): The height of the plots.
  - width (type=int): The width of the plots.
- - : More arguments pased to plotthis::VennDiagram()
  (https://pwwang.github.io/plotthis/reference/venndiagram1.html)
  or plotthis::UpsetPlot()
  (https://pwwang.github.io/plotthis/reference/upsetplot1.html)
rest (ns) —
Rest arguments for Seurat::FindMarkers().Use - to replace . in the argument name. For example, use min-pct instead of min.pct.
- - : See https://satijalab.org/seurat/reference/findmarkers
sigmarkers — An expression passed to dplyr::filter() to filter thesignificant markers for enrichment analysis. Available variables are p_val, avg_log2FC, pct.1, pct.2 and p_val_adj. For example, "p_val_adj < 0.05 & abs(avg_log2FC) > 1" to select markers with adjusted p-value < 0.05 and absolute log2 fold change > 1.
subset — An expression to subset the cells for each case.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.TopExpressingGenes(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Find the top expressing genes in each cluster

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object in RDS format

Output

outdir — The output directory for the tables and plots

Envs

cases (type=json) — If you have multiple cases, you can specify themhere. The keys are the names of the cases and the values are the above options except mutaters. If some options are not specified, the default values specified above will be used. If no cases are specified, the default case will be added with the default values under envs with the name DEFAULT.
dbs (list) — The dbs to do enrichment analysis for significantmarkers See below for all libraries. https://maayanlab.cloud/Enrichr/#libraries
each — The column name in metadata to separate the cells into differentcases. When specified, ident must be specified
group-by — The column name in metadata to group the cells.
ident — The group of cells to find the top expressing genes.The cells will be selected by the group-by column with this ident value in metadata. If not provided, the top expressing genes will be found for all groups of cells in the group-by column.
mutaters (type=json) — The mutaters to mutate the metadata
n (type=int) — The number of top expressing genes to find.
prefix_each (flag) — Whether to prefix the each column name to thevalue as the case/section name.
section — The section name for the report.Worked only when each is not specified and ident is specified. Otherwise, the section name will be constructed from each and group-by. If DEFAULT, and it's the only section, it not included in the case/section names.
subset — An expression to subset the cells for each case.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.ExprImputation(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

This process imputes the dropout values in scRNA-seq data.

It takes the Seurat object as input and outputs the Seurat object with imputed expression data.

Reference:

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

infile — The input file in RDS/qs format of Seurat object

Output

outfile — The output file in RDS format of Seurat objectNote that with rmagic and alra, the original default assay will be renamed to RAW and the imputed RNA assay will be renamed to RNA and set as default assay.

Envs

alra_args (type=json) — The arguments for RunALRA()
rmagic_args (ns) —
The arguments for rmagic
- - python: The python path where magic-impute is installed.
- - threshold (type=float): The threshold for magic imputation.
  Only the genes with dropout rates greater than this threshold (No. of
  cells with non-zero expression / total number of cells) will be imputed.
scimpute_args (ns) —
The arguments for scimpute
- - drop_thre (type=float): The dropout threshold
- - kcluster (type=int): Number of clusters to use
- - ncores (type=int): Number of cores to use
- - refgene: The reference gene file
tool (choice) —
Either alra, scimpute or rmagic
- - alra: Use RunALRA() from Seurat
- - scimpute: Use scImpute() from scimpute
- - rmagic: Use magic() from Rmagic

Requires

magic-impute —
- if: {{proc.envs.tool == "rmagic"}}
- check: {{proc.envs.rmagic_args.python}} -c "import magic")
r-dplyr —
- if: {{proc.envs.tool == "scimpute"}}
- check: {{proc.lang}} <(echo "library(dplyr)")
r-rmagic —
- if: {{proc.envs.tool == "rmagic"}}
- check: | {{proc.lang}} <( echo " tryCatch( { setwd(dirname(Sys.getenv('CONDA_PREFIX'))) }, error = function(e) NULL ); library(Rmagic) " )
r-scimpute —
- if: {{proc.envs.tool == "scimpute"}}
- check: {{proc.lang}} <(echo "library(scImpute)")
r-seurat —
- check: {{proc.lang}} <(echo "library(Seurat)")
r-seuratwrappers —
- if: {{proc.envs.tool == "alra"}}
- check: {{proc.lang}} <(echo "library(SeuratWrappers)")

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SCImpute(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Impute the dropout values in scRNA-seq data.

Deprecated. Use ExprImputation instead.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

groupfile — The file to subset the matrix or label the cellsCould be an output from ImmunarchFilter
infile — The input file for imputationEither a SeuratObject or a matrix of count/TPM

Output

outfile — The output matrix

Envs

infmt — The input format.Either seurat or matrix

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratFilter(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Filtering cells from a seurat object

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

filters —
The filters to apply. Could be a file or string in TOML, ora python dictionary, with following keys:
- - mutaters: Create new columns in the metadata
- - filter: A R expression that will pass to
  subset(sobj, subset = ...) to filter the cells
srtobj — The seurat object in RDS

Output

outfile — The filtered seurat object in RDS

Envs

invert — Invert the selection?

Requires

r-dplyr —
- check: {{proc.lang}} <(echo "library('dplyr')")
r-seurat —
- check: {{proc.lang}} <(echo "library('Seurat')")

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratSubset(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Subset a seurat object into multiple seruat objects

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object in RDS
subsets —
The subsettings to apply. Could be a file or string in TOML, ora python dictionary, with following keys:
- - : Name of the case
  mutaters: Create new columns in the metadata
  subset: A R expression that will pass to
  subset(sobj, subset = ...)
  groupby: The column to group by, each value will be a case
  If groupby is given, subset will be ignored, each value
  of the groupby column will be a case

Output

outdir — The output directory with the subset seurat objects

Envs

ignore_nas — Ignore NA values?

Requires

r-dplyr —
- check: {{proc.lang}} <(echo "library('dplyr')")
r-seurat —
- check: {{proc.lang}} <(echo "library('Seurat')")

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratSplit(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Split a seurat object into multiple seruat objects

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

by — The metadata column to split by
srtobj — The seurat object in RDS

Output

outdir — The output directory with the subset seurat objects

Envs

by — The metadata column to split byIgnored if by is given in the input
recell — Rename the cell ids using the by columnA string of R function taking the original cell ids and by

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.Subset10X(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Subset 10X data, mostly used for testing

Requires r-matrix to load matrix.mtx.gz

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

indir — The input directory

Output

outdir — The output directory

Envs

feats_to_keep — The features/genes to keep.The final features list will be feats_to_keep + nfeats
ncells — The number of cells to keep.If <=1 then it will be the percentage of cells to keep
nfeats — The number of features to keep.If <=1 then it will be the percentage of features to keep
seed — The seed for random number generator

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratTo10X(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Write a Seurat object to 10X format

using write10xCounts from DropletUtils

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object in RDS

Output

outdir — The output directory.When envs.split_by is specified, the subdirectories will be created for each distinct value of the column. Otherwise, the matrices will be written to the output directory.

Envs

version — The version of 10X format

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.ScFGSEA(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Gene set enrichment analysis for cells in different groups using fgsea

This process allows us to do Gene Set Enrichment Analysis (GSEA) on the expression data, but based on variaties of grouping, including the from the meta data and the scTCR-seq data as well.

The GSEA is done using the fgsea package, which allows to quickly and accurately calculate arbitrarily low GSEA P-values for a collection of gene sets. The fgsea package is based on the fast algorithm for preranked GSEA described in Subramanian et al. 2005.

For each case, the process will generate a table with the enrichment scores for each gene set, and GSEA plots for the top gene sets.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object in RDS format

Output

outdir — The output directory for the results and plots

Envs

cases (type=json;order=99) — If you have multiple cases, you can specify them here.The keys are the names of the cases and the values are the above options except mutaters. If some options are not specified, the default values specified above will be used. If no cases are specified, the default case will be added with the name DEFAULT.
each — The column name in metadata to separate the cells into different subsets to do the analysis.
eps (type=float) — This parameter sets the boundary for calculating the p value.See https://rdrr.io/bioc/fgsea/man/fgseaMultilevel.html
gmtfile — The pathways in GMT format, with the gene names/ids in the same format as the seurat object.One could also use a URL to a GMT file. For example, from https://download.baderlab.org/EM_Genesets/current_release/Human/symbol/Pathways/.
group-by — The column name in metadata to group the cells.
ident-1 — The first group of cells to compare
ident-2 — The second group of cells to compare, if not provided, the rest of the cells that are not NAs in group-by column are used for ident-2.
maxsize (type=int) — Maximal size of a gene set to test. All pathways above the threshold are excluded.
method (choice) —
The method to do the preranking.
- - signal_to_noise: Signal to noise.
  The larger the differences of the means (scaled by the standard deviations);
  that is, the more distinct the gene expression is in each phenotype and the more the gene
  acts as a "class marker".
- - s2n: Alias of signal_to_noise.
- - abs_signal_to_noise: The absolute value of signal_to_noise.
- - abs_s2n: Alias of abs_signal_to_noise.
- - t_test: T test.
  Uses the difference of means scaled by the standard deviation and number of samples.
- - ratio_of_classes: Also referred to as fold change.
  Uses the ratio of class means to calculate fold change for natural scale data.
- - diff_of_classes: Difference of class means.
  Uses the difference of class means to calculate fold change for nature scale data
- - log2_ratio_of_classes: Log2 ratio of class means.
  Uses the log2 ratio of class means to calculate fold change for natural scale data.
  This is the recommended statistic for calculating fold change for log scale data.
minsize (type=int) — Minimal size of a gene set to test. All pathways below the threshold are excluded.
mutaters (type=json) —
The mutaters to mutate the metadata.The key-value pairs will be passed the dplyr::mutate() to mutate the metadata. There are also also 4 helper functions, expanded, collapsed, emerged and vanished, which can be used to identify the expanded/collpased/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use {"Patient1_Tumor_Collapsed_Clones": "expanded(., Source, 'Tumor', subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Tumor_Collapsed_Clones with the collapsed clones in the tumor sample (compared to the normal sample) of patient 1. The values in this columns for other clones will be NA. Those functions take following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * group.by: The column name in metadata to group the cells.
- * idents: The first group or both groups of cells to compare (value in group.by column). If only the first group is given, the rest of the cells (with non-NA in group.by column) will be used as the second group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * order: The expression passed to dplyr::arrange() to order intermediate dataframe and get the ids in order accordingly.
  The intermediate dataframe includes the following columns:
  * <id>: The ids of clones (i.e. CDR3.aa).
  * <each>: The values in each column.
  * ident_1: The size of clones in the first group.
  * ident_2: The size of clones in the second group.
  * .diff: The difference between the sizes of clones in the first and second groups.
  * .sum: The sum of the sizes of clones in the first and second groups.
  * .predicate: Showing whether the clone is expanded/collapsed/emerged/vanished.
- * include_emerged: Whether to include the emerged group for expanded (only works for expanded). Default is FALSE.
- * include_vanished: Whether to include the vanished group for collapsed (only works for collapsed). Default is FALSE.
You can also use top() to get the top clones (i.e. the clones with the largest size) in each group.For example, you can use {"Patient1_Top10_Clones": "top(subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Top10_Clones. The values in this columns for other clones will be NA. This function takes following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * n: The number of top clones to return. Default is 10.
  If n < 1, it will be treated as the percentage of the size of the group.
  Specify 0 to get all clones.
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * with_ties: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is FALSE.
ncores (type=int) — Number of cores for parallelizationPassed to nproc of fgseaMultilevel().
prefix_each (flag) — Whether to prefix the each column name to the values as the case/section name.
rest (type=json;order=98) — Rest arguments for fgsea()See also https://rdrr.io/bioc/fgsea/man/fgseaMultilevel.html
section — The section name for the report. Worked only when each is not specified. Otherwise, the section name will be constructed from each and its value.This allows different cases to be put into the same section in the report. The section is used to collect cases and put the results under the same directory and the same section in report. When each for a case is specified, the section will be ignored and case name will be used as section. The cases will be the expanded values in each column. When prefix_each is True, the column name specified by each will be prefixed to each value as directory name and expanded case name.
subset — An expression to subset the cells.
top (type=auto) — Do gsea table and enrich plot for top N pathways.If it is < 1, will apply it to padj, selecting pathways with padj < top.

Requires

bioconductor-fgsea —
- check: {{proc.lang}} -e "library(fgsea)"
r-seurat —
- check: {{proc.lang}} -e "library(seurat)"

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.CellTypeAnnotation(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Annotate the cell clusters. Currently, four ways are supported:

Pass the cell type annotation directly
Use ScType
Use scCATCH
Use hitype

The annotated cell types will replace the original seurat_clusters column in the metadata, so that the downstream processes will use the annotated cell types.

The old seurat_clusters column will be renamed to seurat_clusters_id.

If you are using ScType, scCATCH, or hitype, a text file containing the mapping from the old seurat_clusters to the new cell types will be generated and saved to cluster2celltype.tsv under <workdir>/<pipline_name>/CellTypeAnnotation/0/output/.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Examples

[CellTypeAnnotation.envs]tool = "direct"
cell_types = ["CellType1", "CellType2", "-", "CellType4"]

The cell types will be assigned as:

0 -> CellType1
1 -> CellType2
2 -> 2
3 -> CellType4

Input

sobjfile — The seurat object

Output

outfile — The rds file of seurat object with cell type annotated.A text file containing the mapping from the old seurat_clusters to the new cell types will be generated and saved to cluster2celltype.tsv under the job output directory.

Envs

cell_types (list) — The cell types to use for direct annotation.You can use "-" or "" as the placeholder for the clusters that you want to keep the original cell types (seurat_clusters). If the length of cell_types is shorter than the number of clusters, the remaining clusters will be kept as the original cell types. You can also use NA to remove the clusters from downstream analysis. This only works when envs.newcol is not specified.
/// Note If tool is direct and cell_types is not specified or an empty list, the original cell types will be kept and nothing will be changed. ///
celltypist_args (ns) —
The arguments for celltypist::celltypist() if tool is celltypist.
- - model: The path to model file.
- - python: The python path where celltypist is installed.
- - majority_voting: When true, it refines cell identities within local subclusters after an over-clustering approach
  at the cost of increased runtime.
- - over_clustering (type=auto): The column name in metadata to use as clusters for majority voting.
  Set to False to disable over-clustering.
- - assay: When converting a Seurat object to AnnData, the assay to use.
  If input is h5seurat, this defaults to RNA.
  If input is Seurat object in RDS, this defaults to the default assay.
hitype_db — The database to use for hitype.Compatible with sctype_db. See also https://pwwang.github.io/hitype/articles/prepare-gene-sets.html You can also use built-in databases, including hitypedb_short, hitypedb_full, and hitypedb_pbmc3k.
hitype_tissue — The tissue to use for hitype.Avaiable tissues should be the first column (tissueType) of hitype_db. If not specified, all rows in hitype_db will be used.
merge (flag) — Whether to merge the clusters with the same cell types.Otherwise, a suffix will be added to the cell types (ie. .1, .2, etc).
newcol — The new column name to store the cell types.If not specified, the seurat_clusters column will be overwritten. If specified, the original seurat_clusters column will be kept and Idents will be kept as the original seurat_clusters.
outtype (choice) —
The output file type. Currently only works for celltypist.An RDS file will be generated for other tools.
- - input: Use the same file type as the input.
- - rds: Use RDS file.
- - h5seurat: Use h5seurat file.
- - h5ad: Use AnnData file.
sccatch_args (ns) —
The arguments for scCATCH::findmarkergene() if tool is sccatch.
- - species: The specie of cells.
- - cancer: If the sample is from cancer tissue, then the cancer type may be defined.
- - tissue: Tissue origin of cells must be defined.
- - marker: The marker genes for cell type identification.
- - if_use_custom_marker (flag): Whether to use custom marker genes. If True, no species, cancer, and tissue are needed.
- - : Other arguments for scCATCH::findmarkergene().
  You can pass an RDS file to sccatch_args.marker to work as custom marker. If so,
  if_use_custom_marker will be set to TRUE automatically.
sctype_db — The database to use for sctype.Check examples at https://github.com/IanevskiAleksandr/sc-type/blob/master/ScTypeDB_full.xlsx
sctype_tissue — The tissue to use for sctype.Avaiable tissues should be the first column (tissueType) of sctype_db. If not specified, all rows in sctype_db will be used.
tool (choice) —
The tool to use for cell type annotation.
- - sctype: Use scType to annotate cell types.
  See https://github.com/IanevskiAleksandr/sc-type
- - hitype: Use hitype to annotate cell types.
  See https://github.com/pwwang/hitype
- - sccatch: Use scCATCH to annotate cell types.
  See https://github.com/ZJUFanLab/scCATCH
- - celltypist: Use celltypist to annotate cell types.
  See https://github.com/Teichlab/celltypist
- - direct: Directly assign cell types

Requires

r-HGNChelper —
- if: {{proc.envs.tool == 'sctype'}}
- check: {{proc.lang}} -e "library(HGNChelper)"
r-dplyr —
- if: {{proc.envs.tool == 'sctype'}}
- check: {{proc.lang}} -e "library(dplyr)"
r-openxlsx —
- if: {{proc.envs.tool == 'sctype'}}
- check: {{proc.lang}} -e "library(openxlsx)"
r-seurat —
- if: {{proc.envs.tool == 'sctype'}}
- check: {{proc.lang}} -e "library(Seurat)"

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.SeuratMap2Ref(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Map the seurat object to reference

See: https://satijalab.org/seurat/articles/integration_mapping.html and https://satijalab.org/seurat/articles/multimodal_reference_mapping.html

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

sobjfile — The seurat object

Output

outfile — The rds file of seurat object with cell type annotated.Note that the reduction name will be ref.umap for the mapping. To visualize the mapping, you should use ref.umap as the reduction name.

Envs

FindTransferAnchors (ns) —
Arguments for FindTransferAnchors()
- - normalization-method (choice): Name of normalization method used.
  - LogNormalize: Log-normalize the data matrix
  - SCT: Scale data using the SCTransform method
  - auto: Automatically detect the normalization method.
  See envs.refnorm.
- - reference-reduction: Name of dimensional reduction to use from the reference if running the pcaproject workflow.
  Optionally enables reuse of precomputed reference dimensional reduction.
- - : See https://satijalab.org/seurat/reference/findtransferanchors.
  Note that the hyphen (-) will be transformed into . for the keys.
MapQuery (ns) —
Arguments for MapQuery()
- - reference-reduction: Name of reduction to use from the reference for neighbor finding
- - reduction-model: DimReduc object that contains the umap model.
- - refdata (type=json): Extra data to transfer from the reference to the query.
- - : See https://satijalab.org/seurat/reference/mapquery.
  Note that the hyphen (-) will be transformed into . for the keys.
NormalizeData (ns) —
Arguments for NormalizeData()
- - normalization-method: Normalization method.
- - : See https://satijalab.org/seurat/reference/normalizedata.
  Note that the hyphen (-) will be transformed into . for the keys.
SCTransform (ns) —
Arguments for SCTransform()
- - do-correct-umi (flag): Place corrected UMI matrix in assay counts layer?
- - do-scale (flag): Whether to scale residuals to have unit variance?
- - do-center (flag): Whether to center residuals to have mean zero?
- - : See https://satijalab.org/seurat/reference/sctransform.
  Note that the hyphen (-) will be transformed into . for the keys.
cache (type=auto) — Whether to cache the information at different steps.If True, the seurat object will be cached in the job output directory, which will be not cleaned up when job is rerunning. The cached seurat object will be saved as <signature>.<kind>.RDS file, where <signature> is the signature determined by the input and envs of the process. See https://github.com/satijalab/seurat/issues/7849, https://github.com/satijalab/seurat/issues/5358 and https://github.com/satijalab/seurat/issues/6748 for more details also about reproducibility issues. To not use the cached seurat object, you can either set cache to False or delete the cached file at <signature>.RDS in the cache directory.
ident — The name of the ident for query transferred from envs.use of the reference.
mutaters (type=json) — The mutaters to mutate the metadata.This is helpful when we want to create new columns for split_by.
ncores (type=int;order=-100) — Number of cores to use.When split_by is used, this will be the number of cores for each object to map to the reference. When split_by is not used, this is used in future::plan(strategy = "multicore", workers = <ncores>) to parallelize some Seurat procedures. See also: https://satijalab.org/seurat/archive/v3.0/future_vignette.html
plots (type=json) — The plots to generate.The keys are the names of the plots and the values are the arguments for the plot. The arguments will be passed to biopipen.utils::VizSeuratMap2Ref() to generate the plots. The plots will be saved to the output directory. See https://pwwang.github.io/biopipen.utils.R/reference/VizSeuratMap2Ref.html.
ref — The reference seurat object file.Either an RDS file or a h5seurat file that can be loaded by Seurat::LoadH5Seurat(). The file type is determined by the extension. .rds or .RDS for RDS file, .h5seurat or .h5 for h5seurat file.
refnorm (choice) —
Normalization method the reference used. The same method will be used for the query.
- - LogNormalize: Using NormalizeData.
- - SCTransform: Using SCTransform.
- - SCT: Alias of SCTransform.
- - auto: Automatically detect the normalization method.
  If the default assay of reference is SCT, then SCTransform will be used.
skip_if_normalized — Skip normalization if the query is already normalized.Since the object is supposed to be generated by SeuratPreparing, it is already normalized. However, a different normalization method may be used. If the reference is normalized by the same method as the query, the normalization can be skipped. Otherwise, the normalization cannot be skipped. The normalization method used for the query set is determined by the default assay. If SCT, then SCTransform is used; otherwise, NormalizeData is used. You can set this to False to force re-normalization (with or without the arguments previously used).
split_by — The column name in metadata to split the query into multiple objects.This helps when the original query is too large to process.
use — A column name of metadata from the reference(e.g. celltype.l1, celltype.l2) to transfer to the query as the cell types (ident) for downstream analysis. This field is required. If you want to transfer multiple columns, you can use envs.MapQuery.refdata.

Requires

r-seurat —
- check: {{proc.lang}} -e "library(Seurat)"

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.RadarPlots(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Radar plots for cell proportion in different clusters.

This process generates the radar plots for the clusters of T cells. It explores the proportion of cells in different groups (e.g. Tumor vs Blood) in different T-cell clusters.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Examples

Let's say we have a metadata like this:

Cell	Source	Timepoint	seurat_clusters
A	Blood	Pre	0
B	Blood	Pre	0
C	Blood	Post	1
D	Blood	Post	1
E	Tumor	Pre	2
F	Tumor	Pre	2
G	Tumor	Post	3
H	Tumor	Post	3

With configurations:

[RadarPlots.envs]
by = "Source"

Then we will have a radar plots like this:

Radar plots

We can use each to separate the cells into different cases:

[RadarPlots.envs]
by = "Source"
each = "Timepoint"

Then we will have two radar plots, one for Pre and one for Post:

Radar plots

Using cluster_order to change the order of the clusters and show only the first 3 clusters:

[RadarPlots.envs]
by = "Source"
cluster_order = ["2", "0", "1"]
breaks = [0, 50, 100]  # also change the breaks

Radar plots cluster_order

n. /

Input

srtobj — The seurat object in RDS or qs/qs2 format

Output

outdir — The output directory for the plots

Envs

bar_devpars (ns) —
The parameters for png() for the barplot
- - res (type=int): The resolution of the plot
- - height (type=int): The height of the plot
- - width (type=int): The width of the plot
breakdown — An additional column with groups to break down the cellsdistribution in each cluster. For example, if you want to see the distribution of the cells in each cluster in different samples. In this case, you should have multiple values in each by. These values won't be plotted in the radar plot, but a barplot will be generated with the mean value of each group and the error bar.
breaks (list;itype=int) — breaks of the radar plots, from 0 to 100.If not given, the breaks will be calculated automatically.
by — Which column to use to separate the cells in different groups.NAs will be ignored. For example, If you have a column named Source that marks the source of the cells, and you want to separate the cells into Tumor and Blood groups, you can set by to Source. The there will be two curves in the radar plot, one for Tumor and one for Blood.
cases (type=json) — The cases for the multiple radar plots.Keys are the names of the cases and values are the arguments for the plots (each, by, order, breaks, direction, ident, cluster_order and devpars). If not cases are given, a default case will be used, with the key DEFAULT. The keys must be valid string as part of the file name.
cluster_order (list) — The order of the clusters.You may also use it to filter the clusters. If not given, all clusters will be used. If the cluster names are integers, use them directly for the order, even though a prefix Cluster is added on the plot.
colors — The colors for the groups in by. If not specified,the default colors will be used. Multiple colors can be separated by comma (,). You can specify biopipen to use the biopipen palette.
devpars (ns) —
The parameters for png()
- - res (type=int): The resolution of the plot
- - height (type=int): The height of the plot
- - width (type=int): The width of the plot
direction (choice) —
Direction to calculate the percentages.
- - inter-cluster: the percentage of the cells in all groups
  in each cluster (percentage adds up to 1 for each cluster).
- - intra-cluster: the percentage of the cells in all clusters.
  (percentage adds up to 1 for each group).
each — A column with values to separate all cells in different casesWhen specified, the case will be expanded to multiple cases for each value in the column. If specified, section will be ignored, and the case name will be used as the section name.
ident — The column name of the cluster information.
mutaters (type=json) — Mutaters to mutate the metadata of theseurat object. Keys are the column names and values are the expressions to mutate the columns. These new columns will be used to define your cases.
order (list) — The order of the values in by. You can also limit(filter) the values we have in by. For example, if column Source has values Tumor, Blood, Spleen, and you only want to plot Tumor and Blood, you can set order to ["Tumor", "Blood"]. This will also have Tumor as the first item in the legend and Blood as the second item.
prefix_each (flag) — Whether to prefix the each column name to the values as thecase/section name.
section — If you want to put multiple cases into a same sectionin the report, you can set this option to the name of the section. Only used in the report.
subset — The subset of the cells to do the analysis.
test (choice) —
The test to use to calculate the p values.If there are more than 2 groups in by, the p values will be calculated pairwise group by group. Only works when breakdown is specified and by has 2 groups or more.
- - wilcox: Wilcoxon rank sum test
- - t: T test
- - none: No test will be performed

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.MetaMarkers(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Find markers between three or more groups of cells, using one-way ANOVAor Kruskal-Wallis test.

Sometimes, you may want to find the markers for cells from more than 2 groups. In this case, you can use this process to find the markers for the groups and do enrichment analysis for the markers. Each marker is examined using either one-way ANOVA or Kruskal-Wallis test. The p values are adjusted using the specified method. The significant markers are then used for enrichment analysis using enrichr api.

Other than the markers and the enrichment analysis as outputs, this process also generates violin plots for the top 10 markers.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

srtobj — The seurat object loaded by SeuratPreparing

Output

outdir — The output directory for the markers

Envs

cases (type=json) — If you have multiple cases, you can specify themhere. The keys are the names of the cases and the values are the above options except ncores and mutaters. If some options are not specified, the default values specified above will be used. If no cases are specified, the default case will be added with the default values under envs with the name DEFAULT.
dbs (list) — The dbs to do enrichment analysis for significantmarkers See below for all libraries. https://maayanlab.cloud/Enrichr/#libraries
each — The column name in metadata to separate the cells into different cases.
group-by — The column name in metadata to group the cells.If only group-by is specified, and idents are not specified, markers will be found for all groups in this column. NA group will be ignored.
idents — The groups of cells to compare, values should be in the group-by column.
method (choice) —
The method for the test.
- - anova: One-way ANOVA
- - kruskal: Kruskal-Wallis test
mutaters (type=json) —
The mutaters to mutate the metadataThe key-value pairs will be passed the dplyr::mutate() to mutate the metadata. There are also also 4 helper functions, expanded, collapsed, emerged and vanished, which can be used to identify the expanded/collpased/emerged/vanished groups (i.e. TCR clones). See also https://pwwang.github.io/immunopipe/configurations/#mutater-helpers. For example, you can use {"Patient1_Tumor_Collapsed_Clones": "expanded(., Source, 'Tumor', subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Tumor_Collapsed_Clones with the collapsed clones in the tumor sample (compared to the normal sample) of patient 1. The values in this columns for other clones will be NA. Those functions take following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * group.by: The column name in metadata to group the cells.
- * idents: The first group or both groups of cells to compare (value in group.by column). If only the first group is given, the rest of the cells (with non-NA in group.by column) will be used as the second group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * order: The expression passed to dplyr::arrange() to order intermediate dataframe and get the ids in order accordingly.
  The intermediate dataframe includes the following columns:
  * <id>: The ids of clones (i.e. CDR3.aa).
  * <each>: The values in each column.
  * ident_1: The size of clones in the first group.
  * ident_2: The size of clones in the second group.
  * .diff: The difference between the sizes of clones in the first and second groups.
  * .sum: The sum of the sizes of clones in the first and second groups.
  * .predicate: Showing whether the clone is expanded/collapsed/emerged/vanished.
- * include_emerged: Whether to include the emerged group for expanded (only works for expanded). Default is FALSE.
- * include_vanished: Whether to include the vanished group for collapsed (only works for collapsed). Default is FALSE.
You can also use top() to get the top clones (i.e. the clones with the largest size) in each group.For example, you can use {"Patient1_Top10_Clones": "top(subset = Patent == 'Patient1', uniq = FALSE)"} to create a new column in metadata named Patient1_Top10_Clones. The values in this columns for other clones will be NA. This function takes following arguments:
- * df: The metadata data frame. You can use the . to refer to it.
- * id: The column name in metadata for the group ids (i.e. CDR3.aa).
- * n: The number of top clones to return. Default is 10.
  If n < 1, it will be treated as the percentage of the size of the group.
  Specify 0 to get all clones.
- * compare: Either a (numeric) column name (i.e. Clones) in metadata to compare between groups, or .n to compare the number of cells in each group.
  If numeric column is given, the values should be the same for all cells in the same group.
  This will not be checked (only the first value is used).
  It is helpful to use Clones to use the raw clone size from TCR data, in case the cells are not completely mapped to RNA data.
  Also if you have subset set or NAs in group.by column, you should use .n to compare the number of cells in each group.
- * subset: An expression to subset the cells, will be passed to dplyr::filter(). Default is TRUE (no filtering).
- * each: A column name (without quotes) in metadata to split the cells.
  Each comparison will be done for each value in this column (typically each patient or subject).
- * uniq: Whether to return unique ids or not. Default is TRUE. If FALSE, you can mutate the meta data frame with the returned ids. For example, df |> mutate(expanded = expanded(...)).
- * debug: Return the data frame with intermediate columns instead of the ids. Default is FALSE.
- * with_ties: Whether to include ties (i.e. clones with the same size as the last clone) or not. Default is FALSE.
ncores (type=int) — Number of cores to use to parallelize for genes
p_adjust (choice) —
The method to adjust the p values, which can be used to filter the significant markers.See also https://rdrr.io/r/stats/p.adjust.html
- - holm: Holm-Bonferroni method
- - hochberg: Hochberg method
- - hommel: Hommel method
- - bonferroni: Bonferroni method
- - BH: Benjamini-Hochberg method
- - BY: Benjamini-Yekutieli method
- - fdr: FDR method of Benjamini-Hochberg
- - none: No adjustment
prefix_each (flag) — Whether to add the each value as prefix to the case name.
section — The section name for the report.Worked only when each is not specified. Otherwise, the section name will be constructed from each and group-by. If DEFAULT, and it's the only section, it not included in the case/section names. The section is used to collect cases and put the results under the same directory and the same section in report. When each for a case is specified, the section will be ignored and case name will be used as section. The cases will be the expanded values in each column. When prefix_each is True, the column name specified by each will be prefixed to each value as directory name and expanded case name.
sigmarkers — An expression passed to dplyr::filter() to filter thesignificant markers for enrichment analysis. The default is p.value < 0.05. If method = 'anova', the variables that can be used for filtering are: sumsq, meansq, statistic, p.value and p_adjust. If method = 'kruskal', the variables that can be used for filtering are: statistic, p.value and p_adjust.
subset — The subset of the cells to do the analysis.An expression passed to dplyr::filter().

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.Seurat2AnnData(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Convert seurat object to AnnData

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

sobjfile — The seurat object file, in RDS or h5seurat format

Output

outfile — The AnnData file

Envs

assay — The assay to use for AnnData.If not specified, the default assay will be used.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.AnnData2Seurat(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Convert AnnData to seurat object

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

adfile — The AnnData file

Output

outfile — The seurat object file in RDS format

Envs

assay — The assay to use to convert to seurat object.
dotplot_check (type=auto) — Whether to do a check with Seurat::DotPlotto see if the conversion is successful. Set to False to disable the check. If True, top 10 variable genes will be used for the check. You can give a list of genes or a string of genes with comma (,) separated to use for the check. Only works for outtype = 'rds'.
outtype (choice) —
The output file type.
- - rds: RDS file
- - h5seurat: h5seurat file

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.ScSimulation(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Simulate single-cell data using splatter.

See https://www.bioconductor.org/packages/devel/bioc/vignettes/splatter/inst/doc/splatter.html#2_Quickstart

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

seed — The seed for the simulationYou could also use string as the seed, and the seed will be generated by digest::digest2int(). So this could also work as a unique identifier for the simulation (ie. Sample ID).

Output

outfile — The output Seurat object/SingleCellExperiment in RDS format

Envs

method (choice) —
which simulation method to use. Options are:
- - single: produces a single population
- - groups: produces distinct groups (eg. cell types), or
- - paths: selects cells from continuous trajectories (eg. differentiation processes)
ncells (type=int) — The number of cells to simulate
ngenes (type=int) — The number of genes to simulate
nspikes (type=int) — The number of spike-ins to simulateWhen ngenes, ncells, and nspikes are not specified, the default params from mockSCE() will be used. By default, ngenes = 2000, ncells = 200, and nspikes = 100.
outtype (choice) —
The output file type.
- - seurat: Seurat object
- - singlecellexperiment: SingleCellExperiment object
- - sce: alias for singlecellexperiment
params (ns) — Other parameters for simulation.The parameters are initialized splitEstimate(mockSCE()) and then updated with the given parameters. See https://rdrr.io/bioc/splatter/man/SplatParams.html. Hyphens (-) will be transformed into dots (.) for the keys.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.CellCellCommunication(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Cell-cell communication inference

This is implemented based on LIANA, which is a Python package for cell-cell communication inference and provides a list of existing methods including CellPhoneDB, Connectome, log2FC, NATMI, SingleCellSignalR, Rank_Aggregate, Geometric Mean, scSeqComm, and CellChat.

You can also try python -c 'import liana; liana.mt.show_methods()' to see the methods available.

Note that this process does not do any visualization. You can use CellCellCommunicationPlots to visualize the results.

Reference:

- Review.
- LIANA.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

sobjfile — The seurat object file in RDS or h5seurat format or AnnData file.

Output

outfile —
The output file with the 'liana_res' data frame.Stats are provided for both ligand and receptor entities, more specifically: ligand and receptor are the two entities that potentially interact. As a reminder, CCC events are not limited to secreted signalling, but we refer to them as ligand and receptor for simplicity. Also, in the case of heteromeric complexes, the ligand and receptor columns represent the subunit with minimum expression, while *complex corresponds to the actual complex, with subunits being separated by . source and target columns represent the source/sender and target/receiver cell identity for each interaction, respectively
- * *_props: represents the proportion of cells that express the entity.
  By default, any interactions in which either entity is not expressed in above 10%% of cells per cell type
  is considered as a false positive, under the assumption that since CCC occurs between cell types, a sufficient
  proportion of cells within should express the genes.
- * *_means: entity expression mean per cell type.
- * lr_means: mean ligand-receptor expression, as a measure of ligand-receptor interaction magnitude.
- * cellphone_pvals: permutation-based p-values, as a measure of interaction specificity.

Envs

— Other arguments for the method.The arguments are passed to the method directly. See the method documentation for more details and also help(liana.mt.<method>.__call__) in Python.
assay — The assay to use for the analysis.Only works for Seurat object.
expr_prop (type=float) — Minimum expression proportion for the ligands andreceptors (+ their subunits) in the corresponding cell identities. Set to 0 to return unfiltered results.
groupby — The column name in metadata to group the cells.Typically, this column should be the cluster id.
method (choice) —
The method to use for cell-cell communication inference.
- - CellPhoneDB: Use CellPhoneDB method.
  Magnitude Score: lr_means; Specificity Score: cellphone_pvals.
- - Connectome: Use Connectome method.
- - log2FC: Use log2FC method.
- - NATMI: Use NATMI method.
- - SingleCellSignalR: Use SingleCellSignalR method.
- - Rank_Aggregate: Use Rank_Aggregate method.
- - Geometric_Mean: Use Geometric Mean method.
- - scSeqComm: Use scSeqComm method.
- - CellChat: Use CellChat method.
- - cellphonedb: alias for CellPhoneDB
- - connectome: alias for Connectome
- - log2fc: alias for log2FC
- - natmi: alias for NATMI
- - singlesignaler: alias for SingleCellSignalR
- - rank_aggregate: alias for Rank_Aggregate
- - geometric_mean: alias for Geometric_Mean
- - scseqcomm: alias for scSeqComm
- - cellchat: alias for CellChat
min_cells (type=int) — Minimum cells (per cell identity if grouped by groupby)to be considered for downstream analysis.
n_perms (type=int) — Number of permutations for the permutation test.Relevant only for permutation-based methods (e.g., CellPhoneDB). If 0 is passed, no permutation testing is performed.
ncores (type=int) — The number of cores to use.
rscript — The path to the Rscript executable used to convert RDS file to AnnData.if in.sobjfile is an RDS file, it will be converted to AnnData file (h5ad). You need Seurat, SeuratDisk and digest installed.
seed (type=int) — The seed for the random number generator.
species (choice) —
The species of the cells.
- - human: Human cells, the 'consensus' resource will be used.
- - mouse: Mouse cells, the 'mouseconsensus' resource will be used.
split_by — The column name in metadata to split the cells to run the method separately.The results will be combined together with this column in the final output.
subset — An expression in string to subset the cells.When a .rds or .h5seurat file is provided for in.sobjfile, you can provide an expression in R, which will be passed to base::subset() in R to subset the cells. But you can always pass an expression in python to subset the cells. See https://anndata.readthedocs.io/en/latest/tutorials/notebooks/getting-started.html#subsetting-using-metadata. You should use adata to refer to the AnnData object. For example, adata.obs.groups == "g1" will subset the cells with groups equal to g1.
subset_using —
The method to subset the cells.
- - auto: Automatically detect the method to use.
  Note that this is not always accurate. We simply check if [ is in the expression.
  If so, we use python to subset the cells; otherwise, we use R.
- - python: Use python to subset the cells.
- - r: Use R to subset the cells.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.CellCellCommunicationPlots(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Visualization for cell-cell communication inference.

R package CCPlotR is used to visualize the results.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

cccfile — The output file from CellCellCommunicationor a tab-separated file with the following columns: source, target, ligand, receptor, and score. If so, in.expfile can be provided where exp_df is needed.
expfile — The expression file with the expression of ligands and receptors.Columns include: cell_type, gene and mean_exp.

Output

outdir — The output directory for the plots.

Envs

cases (type=json) —
The cases for the plots.The keys are the names of the cases and the values are the arguments for the plots. The arguments include:
- * kind: one of arrow, circos, dotplot, heatmap, network,
  and sigmoid.
- * devpars: The parameters for png() for the plot, including res,
  width, and height.
- * section: The section name for the report to group the plots.
- * : Other arguments for cc_<kind> function in CCPlotR.
  See the documentation for more details.
  Or you can use ?CCPlotR::cc_<kind> in R.
score_col — The column name in the input file that contains the score, ifthe input file is from CellCellCommunication. Two alias columns are added in the result file of CellCellCommunication, mag_score and spec_score, which are the magnitude and specificity scores.
subset — An expression to pass to dplyr::filter() to subset the ccc data.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.ScVelo(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Velocity analysis for single-cell RNA-seq data

This process is implemented based on the Python package scvelo (v0.3.3). Note that it doesn't work with numpy>=2.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

sobjfile — The seurat object file in RDS or h5seurat format or AnnData file.

Output

outfile — The output object with the velocity embeddings and information.In either RDS, h5seurat or h5ad format, depending on the envs.outtype. There will be also plots generated in the output directory (parent directory of outfile). Note that these plots will not be used in the report, but can be used as supplementary information for the velocity analysis. To visualize the velocity embeddings, you can use the SeuratClusterStats process with v_reduction provided to one of the envs.dimplots.

Envs

calculate_velocity_genes (flag) — Whether to calculate the velocity genes.
denoise (flag) — Whether to denoise the data.
denoise_topn (type=int) — Number of genes with highest likelihood selected toinfer velocity directions.
fitting_by (choice) —
The mode to use for fitting the velocities.
- - stochastic: Stochastic mode
- - deterministic: Deterministic mode
group_by — The column name in metadata to group the cells.Typically, this column should be the cluster id.
kinetics (flag) — Whether to compute the RNA velocity kinetics.
kinetics_topn (type=int) — Number of genes with highest likelihood selected toinfer velocity directions.
min_shared_counts (type=int) — Minimum number of counts(both unspliced and spliced) required for a gene.
mode (type=list) — The mode to use for the velocity analysis.It should be a subset of ['deterministic', 'stochastic', 'dynamical'], meaning that we can perform the velocity analysis in multiple modes.
n_neighbors (type=int) — The number of neighbors to use for the velocity graph.
n_pcs (type=int) — The number of PCs to use for the velocity graph.
ncores (type=int) — Number of cores to use.
outtype (choice) —
The output file type.
- - : The same as the input file type.
- - h5seurat: h5seurat file
- - h5ad: h5ad file
- - qs: qs/qs2 file
- - qs2: qs2 file
- - rds: RDS file
rscript — The path to the Rscript executable used to convert RDS file to AnnData.if in.sobjfile is an RDS file, it will be converted to AnnData file (h5ad). You need Seurat, SeuratDisk and digest installed.
top_n (type=int) — The number of top features to plot.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.Slingshot(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Trajectory inference using Slingshot

This process is implemented based on the R package slingshot.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

sobjfile — The seurat object file in RDS or qs format.

Output

outfile — The output object with the trajectory information.The lineages are stored in the metadata of the seurat object at columns LineageX, where X is the lineage number. The BranchID column contains the branch id for each cell. One can use scplotter::CellDimPlot(object, lineages = c("Lineage1", "Lineage2", ...)) to visualize the trajectories.

Envs

align_start (flag) — Whether to align the starting pseudotime values at the maximum pseudotime.
dims (type=auto) — The dimensions to use for the analysis.A list or a string with comma separated values. Consecutive numbers can be specified with a colon (:) or a dash (-).
end — The ending group for the Slingshot analysis.
group_by — The column name in metadata to group the cells.Typically, this column should be the cluster id.
prefix — The prefix to add to the column names of the resulting pseudotime variable.
reduction — The nonlinear reduction to use for the trajectory analysis.
reverse (flag) — Logical value indicating whether to reverse the pseudotime variable.
seed (type=int) — The seed for the random number generator.
start — The starting group for the Slingshot analysis.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

class

`biopipen.ns.scrna.LoomTo10X(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Convert Loom file to 10X format

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary fromthe docstring by default.
dirsig — When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common optionsacross jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry to jobs once error occurs
order — The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel(the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submited simultaneously
template — Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

loomfile — The Loom file

Output

outdir — The output directory for the 10X format files,including the matrix.mtx.gz, barcodes.tsv.gz and features.tsv.gz files.

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up theprocess relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
init() — Init all other properties and jobs</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Run the process</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).</>
__repr__(cls) (str) — Representation for the Proc subclasses</>
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).</>
register(cls, subclass) — Register a virtual subclass of an ABC.</>

staticmethod

register(cls, subclass)

</>

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod

__instancecheck__(cls, instance)

</>

Override for isinstance(instance, cls).

staticmethod

__subclasscheck__(cls, subclass)

</>

Override for issubclass(subclass, cls).

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — and
**kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry to jobs once error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this processis a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will beinherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new shedular to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items willbe inherited.
submission_batch (int, optional) — How many jobs to be submited simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up theprocess relationship

method

`init()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

</>

Run the process

biopipen.ns.scrna

biopipen.ns.scrna.SeuratLoading(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.SeuratPreparing(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.SeuratClustering(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.SeuratSubClustering(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.SeuratClusterStats(*args, **kwds) → Proc

Number of cells in each cluster

Number of cells in each cluster by groups

Violin plots for the gene expressions

Dimension reduction plot with labels

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.ModuleScoreCalculator(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.CellsDistribution(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.SeuratMetadataMutater(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.DimPlots(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

__init_subclass__()

init()

gc()

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

run()

biopipen.ns.scrna.MarkersFinder(*args, **kwds) → Proc

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

`biopipen.ns.scrna.SeuratLoading(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.SeuratPreparing(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.SeuratClustering(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.SeuratSubClustering(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.SeuratClusterStats(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.ModuleScoreCalculator(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.CellsDistribution(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.SeuratMetadataMutater(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.DimPlots(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`

`biopipen.ns.scrna.MarkersFinder(*args`, `**kwds)` → Proc

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

`__init_subclass__()`

`init()`

`gc()`

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

`run()`