biopipen.ns.regulatory

module

Provides processes for regulatory-related analyses.

Classes

MotifScan (Proc) — Scan the input sequences for binding sites using motifs.
MotifAffinityTest (Proc) — Test the affinity of motifs to the sequences and the change in affinity due to the mutations.
VariantMotifPlot (Proc) — Plot a genomic region surrounding a genomic variant and the potentially disrupted motifs.

class

`biopipen.ns.regulatory.MotifScan(*args, **kwds)` → Proc

Bases

biopipen.core.proc.Proc
pipen.proc.Proc

Scan the input sequences for binding sites using motifs.

Currently, only fimo from the MEME suite is supported, based on the comparisons in the following reference.

Reference:
- Evaluating tools for transcription factor binding site prediction

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether to walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry a job when an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. It also has no effect on start processes, whose order is determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to submit simultaneously
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
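The envs_depth attribute controls how far a subclass's envs are merged into its parent's. The sketch below illustrates one plausible reading, assuming depth counts the levels of nested dicts that are merged rather than replaced wholesale. update_envs is a hypothetical helper, not pipen's implementation, and the example keys are illustrative only.

```python
from typing import Any, Dict


def update_envs(parent: Dict[str, Any], child: Dict[str, Any], depth: int = 1) -> Dict[str, Any]:
    """Hypothetical depth-limited merge of child envs into parent envs.

    Assumption: depth=1 replaces top-level keys wholesale, and each extra
    level merges one more level of nested dicts. The parent is not modified.
    """
    merged = dict(parent)
    for key, value in child.items():
        if depth > 1 and isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge the nested dict one level deeper instead of replacing it
            merged[key] = update_envs(merged[key], value, depth - 1)
        else:
            merged[key] = value
    return merged


parent_envs = {"tool": "fimo", "args": {"thresh": 1e-4, "text": True}}
child_envs = {"args": {"thresh": 1e-5}}
print(update_envs(parent_envs, child_envs, depth=1))  # "args" replaced entirely
print(update_envs(parent_envs, child_envs, depth=2))  # "args" merged key by key
```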

Input

motiffile — File containing motif names. The file contains the motif and regulator names. The motif names should match the names in the motif database. This file must have a header. If multiple columns are present, it should be tab-delimited.
seqfile — File containing sequences in FASTA format.
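Reading in.motiffile can be sketched as below: a tab-delimited file with a header, where envs.motif_col and envs.regulator_col are either column names or 1-based indices. read_motifs is a hypothetical helper, not biopipen's code; falling back to the motif name when no regulator column is given follows the Output description of fimo_output.txt.

```python
import csv
import io
from typing import Dict, List, Optional, Union

Column = Union[str, int]


def _column_index(header: List[str], col: Column) -> int:
    """Return a 0-based index for a column given by name or by 1-based index."""
    if isinstance(col, int):
        return col - 1
    return header.index(col)  # raises ValueError if the name is missing


def read_motifs(text: str, motif_col: Column, regulator_col: Optional[Column] = None) -> List[Dict[str, str]]:
    """Hypothetical reader for a tab-delimited motif file with a header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    mi = _column_index(header, motif_col)
    ri = _column_index(header, regulator_col) if regulator_col is not None else None
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        motif = row[mi]
        # Without a regulator column, reuse the motif name as the regulator
        regulator = row[ri] if ri is not None else motif
        rows.append({"motif": motif, "regulator": regulator})
    return rows


motif_text = "Motif\tTF\nMOTIF_A\tTF_A\nMOTIF_B\tTF_B\n"
print(read_motifs(motif_text, "Motif", "TF"))
print(read_motifs(motif_text, 1))  # 1-based index, no regulator column
```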

Output

outdir — Directory containing the results. Most notably, fimo_output.txt, which extends fimo.tsv with:
1. the regulator information if envs.regulator_col is provided; otherwise, the regulator columns are filled with the motif names;
2. the original sequence from the FASTA file (in.seqfile);
3. corrected genomic coordinates if the genomic coordinates are included in the sequence names.
See also the Output section of https://meme-suite.org/meme/doc/fimo.html. Note that --no-pgc is passed to fimo so that it does not parse genomic coordinates from the sequence names. If fimo parsed them, the name DDX11L1 in >DDX11L1::chr1:11869-14412 would be lost; passing --no-pgc keeps the sequence names intact in the output. If the sequence names are in the format >NAME::chr1:START-END, the coordinates in the output are corrected accordingly. This requires MEME/fimo v5.5.5+, where the --no-pgc option is available.
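The coordinate correction for names like >NAME::chr1:START-END can be sketched as a parse-and-offset step. This is a minimal illustration, not biopipen's implementation; it assumes START is a 0-based (BED-style) start and that fimo reports 1-based positions relative to the sequence. correct_coords is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches sequence names such as "DDX11L1::chr1:11869-14412"
NAME_PATTERN = re.compile(r"^(?P<name>.+)::(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def correct_coords(seq_name: str, rel_start: int, rel_stop: int) -> Tuple[str, Optional[str], int, int]:
    """Map sequence-relative hit positions to genomic positions.

    Assumes START in NAME::CHROM:START-END is 0-based and that rel_start/rel_stop
    are 1-based positions within the sequence, so the result is 1-based genomic.
    """
    match = NAME_PATTERN.match(seq_name)
    if match is None:
        # No embedded coordinates: keep the name and positions as they are
        return seq_name, None, rel_start, rel_stop
    offset = int(match.group("start"))
    return match.group("name"), match.group("chrom"), offset + rel_start, offset + rel_stop


print(correct_coords("DDX11L1::chr1:11869-14412", 10, 20))
print(correct_coords("seq1", 5, 9))
```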

Envs

args (ns) —
Additional arguments to pass to the tool.
- <more>: Additional arguments for fimo.
  See: https://meme-suite.org/meme/doc/fimo.html
cutoff (type=float) — The p-value cutoff for writing the results. When envs.q_cutoff is set, it is applied to the q-value instead. This is passed to --thresh in fimo.
fimo — The path to the fimo binary.
motif_col — The column name in the motif file containing the motif names.
motifdb — The path to the motif database. This is required. It should be in MEME motif database format. Databases can be downloaded here: https://meme-suite.org/meme/doc/download.html. See also the introduction to the databases: https://meme-suite.org/meme/db/motifs.
notfound (choice) —
What to do if a motif is not found in the database.
- error: Report an error and stop the process.
- ignore: Ignore the motif and continue.
q (flag) — Calculate q-values. When False, --no-qvalue is passed to fimo. The q-values are calculated using the Benjamini and Hochberg (BH, 1995) method.
q_cutoff (flag) — Apply envs.cutoff to the q-value.
regulator_col — The column name in the motif file containing the regulator names. Both motif_col and regulator_col can be either the column names or the 1-based indices of the columns. If no regulator_col is provided, no regulator information is written in the output.
tool (choice) —
The tool to use for scanning. Currently, only fimo is supported.
- fimo: Use fimo from the MEME suite.
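For orientation, the mapping from these envs to a fimo command line can be sketched as follows. build_fimo_cmd is a hypothetical helper, not biopipen's code. The flags --thresh, --no-qvalue and --no-pgc come from the descriptions above; using fimo's --qv-thresh for envs.q_cutoff, --oc for the output directory, and turning envs.args keys into --key value pairs are assumptions.

```python
from typing import Dict, List, Optional, Union


def build_fimo_cmd(
    motifdb: str,
    seqfile: str,
    outdir: str,
    cutoff: float,
    fimo: str = "fimo",
    q: bool = True,
    q_cutoff: bool = False,
    args: Optional[Dict[str, Union[str, int, float, bool]]] = None,
) -> List[str]:
    """Hypothetical builder for a fimo argv list (not executed here)."""
    # --no-pgc keeps names like NAME::chr1:START-END intact (fimo v5.5.5+)
    cmd = [fimo, "--thresh", str(cutoff), "--no-pgc", "--oc", outdir]
    if not q:
        cmd.append("--no-qvalue")
    if q_cutoff:
        cmd.append("--qv-thresh")  # assumption: apply the threshold to q-values
    for key, value in (args or {}).items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            cmd.append(flag)  # boolean switch
        elif value is not False and value is not None:
            cmd.extend([flag, str(value)])
    cmd.extend([motifdb, seqfile])  # positional arguments come last
    return cmd


print(build_fimo_cmd("motifs.meme", "seqs.fa", "out", 0.01, q=False))
```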

Classes

ProcMeta — Meta class for Proc

Methods

__init_subclass__() — Infer the requirements, which are needed to build up the process relationships
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — Garbage-collect the process to save memory after it is done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process

class

`pipen.proc.ProcMeta(name, bases, namespace, **kwargs)`

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).
__repr__(cls) (str) — Representation for the Proc subclasses
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).
register(cls, subclass) — Register a virtual subclass of an ABC.

method

register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

method

__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

method

__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

method

__repr__(cls) → str

Representation for the Proc subclasses

method

__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters

*args, **kwds (Any) — Arguments for the constructor

Returns (Proc)

The Proc instance
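The singleton behaviour of ProcMeta.__call__ can be illustrated with a minimal metaclass that caches one instance per class. This is a sketch of the general pattern under that assumption, not pipen's code; SingletonMeta, BaseProc and ChildProc are hypothetical names.

```python
from abc import ABCMeta


class SingletonMeta(ABCMeta):
    """Hypothetical metaclass: at most one instance per class."""

    def __call__(cls, *args, **kwds):
        # Look only in this class's own namespace, so a subclass does not
        # reuse the instance cached on its parent class.
        instance = cls.__dict__.get("_instance")
        if instance is None:
            instance = super().__call__(*args, **kwds)
            cls._instance = instance
        return instance


class BaseProc(metaclass=SingletonMeta):
    def __init__(self, name="base"):
        self.name = name


class ChildProc(BaseProc):
    pass


first = BaseProc()
print(BaseProc("other") is first)  # later calls return the cached instance
print(ChildProc() is first)        # each subclass gets its own instance
```

One consequence of this design is that constructor arguments on later calls are ignored, since the cached instance is returned unchanged.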

classmethod

`from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)`

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process; they overwrite those of the parent. Unspecified items will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
error_strategy (str, optional) —
How to deal with errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry a job when an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only applies when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to submit simultaneously

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

Infer the requirements, which are needed to build up the process relationships

method

`init()`

Init all other properties and jobs

method

`gc()`

Garbage-collect the process to save memory after it is done

method

`log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)`

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

method

`run()`

Run the process

class

`biopipen.ns.regulatory.MotifAffinityTest(*args, **kwds)` → Proc

Bases

biopipen.core.proc.Proc
pipen.proc.Proc

Test the affinity of motifs to the sequences and the change in affinity due to the mutations.

When using atSNP, motifBreakR is also required to plot the variants and motifs.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether to walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry a job when an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. It also has no effect on start processes, whose order is determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to submit simultaneously
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

motiffile — File containing motif names. The file contains the motif and regulator names. The motif names should match the names in the motif database. This file must have a header. If multiple columns are present, it should be tab-delimited.
varfile — File containing the variants. It can be a VCF file or a BED-like file. A VCF file does not need to be indexed; only records with PASS in the FILTER column are used. A BED-like file should contain the following columns: chrom, start, end, name, score, strand, ref, alt.
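The VCF-to-BED-like conversion (done by the process via bcftools, per envs.bcftools) can be pictured with the pure-Python sketch below. It only illustrates the column mapping; vcf_to_bedlike is hypothetical, and the placeholder score (0), strand ("+"), CHROM:POS fallback name and one-row-per-ALT-allele split are assumptions.

```python
from typing import Iterable, List, Tuple

BedLike = Tuple[str, int, int, str, int, str, str, str]


def vcf_to_bedlike(lines: Iterable[str]) -> List[BedLike]:
    """Hypothetical converter: keep PASS records, one BED-like row per ALT allele.

    Output columns: chrom, start (0-based), end, name, score, strand, ref, alt.
    """
    records: List[BedLike] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the header line
        chrom, pos, vid, ref, alt, _qual, filt = line.rstrip("\n").split("\t")[:7]
        if filt != "PASS":
            continue  # only PASS records are used
        start = int(pos) - 1  # VCF POS is 1-based; BED start is 0-based
        end = start + len(ref)
        name = vid if vid != "." else f"{chrom}:{pos}"
        for allele in alt.split(","):
            records.append((chrom, start, end, name, 0, "+", ref, allele))
    return records


vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\trs1\tA\tG\t50\tPASS\t.",
    "chr1\t200\trs2\tC\tT\t10\tLowQual\t.",
    "chr2\t300\t.\tG\tA,C\t60\tPASS\t.",
]
for record in vcf_to_bedlike(vcf_lines):
    print(record)
```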

Output

outdir — Directory containing the results. For motifBreakR, motifbreakr.txt is created, containing records with a strong or weak effect (neutral records are not written). For atSNP, atsnp.txt is created, containing records whose p-value (envs.atsnp_args.p) is below envs.cutoff.
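The filtering rules above can be sketched as follows. filter_results is hypothetical, not biopipen's code. Of the envs.atsnp_args.padj methods, only Benjamini-Hochberg is implemented here; it follows the same steps as R's p.adjust(method = "BH").

```python
from typing import Dict, List, Optional, Sequence


def p_adjust_bh(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (same steps as R's p.adjust, "BH")."""
    n = len(pvals)
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Visit p-values from largest to smallest, keeping a running minimum of p * n / rank
    order = sorted(range(n), key=lambda k: pvals[k], reverse=True)
    for i, idx in enumerate(order):
        rank = n - i  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def filter_results(
    records: List[Dict],
    tool: str,
    cutoff: float,
    p: Optional[str] = None,
    padj_cutoff: bool = False,
) -> List[Dict]:
    """Hypothetical filter mirroring the outdir description above."""
    if tool.lower() == "motifbreakr":
        # motifBreakR: keep strong/weak effects, drop neutral ones
        return [r for r in records if r["effect"] in ("strong", "weak")]
    if p is None:
        raise ValueError("atSNP filtering needs the p-value column (envs.atsnp_args.p)")
    pvals = [r[p] for r in records]
    if padj_cutoff:
        pvals = p_adjust_bh(pvals)
    return [r for r, pv in zip(records, pvals) if pv < cutoff]


atsnp_records = [{"pval_diff": 0.01}, {"pval_diff": 0.04}, {"pval_diff": 0.03}, {"pval_diff": 0.2}]
print(filter_results(atsnp_records, "atSNP", 0.05, p="pval_diff"))
print(filter_results(atsnp_records, "atSNP", 0.05, p="pval_diff", padj_cutoff=True))
```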

Envs

atsnp_args (ns) —
Additional arguments to pass to atSNP.
- padj_cutoff (flag): Apply envs.cutoff to the adjusted p-value instead of the raw p-value.
  Only works for atSNP.
- padj (choice): The method to adjust the p-values.
  Only works for atSNP.
  - holm: Holm's method
  - hochberg: Hochberg's method
  - hommel: Hommel's method
  - bonferroni: Bonferroni method
  - BH: Benjamini & Hochberg's method
  - BY: Benjamini & Yekutieli's method
  - fdr: False discovery rate (same as BH)
  - none: No adjustment
- p (choice): Which p-value to use for adjustment and cutoff.
  - pval_ref: p-value for the reference allele affinity score.
  - pval_snp: p-value for the SNP allele affinity score.
  - pval_cond_ref: Conditional p-value for the affinity score of the reference allele.
  - pval_cond_snp: Conditional p-value for the affinity score of the SNP allele.
  - pval_diff: p-value for the affinity score change between the two alleles.
  - pval_rank: p-value for the rank test between the two alleles.
bcftools — The path to the bcftools binary. Used to convert the VCF file to a BED file when the input is a VCF file.
cutoff (type=float) — The p-value cutoff for writing the results.
devpars (ns) —
The default device parameters for the plot.
- width (type=int): The width of the plot.
- height (type=int): The height of the plot.
- res (type=int): The resolution of the plot.
genome — The genome assembly. Used to fetch the sequences around the variants through the corresponding BSgenome package; for example, BSgenome.Hsapiens.UCSC.hg19 is required for hg19. For organisms other than human, specify the full package name, for example, BSgenome.Mmusculus.UCSC.mm10.
motif_col — The column name in the motif file containing the motif names. If this is not provided, envs.regulator_col and envs.regmotifs are required; they are used to infer the motif names from the regulator names.
motifbreakr_args (ns) —
Additional arguments to pass to motifBreakR.
- method (choice): The method to use.
  See details at https://rdrr.io/bioc/motifbreakR/man/motifbreakR.html
  and https://simon-coetzee.github.io/motifBreakR/#methods.
  - default: Use the default method.
  - log: Use the standard summation of log probabilities.
  - ic: Use information content.
  - notrans: Use the default method without transformation.
motifdb — The path to the motif database. This is required. It should be in MEME motif database format. Databases can be downloaded here: https://meme-suite.org/meme/doc/download.html. See also the introduction to the databases: https://meme-suite.org/meme/db/motifs. The universalmotif R package is required to read the motif database.
ncores (type=int) — The number of cores to use.
notfound (choice) —
What to do if a motif is not found in the database, or a regulator is not found in the regulator-motif mapping file (envs.regmotifs).
- error: Report an error and stop the process.
- ignore: Ignore the motif and continue.
plot_nvars (type=int) — Number of variants to plot. The top <plot_nvars> variants with the largest abs(alleleDiff) (motifBreakR) or the smallest p-values (atSNP) are plotted.
plots (type=json) — Specify the details for the plots. When specified, plot_nvars is ignored. The keys are the variant names and the values are the details for the plots, including:
- devpars: The device parameters for the plot, overriding the defaults (envs.devpars).
- which: An expression passed to subset(results, subset = ...) to select the motifs to plot for the variant, or an integer N to select the top N motifs. For example, effect == "strong" selects the motifs with a strong effect in the motifBreakR results.
regmotifs — The path to the regulator-motif mapping file. It must have a header, with a Motif or Model column for the motif names and a TF, Regulator or Transcription factor column for the regulator names.
regulator_col — The column name in the motif file containing the regulator names. Both motif_col and regulator_col can be either the column names or the 1-based indices of the columns. If no regulator_col is provided, no regulator information is written in the output. Otherwise, the regulator information is written to the Regulator column of the output.
tool (choice) —
The tool to use for the test.
- motifbreakr: Use motifBreakR.
- motifBreakR: Use motifBreakR.
- atsnp: Use atSNP.
- atSNP: Use atSNP.
var_col — The column name in in.motiffile containing the variant information. The values must match the variant names in in.varfile. This is helpful when only specific pairs of variants and motifs listed in in.motiffile need to be tested.
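When envs.motif_col is not given, motif names are inferred from regulator names through envs.regmotifs, honouring envs.notfound. The sketch below shows that lookup; infer_motifs is hypothetical, and it assumes the mapping file is tab-delimited (the description above only fixes the accepted column names).

```python
import csv
import io
from typing import Dict, List, Sequence

MOTIF_COLS = ("Motif", "Model")
REGULATOR_COLS = ("TF", "Regulator", "Transcription factor")


def _pick(header: Sequence[str], candidates: Sequence[str]) -> str:
    """Return the first candidate column present in the header."""
    for col in candidates:
        if col in header:
            return col
    raise ValueError(f"None of {list(candidates)} found in header {list(header)}")


def infer_motifs(regulators: Sequence[str], regmotifs_text: str, notfound: str = "error") -> Dict[str, List[str]]:
    """Hypothetical lookup of motif names for each regulator."""
    reader = csv.DictReader(io.StringIO(regmotifs_text), delimiter="\t")
    header = reader.fieldnames or []
    motif_col = _pick(header, MOTIF_COLS)
    regulator_col = _pick(header, REGULATOR_COLS)
    mapping: Dict[str, List[str]] = {}
    for row in reader:
        # A regulator may map to several motifs
        mapping.setdefault(row[regulator_col], []).append(row[motif_col])
    result: Dict[str, List[str]] = {}
    for regulator in regulators:
        if regulator in mapping:
            result[regulator] = mapping[regulator]
        elif notfound == "error":
            raise KeyError(f"Regulator not found in regmotifs: {regulator}")
        # notfound == "ignore": skip the regulator silently
    return result


regmotifs_text = "Motif\tTF\nM1\tREG_A\nM2\tREG_A\nM3\tREG_B\n"
print(infer_motifs(["REG_A", "REG_B"], regmotifs_text))
```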

Classes

ProcMeta — Meta class for Proc

Methods

__init_subclass__() — Infer the requirements, which are needed to build up the process relationships
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — Garbage-collect the process to save memory after it is done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process

class

`biopipen.ns.regulatory.VariantMotifPlot(*args, **kwds)` → Proc

Bases

biopipen.core.proc.Proc
pipen.proc.Proc

Plot a genomic region surrounding a genomic variant and the potentially disrupted motifs.

Currently only SNVs are supported.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether to walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry a job when an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. It also has no effect on start processes, whose order is determined by Pipen.set_starts().
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to submit simultaneously
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

infile —
File containing the variants and motifs. It is a TAB-delimited file with the following columns:
- chrom: The chromosome of the SNV. Aliases: chr, seqnames.
- start: The start position of the SNV, either 0- or 1-based.
- end: The end position of the SNV, which is used as the position of the SNV.
- strand: The direction of the surrounding sequence matching the motif.
- SNP_id: The name of the SNV.
- REF: The reference allele of the SNV.
- ALT: The alternative allele of the SNV.
- providerId: The motif ID. The column can be specified by envs.motif_col.
- providerName: The name of the motif provider. Optional.
- Regulator: The regulator name. Optional; the column can be specified by envs.regulator_col.
- motifPos: The position of the motif relative to the position of the SNV.
  For example, '-8, 4' means the motif spans from 8 bp upstream to 4 bp downstream of the SNV.
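Using the columns above, the genomic span covered by a motif can be derived from the SNV position (the end column) and motifPos. motif_span is a hypothetical helper; it assumes the offsets are in reference-genome orientation and ignores the strand column.

```python
from typing import Dict, Tuple

# Accepted names for the chromosome column, per the column list above
CHROM_ALIASES = ("chrom", "chr", "seqnames")


def motif_span(row: Dict[str, str]) -> Tuple[str, int, int]:
    """Hypothetical helper: (chrom, motif start, motif end) around an SNV."""
    chrom = next((row[c] for c in CHROM_ALIASES if c in row), None)
    if chrom is None:
        raise KeyError(f"No chromosome column among {CHROM_ALIASES}")
    pos = int(row["end"])  # the end column is used as the SNV position
    upstream, downstream = (int(x) for x in row["motifPos"].split(","))
    return chrom, pos + upstream, pos + downstream


print(motif_span({"seqnames": "chr1", "start": "99", "end": "100", "motifPos": "-8, 4"}))
```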

Envs

devpars (ns) —
The default device parameters for the plot.
- width (type=int): The width of the plot.
- height (type=int): The height of the plot.
- res (type=int): The resolution of the plot.
genome — The genome assembly. Used to fetch the sequences around the variants through the corresponding BSgenome package; for example, BSgenome.Hsapiens.UCSC.hg19 is required for hg19. For organisms other than human, specify the full package name, for example, BSgenome.Mmusculus.UCSC.mm10.
motif_col — The column name in the input file containing the motif names. If this is not provided, envs.regulator_col and envs.regmotifs are required; they are used to infer the motif names from the regulator names.
motifdb — The path to the motif database. This is required. It should be in MEME motif database format. Databases can be downloaded here: https://meme-suite.org/meme/doc/download.html. See also the introduction to the databases: https://meme-suite.org/meme/db/motifs. The universalmotif R package is required to read the motif database.
notfound (choice) —
What to do if a motif is not found in the database, or a regulator is not found in the regulator-motif mapping file (envs.regmotifs).
- error: Report an error and stop the process.
- ignore: Ignore the motif and continue.
plot_vars (type=auto) — The variants (SNP_id) to plot. Either a list of variant names or a string of variant names separated by commas. When not specified, all variants are plotted.
regmotifs — The path to the regulator-motif mapping file. It must have a header, with a Motif or Model column for the motif names and a TF, Regulator or Transcription factor column for the regulator names.
regulator_col — The column name in the input file containing the regulator names. Both motif_col and regulator_col can be either the column names or the 1-based indices of the columns. If no regulator_col is provided, no regulator information is written in the output. Otherwise, the regulator information is written to the Regulator column of the output.
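Since envs.plot_vars accepts either a list or a comma-separated string, normalising it might look like the sketch below. resolve_plot_vars is hypothetical; silently skipping names that are absent from the input is an assumption, not documented behaviour.

```python
from typing import List, Optional, Sequence, Union


def resolve_plot_vars(plot_vars: Optional[Union[str, Sequence[str]]], all_vars: Sequence[str]) -> List[str]:
    """Hypothetical helper: turn envs.plot_vars into the SNP_ids to plot."""
    if not plot_vars:
        return list(all_vars)  # not specified: plot every variant
    if isinstance(plot_vars, str):
        wanted = [v.strip() for v in plot_vars.split(",") if v.strip()]
    else:
        wanted = list(plot_vars)
    known = set(all_vars)
    # Assumption: names that are not in the input are skipped
    return [v for v in wanted if v in known]


print(resolve_plot_vars("rs1, rs3", ["rs1", "rs2", "rs3"]))
```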

Classes

ProcMeta — Meta class for Proc

Methods

__init_subclass__() — Infer the requirements, which are needed to build up the process relationships
from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — Garbage-collect the process to save memory after it is done
init() — Init all other properties and jobs
log(level, msg, *args, logger) — Log message for the process
run() — Run the process
