biopipen.ns.cnvkit

module

biopipen.ns.cnvkit

CNVkit commands

Classes

CNVkitAccess (Proc) — Calculate the sequence-accessible coordinates in chromosomes from the given reference genome using cnvkit.py access
CNVkitAutobin (Proc) — Quickly estimate read counts or depths in a BAM file to estimate reasonable on- and (if relevant) off-target bin sizes.
CNVkitCoverage (Proc) — Run cnvkit coverage
CNVkitReference (Proc) — Run cnvkit reference
CNVkitFix (Proc) — Run cnvkit.py fix
CNVkitSegment (Proc) — Run cnvkit.py segment
CNVkitScatter (Proc) — Run cnvkit.py scatter
CNVkitDiagram (Proc) — Run cnvkit.py diagram
CNVkitHeatmap (Proc) — Run cnvkit.py heatmap for multiple cases
CNVkitCall (Proc) — Run cnvkit.py call
CNVkitBatch (Proc) — Run cnvkit batch
CNVkitGuessBaits (Proc) — Guess the bait intervals from the bam files

class

`biopipen.ns.cnvkit.CNVkitAccess(*args, **kwds)` → Proc

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Calculate the sequence-accessible coordinates in chromosomes from the given reference genome using cnvkit.py access

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly visible in the output directory. Note that this only works for processes with export=True or end processes, and you must make sure the output file names of different jobs don't conflict when flattening. It takes 3 possible values:
- None (default): flatten the output for single-job processes only
- True: flatten the output for all processes
- False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The program entrance for some schedulers may take too many resources when submitting a job or checking the job status, so we may use a smaller number here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
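
The envs and envs_depth attributes together describe a depth-limited update of nested options. The following is a minimal sketch of what such an update could look like on plain dicts. The helper name `update_envs` and the sample keys are hypothetical, and this is not pipen's actual implementation.

```python
from copy import deepcopy


def update_envs(base: dict, new: dict, depth: int = 1) -> dict:
    """Return a copy of `base` updated with `new`, merging nested dicts
    down to `depth` levels (hypothetical helper, for illustration only)."""
    merged = deepcopy(base)
    for key, value in new.items():
        if depth > 1 and isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Still within the depth budget: merge one level further down.
            merged[key] = update_envs(merged[key], value, depth - 1)
        else:
            # Depth exhausted (or not a dict): replace the value wholesale.
            merged[key] = deepcopy(value)
    return merged


parent = {"cnvkit": "cnvkit.py", "opts": {"a": 1, "b": 2}}
child = {"opts": {"b": 3}}

shallow = update_envs(parent, child, depth=1)  # "opts" is replaced as a whole
deep = update_envs(parent, child, depth=2)     # "opts" is merged key by key
```

With a depth of 1 the nested `opts` dict is overwritten; with a depth of 2 only the key `b` changes and `a` is kept.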

Input

excfiles — Additional regions to exclude, in BED format

Output

outfile — The output file

Envs

cnvkit — Path to cnvkit.py
min_gap_size (type=int) — Minimum gap size between accessible sequence regions
ref — The reference genome fasta file

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
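
To make the relation between the inputs, outputs and envs above and the underlying tool concrete, here is a rough sketch of how a `cnvkit.py access` argument list could be assembled from them. The helper `access_command` and the file names are hypothetical; the process's real script template may build the command differently.

```python
import shlex


def access_command(envs: dict, excfiles: list, outfile: str) -> list:
    """Assemble an argument list for `cnvkit.py access` (illustrative only)."""
    cmd = [envs["cnvkit"], "access", envs["ref"], "-o", outfile]
    if envs.get("min_gap_size") is not None:
        # -s/--min-gap-size: minimum gap size between accessible regions
        cmd += ["-s", str(envs["min_gap_size"])]
    for bed in excfiles:
        # -x/--exclude is repeated once per BED file of excluded regions
        cmd += ["-x", bed]
    return cmd


# Hypothetical paths, for illustration only.
envs = {"cnvkit": "cnvkit.py", "ref": "hg38.fa", "min_gap_size": 5000}
cmd = access_command(envs, ["blacklist.bed"], "access.hg38.bed")
print(shlex.join(cmd))
```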

Classes

ProcMeta — Meta class for Proc

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
log(level, msg, *args, logger) — Log message for the process
run() — Init all other properties and jobs

class

`pipen.proc.ProcMeta(name, bases, namespace, **kwargs)`

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons
__repr__(cls) (str) — Representation for the Proc subclasses

staticmethod

`__repr__(cls)` → str

Representation for the Proc subclasses

staticmethod

`__call__(cls, *args, **kwds)`

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance
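
The singleton behavior described for `__call__` can be illustrated with a small, generic metaclass. This is a sketch of the general pattern only; `SingletonMeta`, `Base` and `Child` are made-up names, and pipen's ProcMeta may differ in its details.

```python
class SingletonMeta(type):
    """Metaclass that hands back one shared instance per class."""

    _instances: dict = {}

    def __call__(cls, *args, **kwds):
        # Build the instance on the first call only; reuse it afterwards.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwds)
        return cls._instances[cls]


class Base(metaclass=SingletonMeta):
    pass


class Child(Base):
    pass


first, second = Child(), Child()  # both names refer to the same object
```

Each class gets its own instance, so `Base()` and `Child()` are still distinct objects.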

classmethod

`from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, output_flatten=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)`

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which overwrite the parent's. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly visible in the output directory. Note that this only works for processes with export=True or end processes, and you must make sure the output file names of different jobs don't conflict when flattening. It takes 3 possible values:
- None (default): flatten the output for single-job processes only
- True: flatten the output for all processes
- False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process.
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class
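
The idea of deriving a new process class where every argument left as None is inherited can be sketched with the built-in `type()`. The helper `derive` and the stand-in class `Access` are hypothetical; this only mirrors the documented behavior in spirit and is not how from_proc is implemented.

```python
from typing import Optional


def derive(base: type, name: Optional[str] = None, **overrides) -> type:
    """Create a subclass of `base`; overrides left as None are inherited."""
    attrs = {key: value for key, value in overrides.items() if value is not None}
    return type(name or base.__name__, (base,), attrs)


class Access:  # stand-in for a process class, not the real CNVkitAccess
    forks = 1
    cache = True


# forks is overridden; cache=None means "keep the parent's value".
FastAccess = derive(Access, name="FastAccess", forks=8, cache=None)
```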

classmethod

`__init_subclass__()`

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

Init all other properties and jobs

method

`gc()`

GC process for the process to save memory after it's done

method

`log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)`

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitAutobin(*args, **kwds)` → Proc

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Quickly estimate read counts or depths in a BAM file to estimate reasonable on- and (if relevant) off-target bin sizes.

Using cnvkit.py autobin.

If multiple BAMs are given, use the BAM with median file size.
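
Selecting the BAM with the median file size can be sketched as follows. The function works on a mapping of file name to size so that it stays self-contained; `median_size_file` is a hypothetical name, and taking the lower of the two middle files for an even count is an assumption of this sketch, not a documented rule.

```python
def median_size_file(sizes: dict) -> str:
    """Pick the file whose size is the median of all given files.

    `sizes` maps file name to size in bytes. For an even number of files
    this sketch takes the lower of the two middle files (an assumption).
    """
    ranked = sorted(sizes, key=sizes.get)
    return ranked[(len(ranked) - 1) // 2]


# Hypothetical file names and sizes.
bams = {"a.bam": 9_000, "b.bam": 2_000, "c.bam": 5_000}
chosen = median_size_file(bams)
```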

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly visible in the output directory. Note that this only works for processes with export=True or end processes, and you must make sure the output file names of different jobs don't conflict when flattening. It takes 3 possible values:
- None (default): flatten the output for single-job processes only
- True: flatten the output for all processes
- False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The program entrance for some schedulers may take too many resources when submitting a job or checking the job status, so we may use a smaller number here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
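
The three values of output_flatten can be read as a small decision rule for where each job writes. The sketch below encodes that rule; `job_outdir` is a hypothetical helper written only to illustrate the documented behavior, not pipen code.

```python
from pathlib import PurePosixPath
from typing import Optional


def job_outdir(outdir: str, index: int, n_jobs: int,
               output_flatten: Optional[bool] = None) -> PurePosixPath:
    """Where job `index` would write, under the three documented settings."""
    if output_flatten is None:
        # Default: flatten only when the process has a single job.
        output_flatten = n_jobs == 1
    base = PurePosixPath(outdir)
    # Flattened: straight into outdir; otherwise a per-job subdirectory.
    return base if output_flatten else base / str(index)


single = job_outdir("out", 0, n_jobs=1)                      # flattened by default
multi = job_outdir("out", 1, n_jobs=3)                       # per-job subdirectory
forced = job_outdir("out", 1, n_jobs=3, output_flatten=True)
```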

Input

accfile — The access file
baitfile — Potentially targeted genomic regions, e.g. all possible exons for the reference genome. Format: BED, interval list, etc.
bamfiles — The bamfiles

Output

antitarget_file — The antitarget BED output
target_file — The target BED output

Envs

annotate — Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar.
antitarget_max_size (type=int) — Maximum size of antitarget bins.
antitarget_min_size (type=int) — Minimum size of antitarget bins.
bp_per_bin (type=int) — Desired average number of sequencing read bases mapped to each bin.
cnvkit — Path to cnvkit.py
method (choice) —
Sequencing protocol. Determines whether and how to use antitarget bins.
- hybrid: Hybridization capture
- amplicon: Targeted amplicon sequencing
- wgs: Whole genome sequencing
ref — The reference genome fasta file
short_names (flag) — Reduce multi-accession bait labels to be short and consistent.
target_max_size (type=int) — Maximum size of target bins.
target_min_size (type=int) — Minimum size of target bins.

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version

Classes

ProcMeta — Meta class for Proc

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
log(level, msg, *args, logger) — Log message for the process
run() — Init all other properties and jobs

class

`pipen.proc.ProcMeta(name, bases, namespace, **kwargs)`

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons
__repr__(cls) (str) — Representation for the Proc subclasses

staticmethod

`__repr__(cls)` → str

Representation for the Proc subclasses

staticmethod

`__call__(cls, *args, **kwds)`

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, output_flatten=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)`

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which overwrite the parent's. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly visible in the output directory. Note that this only works for processes with export=True or end processes, and you must make sure the output file names of different jobs don't conflict when flattening. It takes 3 possible values:
- None (default): flatten the output for single-job processes only
- True: flatten the output for all processes
- False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process.
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

Init all other properties and jobs

method

`gc()`

GC process for the process to save memory after it's done

method

`log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)`

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitCoverage(*args, **kwds)` → Proc

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit coverage

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly visible in the output directory. Note that this only works for processes with export=True or end processes, and you must make sure the output file names of different jobs don't conflict when flattening. It takes 3 possible values:
- None (default): flatten the output for single-job processes only
- True: flatten the output for all processes
- False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The program entrance for some schedulers may take too many resources when submitting a job or checking the job status, so we may use a smaller number here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
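
As a rough reading of how error_strategy and num_retries interact, the sketch below runs a single job under a simplified policy. It is an interpretation of the option names only: `run_with_strategy` is hypothetical, the "terminate" strategy is left out, and the real scheduler logic in pipen is certainly more involved.

```python
def run_with_strategy(job, error_strategy: str = "ignore", num_retries: int = 3):
    """Run `job` (a zero-argument callable) under a simplified strategy.

    Returns (succeeded, attempts). Only "retry", "ignore" and "halt" are
    modeled here.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            job()
            return True, attempts
        except Exception:
            if error_strategy == "retry" and attempts <= num_retries:
                continue  # try the same job again
            if error_strategy == "halt":
                raise  # let the failure stop everything
            return False, attempts  # "ignore", or retries used up


calls = []


def flaky():
    """Fail on the first two calls, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")


def always_fail():
    raise ValueError("permanent failure")


retried = run_with_strategy(flaky, "retry", num_retries=3)
ignored = run_with_strategy(always_fail, "ignore")
```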

Input

bamfile — The bamfile
target_file — The target file or anti-target file

Output

outfile — The output coverage file

Envs

cnvkit — Path to cnvkit.py
count (flag) — Get read depths by counting read midpoints within each bin. (An alternative algorithm).
min_mapq (type=int) — Minimum mapping quality to include a read.
ncores (type=int) — Number of subprocesses to calculate coverage in parallel
ref — The reference genome fasta file

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
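
Because target_file can be either the target or the antitarget file, a sample is typically covered twice, once per bin file. The sketch below builds both `cnvkit.py coverage` argument lists for one sample. The helper `coverage_commands`, the paths and the output naming are illustrative assumptions; the process's own script may differ.

```python
def coverage_commands(envs: dict, sample: str, bamfile: str,
                      target_bed: str, antitarget_bed: str) -> list:
    """Build one `cnvkit.py coverage` call per bin file for a sample."""
    commands = []
    for kind, bed in (("target", target_bed), ("antitarget", antitarget_bed)):
        cmd = [envs["cnvkit"], "coverage", bamfile, bed,
               "-f", envs["ref"], "-o", f"{sample}.{kind}coverage.cnn"]
        if envs.get("count"):
            cmd.append("--count")  # count read midpoints instead
        if envs.get("min_mapq") is not None:
            cmd += ["--min-mapq", str(envs["min_mapq"])]
        if envs.get("ncores") is not None:
            cmd += ["--processes", str(envs["ncores"])]
        commands.append(cmd)
    return commands


# Hypothetical paths, for illustration only.
envs = {"cnvkit": "cnvkit.py", "ref": "hg38.fa", "min_mapq": 20, "ncores": 4}
target_cmd, antitarget_cmd = coverage_commands(
    envs, "tumor1", "tumor1.bam", "targets.bed", "antitargets.bed"
)
```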

Classes

ProcMeta — Meta class for Proc

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
gc() — GC process for the process to save memory after it's done
log(level, msg, *args, logger) — Log message for the process
run() — Init all other properties and jobs

class

`pipen.proc.ProcMeta(name, bases, namespace, **kwargs)`

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons
__repr__(cls) (str) — Representation for the Proc subclasses

staticmethod

`__repr__(cls)` → str

Representation for the Proc subclasses

staticmethod

`__call__(cls, *args, **kwds)`

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, output_flatten=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)`

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which overwrite the parent's. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly visible in the output directory. Note that this only works for processes with export=True or end processes, and you must make sure the output file names of different jobs don't conflict when flattening. It takes 3 possible values:
- None (default): flatten the output for single-job processes only
- True: flatten the output for all processes
- False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process.
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

Init all other properties and jobs

method

`gc()`

GC process for the process to save memory after it's done

method

`log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)`

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitReference(*args, **kwds)` → Proc

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit reference

To generate a reference file from normal samples, provide the cnn coverage files from the normal samples. To generate a flat reference file, provide the target/antitarget file.
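
The two modes (a pooled reference from normal samples versus a flat reference from bin files) can be sketched as two shapes of the `cnvkit.py reference` argument list. The helper `reference_command` and all file names are hypothetical; how the process itself chooses between the modes is not shown in this section.

```python
from typing import Optional, Sequence


def reference_command(cnvkit: str, ref: str, outfile: str,
                      covfiles: Sequence[str] = (),
                      target_file: Optional[str] = None,
                      antitarget_file: Optional[str] = None) -> list:
    """Assemble `cnvkit.py reference` arguments for either mode (sketch)."""
    cmd = [cnvkit, "reference"]
    if covfiles:
        # Pooled reference: pass the normal samples' .cnn coverage files.
        cmd += list(covfiles)
    elif target_file:
        # Flat reference: no samples, only the bin definitions.
        cmd += ["-t", target_file]
        if antitarget_file:
            cmd += ["-a", antitarget_file]
    else:
        raise ValueError("need coverage files or a target file")
    return cmd + ["-f", ref, "-o", outfile]


# Hypothetical file names, for illustration only.
pooled = reference_command(
    "cnvkit.py", "hg38.fa", "pooled.cnn",
    covfiles=["n1.targetcoverage.cnn", "n1.antitargetcoverage.cnn"],
)
flat = reference_command(
    "cnvkit.py", "hg38.fa", "flat.cnn",
    target_file="targets.bed", antitarget_file="antitargets.bed",
)
```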

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- retry, ignore, halt
- halt to halt the whole pipeline, no submitting new jobs
- terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly visible in the output directory. Note that this only works for processes with export=True or end processes, and you must make sure the output file names of different jobs don't conflict when flattening. It takes 3 possible values:
- None (default): flatten the output for single-job processes only
- True: flatten the output for all processes
- False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The program entrance for some schedulers may take too many resources when submitting a job or checking the job status, so we may use a smaller number here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

antitarget_file — Antitarget intervals (.bed or .list)
covfiles — The coverage files from normal samples
sample_sex — Specify the chromosomal sex of all given samples as male or female. Guess each sample from coverage of X and Y chromosomes if not given.
target_file — Target intervals (.bed or .list)

Output

outfile — The reference cnn file

Envs

cluster (flag) — Calculate and store summary stats for clustered subsets of the normal samples with similar coverage profiles.
cnvkit — Path to cnvkit.py
male_reference (flag) — Create a male reference: shift female samples' chrX log-coverage by -1, so the reference chrX average is -1. Otherwise, shift male samples' chrX by +1, so the reference chrX average is 0.
min_cluster_size (type=int) — Minimum cluster size to keep in reference profiles.
no_edge (flag) — Skip edge-effect correction.
no_gc (flag) — Skip GC correction.
no_rmask (flag) — Skip RepeatMasker correction.
ref — The reference genome fasta file
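
The arithmetic behind male_reference can be worked through with idealized numbers. The sketch below assumes a female sample has a chrX log2 coverage of 0.0 (two copies) and a male sample has -1.0 (one copy, log2(1/2)); these values and the helper names are illustrative assumptions, not CNVkit code.

```python
def shift_chrx(log2_chrx: float, sex: str, male_reference: bool = False) -> float:
    """Apply the chrX shift described for envs.male_reference (illustrative)."""
    if male_reference and sex == "female":
        # Male reference: bring female chrX down to the one-copy level
        return log2_chrx - 1.0
    if not male_reference and sex == "male":
        # Female (default) reference: bring male chrX up to the two-copy level
        return log2_chrx + 1.0
    return log2_chrx


# Idealized chrX log2 coverage relative to diploid autosomes (assumed values)
SAMPLES = [("female", 0.0), ("male", -1.0)]


def reference_chrx_average(male_reference: bool) -> float:
    """Average chrX log2 value of the pooled reference after shifting."""
    shifted = [shift_chrx(value, sex, male_reference) for sex, value in SAMPLES]
    return sum(shifted) / len(shifted)
```

Under these assumptions the pooled chrX average is -1 with male_reference and 0 without it, matching the description above.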

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
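
As a rough guide to how the Input and Envs entries above relate to the CNVkit command line, the sketch below assembles a `cnvkit.py reference` call by hand. It is an assumption about the mapping, not biopipen's actual script: build_reference_cmd is a hypothetical helper, the file names in the usage note are made up, and target_file/antitarget_file are left out because how the process passes them is not described here.

```python
from typing import List, Optional, Sequence


def build_reference_cmd(
    covfiles: Sequence[str],
    outfile: str,
    ref: str,
    cnvkit: str = "cnvkit.py",
    sample_sex: Optional[str] = None,
    cluster: bool = False,
    min_cluster_size: Optional[int] = None,
    male_reference: bool = False,
    no_gc: bool = False,
    no_edge: bool = False,
    no_rmask: bool = False,
) -> List[str]:
    """Build an argument list for `cnvkit.py reference` (illustrative only)."""
    cmd = [cnvkit, "reference", *covfiles, "--fasta", ref, "--output", outfile]
    if sample_sex:
        cmd += ["--sample-sex", sample_sex]
    if cluster:
        cmd.append("--cluster")
    if min_cluster_size is not None:
        cmd += ["--min-cluster-size", str(min_cluster_size)]
    if male_reference:
        cmd.append("--male-reference")
    # Boolean envs map to bare flags; only enabled ones are emitted
    for enabled, flag in (
        (no_gc, "--no-gc"),
        (no_edge, "--no-edge"),
        (no_rmask, "--no-rmask"),
    ):
        if enabled:
            cmd.append(flag)
    return cmd
```

For example, `build_reference_cmd(["n1.targetcoverage.cnn"], "reference.cnn", "genome.fa", cluster=True)` yields a list that could be handed to `subprocess.run`.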

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `output_flatten=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which will overwrite those of the parent. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to appear directly in the output directory. Note that this only works for processes with export=True or for end processes. Make sure the output file names of different jobs do not conflict when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class
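
One plausible reading of how envs and envs_depth interact in from_proc is a depth-limited dictionary update: keys given for the new process replace the parent's, and everything else is inherited. The sketch below is an assumption about that behavior, not pipen's code; update_envs is a hypothetical name and the density/quality values are made up.

```python
from copy import deepcopy
from typing import Any, Mapping, Optional


def update_envs(
    parent: Mapping[str, Any],
    new: Optional[Mapping[str, Any]],
    depth: int = 1,
) -> dict:
    """Merge `new` into a copy of `parent`, recursing at most `depth` levels."""
    out = deepcopy(dict(parent))
    for key, value in (new or {}).items():
        both_mappings = isinstance(value, Mapping) and isinstance(out.get(key), Mapping)
        if depth > 1 and both_mappings:
            # Still within the allowed depth: merge nested keys one by one
            out[key] = update_envs(out[key], value, depth - 1)
        else:
            # Depth exhausted (or not a mapping): replace the value wholesale
            out[key] = value
    return out


parent_envs = {"cnvkit": "cnvkit.py", "convert_args": {"density": 150, "quality": 90}}
new_envs = {"convert_args": {"density": 300}}

shallow = update_envs(parent_envs, new_envs, depth=1)  # convert_args replaced
nested = update_envs(parent_envs, new_envs, depth=2)   # convert_args merged
```

With depth=1 the nested quality key is lost because convert_args is replaced as a whole; with depth=2 it is kept.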

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitFix(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py fix

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to appear directly in the output directory. Note that this only works for processes with export=True or for end processes. Make sure the output file names of different jobs do not conflict when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The program entrance for some schedulers may take too many resources when submitting a job or checking the job status, so we may use a smaller number here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

antitarget_file — The antitarget file
reference — The reference cnn file
sample_id — Sample ID for target/antitarget files. Otherwise inferred from file names.
target_file — The target file

Output

outfile — The fixed coverage files (.cnr)

Envs

cluster (flag) — Compare and use cluster-specific values present in the reference profile (requires envs.cluster=True for CNVkitReference).
cnvkit — Path to cnvkit.py
no_edge (flag) — Skip edge-effect correction.
no_gc (flag) — Skip GC correction.
no_rmask (flag) — Skip RepeatMasker correction.

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
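
The inputs and envs above line up with the positional arguments and options of `cnvkit.py fix`. The following sketch shows one way such a call could be assembled; build_fix_cmd is a hypothetical helper and the mapping is an assumption, since the process script itself is not reproduced in this reference.

```python
from typing import List, Optional


def build_fix_cmd(
    target_file: str,
    antitarget_file: str,
    reference: str,
    outfile: str,
    cnvkit: str = "cnvkit.py",
    sample_id: Optional[str] = None,
    cluster: bool = False,
    no_gc: bool = False,
    no_edge: bool = False,
    no_rmask: bool = False,
) -> List[str]:
    """Build an argument list for `cnvkit.py fix` (illustrative only)."""
    # Positional order: target coverage, antitarget coverage, reference
    cmd = [cnvkit, "fix", target_file, antitarget_file, reference, "--output", outfile]
    if sample_id:
        # Without it, CNVkit infers the sample ID from the file names
        cmd += ["--sample-id", sample_id]
    if cluster:
        cmd.append("--cluster")
    if no_gc:
        cmd.append("--no-gc")
    if no_edge:
        cmd.append("--no-edge")
    if no_rmask:
        cmd.append("--no-rmask")
    return cmd
```

The correction flags passed here would normally mirror the ones used when the reference was built, so that the sample and the reference are corrected in the same way.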

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `output_flatten=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which will overwrite those of the parent. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to appear directly in the output directory. Note that this only works for processes with export=True or for end processes. Make sure the output file names of different jobs do not conflict when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitSegment(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py segment

For segmentation methods, see https://cnvkit.readthedocs.io/en/stable/pipeline.html#segmentation-methods

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to appear directly in the output directory. Note that this only works for processes with export=True or for end processes. Make sure the output file names of different jobs do not conflict when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The program entrance for some schedulers may take too many resources when submitting a job or checking the job status, so we may use a smaller number here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

cnrfile — The fixed coverage files (.cnr)
normal_id — Corresponding normal sample ID in the input VCF. This sample is used to select only germline SNVs to plot b-allele frequencies.
sample_id — Specify the name of the sample in the VCF to use for b-allele frequency extraction and as the default plot title.
vcf — VCF file name containing variants for segmentation by allele frequencies (optional).

Output

outfile — The segmentation file (.cns)

Envs

cnvkit — Path to cnvkit.py
drop_low_coverage (flag) — Drop very-low-coverage bins before segmentation to avoid false-positive deletions in poor-quality tumor samples.
drop_outliers (type=int) — Drop outlier bins more than this many multiples of the 95th quantile away from the average within a rolling window. Set to 0 for no outlier filtering.
method — Method to use for segmentation. Candidates: cbs, flasso, haar, none, hmm, hmm-tumor, hmm-germline
min_variant_depth (type=int) — Minimum read depth for a SNV to be displayed in the b-allele frequency plot.
ncores (type=int) — Number of subprocesses to segment in parallel. 0 or negative for all available cores
rscript — Path to Rscript
smooth_cbs (flag) — Perform an additional smoothing before CBS segmentation, which in some cases may increase the sensitivity. Used only for CBS method.
threshold — Significance threshold (p-value or FDR, depending on method) to accept breakpoints during segmentation. For HMM methods, this is the smoothing window size.
zygosity_freq (type=float) — Ignore VCF's genotypes (GT field) and instead infer zygosity from allele frequencies.

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
r-DNAcopy —
- check: {{proc.envs.rscript}} <(echo "library(DNAcopy)")
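
To make the segmentation envs more concrete, the sketch below shows how they could translate into a `cnvkit.py segment` call, including the "0 or negative means all cores" rule for ncores. Both helpers are hypothetical and the mapping is an assumption. CNVkit itself documents that a non-positive `--processes` value uses all available CPUs, so resolve_ncores is spelled out only for illustration.

```python
import os
from typing import List, Optional


def resolve_ncores(ncores: int) -> int:
    """Interpret 0 or a negative value as 'use every available core'."""
    if ncores <= 0:
        return os.cpu_count() or 1
    return ncores


def build_segment_cmd(
    cnrfile: str,
    outfile: str,
    method: str = "cbs",
    ncores: int = 1,
    cnvkit: str = "cnvkit.py",
    threshold: Optional[float] = None,
    drop_low_coverage: bool = False,
    drop_outliers: Optional[int] = None,
    smooth_cbs: bool = False,
    rscript: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `cnvkit.py segment` (illustrative only)."""
    cmd = [
        cnvkit, "segment", cnrfile,
        "--output", outfile,
        "--method", method,
        "--processes", str(resolve_ncores(ncores)),
    ]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if drop_low_coverage:
        cmd.append("--drop-low-coverage")
    if drop_outliers is not None:
        # 0 is meaningful here (it disables outlier filtering), so test for None
        cmd += ["--drop-outliers", str(drop_outliers)]
    if smooth_cbs and method == "cbs":
        # Extra smoothing is documented as applying to the CBS method only
        cmd.append("--smooth-cbs")
    if rscript:
        cmd += ["--rscript-path", rscript]
    return cmd
```

The VCF-related inputs (vcf, sample_id, normal_id) are left out to keep the sketch short.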

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `output_flatten=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which will overwrite those of the parent. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to appear directly in the output directory. Note that this only works for processes with export=True or for end processes. Make sure the output file names of different jobs do not conflict when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitScatter(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py scatter

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to appear directly in the output directory. Note that this only works for processes with export=True or for end processes. Make sure the output file names of different jobs do not conflict when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The program entrance for some schedulers may take too many resources when submitting a job or checking the job status, so we may use a smaller number here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

cnrfile — The fixed cnr file (.cnr)
cnsfile — The segmentation file (.cns)
normal_id — Corresponding normal sample ID in the input VCF. This sample is used to select only germline SNVs to plot b-allele frequencies.
sample_id — Specify the name of the sample in the VCF to use for b-allele frequency extraction and as the default plot title.
vcf — VCF file name containing variants for segmentation by allele frequencies (optional).

Output

outdir — Output directory with plots for multiple cases

Envs

antitarget_marker (flag) — Plot antitargets using this symbol when plotting in a selected chromosomal region (-g/--gene or -c/--chromosome).
by_bin (flag) — Plot data x-coordinates by bin indices instead of genomic coordinates. All bins will be shown with equal width, no blank regions will be shown, and x-axis values indicate bin number (within chromosome) instead of genomic position.
cases (type=json) — The cases for different plots, with keys as case names and values to overwrite the default args given by envs.<args>, including convert_args, by_bin, chromosome, gene, width, antitarget_marker, segment_color, trend, y_max, y_min, min_variant_depth, zygosity_freq and title. By default, an `all` case will be created with the default arguments if no case is specified.
chromosome — Chromosome or chromosomal range, e.g. 'chr1' or 'chr1:2333000-2444000', to display. If a range is given, all targeted genes in this range will be shown, unless -g/--gene is also given.
cnvkit — Path to cnvkit.py
convert — Path to convert to convert pdf to png file
convert_args (ns) —
The arguments for convert
- - density (type=int): Horizontal and vertical density of the image
- - quality (type=int): JPEG/MIFF/PNG compression level
- - background: Background color
- - alpha: Activate, deactivate, reset, or set the alpha channel
- - Other arguments: see convert -help and also https://linux.die.net/man/1/convert
gene — Name of gene or genes (comma-separated) to display.
min_variant_depth (type=int) — Minimum read depth for a SNV to be displayed in the b-allele frequency plot.
segment_color — Plot segment lines in this color. Value can be any string accepted by matplotlib, e.g. 'red' or '#CC0000'.
title — Plot title. Sample ID if not provided.
trend (flag) — Draw a smoothed local trendline on the scatter plot.
width (type=int) — Width of margin to show around the selected gene(s) (-g/--gene) or small chromosomal region (-c/--chromosome).
y_max (type=int) — y-axis upper limit.
y_min (type=int) — y-axis lower limit.
zygosity_freq (type=float) — Ignore VCF's genotypes (GT field) and instead infer zygosity from allele frequencies.
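
The way cases layers per-plot overrides on top of the process-wide envs can be sketched as a dictionary merge. This is an illustration of the documented behavior only: resolve_cases is a hypothetical helper, the merge is shallow (an override of convert_args replaces the whole mapping, which may differ from what biopipen does), and the case names in the usage note are made up.

```python
from typing import Any, Dict

# Keys that a case may override, as listed in the description of envs.cases
CASE_KEYS = (
    "convert_args", "by_bin", "chromosome", "gene", "width",
    "antitarget_marker", "segment_color", "trend", "y_max", "y_min",
    "min_variant_depth", "zygosity_freq", "title",
)


def resolve_cases(envs: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return the effective arguments for every plot case (illustrative)."""
    defaults = {key: envs.get(key) for key in CASE_KEYS}
    # With no case given, a single "all" case uses the defaults unchanged
    cases = envs.get("cases") or {"all": {}}
    return {name: {**defaults, **overrides} for name, overrides in cases.items()}
```

For example, `resolve_cases({"trend": True, "cases": {"egfr": {"gene": "EGFR"}}})` gives one case named egfr that plots the EGFR gene and inherits trend=True.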

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
convert —
- check: {{proc.envs.convert}} -version
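
Since convert is used to turn the PDF plots into PNG files, the convert_args entries presumably end up as ImageMagick options. The sketch below shows one way to assemble such a call; build_convert_cmd is a hypothetical helper, the option order is a choice made here (density before the input so that it affects rasterization), and the values in the usage note are examples rather than biopipen defaults.

```python
from typing import List, Optional


def build_convert_cmd(
    pdf: str,
    png: str,
    convert: str = "convert",
    density: Optional[int] = None,
    quality: Optional[int] = None,
    background: Optional[str] = None,
    alpha: Optional[str] = None,
) -> List[str]:
    """Build an ImageMagick `convert` argument list (illustrative only)."""
    cmd = [convert]
    if density is not None:
        # Must precede the input file to control the PDF rendering resolution
        cmd += ["-density", str(density)]
    cmd.append(pdf)
    if background is not None:
        cmd += ["-background", background]
    if alpha is not None:
        cmd += ["-alpha", alpha]
    if quality is not None:
        cmd += ["-quality", str(quality)]
    cmd.append(png)
    return cmd
```

A call such as `build_convert_cmd("plot.pdf", "plot.png", density=150, background="white", alpha="remove")` would flatten any transparency onto a white background while rasterizing at 150 dpi.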

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `output_flatten=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which will overwrite those of the parent. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to appear directly in the output directory. Note that this only works for processes with export=True or for end processes. Make sure the output file names of different jobs do not conflict when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class
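The three values of output_flatten described in the parameter list can be summarised with a small helper. This is a sketch in plain Python under stated assumptions: `job_export_dir` is a hypothetical function written for illustration and is not part of pipen.

```python
from pathlib import Path
from typing import Optional


def job_export_dir(
    outdir: str,
    job_index: int,
    n_jobs: int,
    output_flatten: Optional[bool] = None,
) -> Path:
    """Hypothetical helper: where the output of one job would be exported."""
    if output_flatten is None:
        # Default: flatten only when the process has a single job
        flatten = n_jobs == 1
    else:
        # True flattens for all processes, False never flattens
        flatten = output_flatten
    base = Path(outdir)
    return base if flatten else base / str(job_index)
```

For example, `job_export_dir("out", 1, 3)` points at the per-job subdirectory `out/1`, whereas the same call with `output_flatten=True` points at `out` itself.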

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger
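How level, msg and *args fit together can be shown with the standard logging module. The function below is a hypothetical stand-in written for illustration; the actual method may add process-specific context to each record, which is not reproduced here.

```python
import logging


def log(level, msg, *args, logger=None):
    """Hypothetical stand-in: log msg % args at the given level."""
    if logger is None:
        logger = logging.getLogger("example")
    if isinstance(level, str):
        # Accept level names such as "info" or "WARNING"
        level = getattr(logging, level.upper())
    # Formatting is deferred: msg % args is only evaluated when the
    # record is actually emitted by a handler.
    logger.log(level, msg, *args)
```

A call such as `log("info", "Processed %d jobs in %s", 3, "CNVkitCall")` would produce the message "Processed 3 jobs in CNVkitCall" if the logger is enabled for the INFO level.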

class

`biopipen.ns.cnvkit.CNVkitDiagram(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py diagram

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes; when flattening, make sure the output file names of different jobs do not conflict. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The entry program of some schedulers may take too many resources when submitting a job or checking the job status, so a smaller number can be used here to limit simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

cnrfile — The fixed cnr file (.cnr)
cnsfile — The segmentation file (.cns)
sample_sex — Specify the sample's chromosomal sex as male or female (otherwise guessed from X and Y coverage).

Output

outdir — Output directory with the diagram plots

Envs

cases (type=json) — The cases with keys as names and values as different configs, including threshold, min_probes, male_reference, no_shift_xy and title
cnvkit — Path to cnvkit.py
convert — Path to the convert executable (ImageMagick), used to convert PDF files to PNG
convert_args (ns) —
The arguments for convert
- - density (type=int): Horizontal and vertical density of the image
- - quality (type=int): JPEG/MIFF/PNG compression level
- - background: Background color
- - alpha: Activate, deactivate, reset, or set the alpha channel
- - See convert -help for further options, and also:
  https://linux.die.net/man/1/convert
male_reference (flag) — Assume inputs were normalized to a male reference (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
min_probes (type=int) — Minimum number of covered probes to label a gene.
no_shift_xy (flag) — Don't adjust the X and Y chromosomes according to sample sex.
threshold (type=float) — Copy number change threshold to label genes.
title — Plot title. Sample ID if not provided.
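As an illustration of how the convert and convert_args envs might translate into a command line, the sketch below assembles an argument list for ImageMagick's convert. `convert_command` is a hypothetical helper and the option values are made up; how biopipen actually builds the command is not stated here.

```python
def convert_command(convert, pdf, png, **convert_args):
    """Hypothetical helper: build the argv list for a PDF-to-PNG conversion."""
    cmd = [convert]
    for name, value in convert_args.items():
        if value is None:
            # Options left unset are skipped
            continue
        # e.g. density=150 becomes "-density 150"
        cmd += [f"-{name}", str(value)]
    # Options are placed before the input file so that settings such as
    # -density apply when the PDF is read.
    cmd += [pdf, png]
    return cmd


argv = convert_command(
    "convert", "diagram.pdf", "diagram.png",
    density=150, quality=90, background="white", alpha=None,
)
```

The resulting list can be passed to `subprocess.run` without going through a shell.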

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
convert —
- check: {{proc.envs.convert}} -version

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `output_flatten=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process; these overwrite the parent's. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes; when flattening, make sure the output file names of different jobs do not conflict. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitHeatmap(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py heatmap for multiple cases

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes; when flattening, make sure the output file names of different jobs do not conflict. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The entry program of some schedulers may take too many resources when submitting a job or checking the job status, so a smaller number can be used here to limit simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

sample_sex — Specify the chromosomal sex of all given samples as male or female, separated by commas. (Default: guess each sample from coverage of X and Y chromosomes).
segfiles — Sample coverages as raw probes (.cnr) or segments (.cns).

Output

outdir — Output directory with heatmaps of multiple cases

Envs

by_bin (flag) — Plot data x-coordinates by bin indices instead of genomic coordinates. All bins will be shown with equal width, no blank regions will be shown, and x-axis values indicate bin number (within chromosome) instead of genomic position.
cases (type=json) — The cases for different plots, with keys as case names and values that overwrite the default args given by envs.<args>, including convert_args, by_bin, chromosome, desaturate, male_reference, and no_shift_xy. By default, an all case will be created with the default arguments if no case is specified.
chromosome — Chromosome (e.g. 'chr1') or chromosomal range (e.g. 'chr1:2333000-2444000') to display.
cnvkit — Path to cnvkit.py
convert — Path to the convert executable (ImageMagick), used to convert PDF files to PNG
convert_args (ns) —
The arguments for convert
- - density (type=int): Horizontal and vertical density of the image
- - quality (type=int): JPEG/MIFF/PNG compression level
- - background: Background color
- - alpha: Activate, deactivate, reset, or set the alpha channel
- - See convert -help for further options, and also:
  https://linux.die.net/man/1/convert
desaturate (flag) — Tweak color saturation to focus on significant changes.
male_reference (flag) — Assume inputs were normalized to a male reference (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
no_shift_xy (flag) — Don't adjust the X and Y chromosomes according to sample sex.
order — A file with sample names in the desired order.
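The relationship between the default envs and cases, including the fallback all case, can be sketched as a dictionary merge. `resolve_cases` is a hypothetical helper for illustration only; in particular it merges shallowly, and whether nested values such as convert_args are merged key by key in biopipen is not stated here.

```python
def resolve_cases(defaults, cases=None):
    """Hypothetical helper: expand cases into full per-plot argument sets."""
    if not cases:
        # No case specified: a single "all" case with the default arguments
        return {"all": dict(defaults)}
    # Each case starts from the defaults and overwrites the keys it sets
    return {
        name: {**defaults, **overrides}
        for name, overrides in cases.items()
    }


defaults = {"by_bin": False, "desaturate": False, "chromosome": None}
plots = resolve_cases(
    defaults,
    {
        "genome": {"desaturate": True},
        "chr1": {"chromosome": "chr1:2333000-2444000"},
    },
)
```

Each entry of `plots` then carries a complete set of arguments for one heatmap, so the plotting step never needs to consult the defaults again.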

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
convert —
- check: {{proc.envs.convert}} -version

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `output_flatten=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process; these overwrite the parent's. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes; when flattening, make sure the output file names of different jobs do not conflict. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitCall(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py call

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes; when flattening, make sure the output file names of different jobs do not conflict. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The entry program of some schedulers may take too many resources when submitting a job or checking the job status, so a smaller number can be used here to limit simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

cnrfile — The fixed cnr file (.cnr), used to generate VCF file
cnsfile — The segmentation file (.cns)
normal_id — Corresponding normal sample ID in the input VCF. This sample is used to select only germline SNVs to plot b-allele frequencies.
purity — Estimated tumor cell fraction, a.k.a. purity or cellularity.
sample_id — Specify the name of the sample in the VCF to use for b-allele frequency extraction and as the default plot title.
sample_sex — Specify the sample's chromosomal sex as male or female (otherwise guessed from X and Y coverage).
vcf — VCF file name containing variants for segmentation by allele frequencies (optional).

Output

outdir — The output directory including the call file (.call.cns), the bed file, and the vcf file

Envs

center — Re-center the log2 ratio values using this estimator of the center or average value.
center_at (type=float) — Subtract a constant number from all log2 ratios. For "manual" re-centering, in case the --center option gives unsatisfactory results.
cnvkit — Path to cnvkit.py
drop_low_coverage (flag) — Drop very-low-coverage bins before segmentation to avoid false-positive deletions in poor-quality tumor samples.
filter — Merge segments flagged by the specified filter(s) with the adjacent segment(s).
male_reference (flag) — Assume inputs were normalized to a male reference (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
method (choice) —
Calling method (threshold, clonal or none).
- - threshold: Using hard thresholds for calling each integer copy
  number.
  Use thresholds to set a list of threshold log2 values for
  each copy number state
- - clonal: Rescaling and rounding.
  For a given known tumor cell fraction and normal ploidy,
  then simple rounding to the nearest integer copy number
- - none: Do not add a “cn” column or allele copy numbers.
  But still performs rescaling, re-centering, and extracting
  b-allele frequencies from a VCF (if requested).
min_variant_depth (type=int) — Minimum read depth for a SNV to be displayed in the b-allele frequency plot.
ploidy (type=float) — Ploidy of the sample cells.
thresholds — Hard thresholds for calling each integer copy number, separated by commas.
zygosity_freq (type=float) — Ignore VCF's genotypes (GT field) and instead infer zygosity from allele frequencies.
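The threshold and clonal calling methods listed above can be sketched in plain Python. This is an illustration only, not CNVkit's implementation: `call_threshold` and `call_clonal` are hypothetical names, the threshold values are placeholders, and the clonal formula assumes that normal cells carry exactly ploidy copies (autosomes only, sex chromosomes ignored).

```python
import math
from bisect import bisect_left


def call_threshold(log2_ratio, thresholds=(-1.1, -0.25, 0.2, 0.7)):
    """Hard-threshold calling: thresholds[i] is the upper log2 bound for i copies.

    Ratios above the last threshold are reported as len(thresholds),
    which should be read as "that many copies or more" in this sketch.
    """
    # Index of the first threshold that is >= log2_ratio
    return bisect_left(thresholds, log2_ratio)


def call_clonal(log2_ratio, purity, ploidy=2):
    """Rescale a log2 ratio by tumor purity, then round to an integer."""
    # Average copies per cell over the tumor/normal mixture
    observed = ploidy * 2 ** log2_ratio
    # Remove the contribution of normal cells (assumed to have `ploidy` copies)
    tumor = (observed - ploidy * (1 - purity)) / purity
    # Copy numbers cannot be negative
    return max(0, round(tumor))


loss = call_threshold(-0.5)
tumor_copies = call_clonal(math.log2(0.75), purity=0.5)
```

With a purity of 0.5, an observed ratio of 0.75 corresponds to a mixture average of 1.5 copies, of which the normal cells contribute 1, so the tumor cells are estimated to carry a single copy.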

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Do the requirements inferring since we need them to build up the process relationship </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name`, `bases`, `namespace`, `**kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc`, `name=None`, `desc=None`, `envs=None`, `envs_depth=None`, `cache=None`, `export=None`, `output_flatten=None`, `error_strategy=None`, `num_retries=None`, `forks=None`, `input_data=None`, `order=None`, `plugin_opts=None`, `requires=None`, `scheduler=None`, `scheduler_opts=None`, `submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process; these overwrite the parent's. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes; when flattening, make sure the output file names of different jobs do not conflict. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only used when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Do the requirements inferring since we need them to build up the process relationship

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level`, `msg`, `*args`, `logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitBatch(*args`, `**kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit batch

If you need finer control over the parameters, for example multiple scatter plots in different regions or a different sample sex for each sample, take a look at biopipen.ns.cnvkit_pipeline.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes, and make sure the output file names of different jobs won't conflict with each other when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The entry point of some schedulers may consume too many resources when submitting a job or checking the job status, so a smaller number can be used here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
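
The three `output_flatten` values described above can be summarized with a small sketch. It models only the rule as documented (and ignores the `export`/end-process condition); it is not pipen's code, and the function name `job_output_dir` and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def job_output_dir(outdir, job_index, n_jobs, output_flatten=None):
    """Return the directory a job's output would be saved to."""
    if output_flatten is None:
        # Default: flatten only when the process has a single job
        flatten = n_jobs == 1
    else:
        flatten = bool(output_flatten)
    base = PurePosixPath(outdir)
    # Without flattening, each job gets a subdirectory named by its index
    return base if flatten else base / str(job_index)


print(job_output_dir("results", 0, n_jobs=1))
print(job_output_dir("results", 1, n_jobs=3))
print(job_output_dir("results", 1, n_jobs=3, output_flatten=True))
print(job_output_dir("results", 0, n_jobs=1, output_flatten=False))
```

With `output_flatten=True` and several jobs, all jobs share one directory, which is why the output file names must not collide.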

Input

metafile — The meta data file containing the sample information. Two columns are required: BamFile and the column named by envs.type_col. The tumor samples should be labeled as envs.type_tumor and the normal samples as envs.type_normal in the envs.type_col column. If normal samples are not found, a flat reference will be used. There could be other columns in the meta file, which could be used in biopipen.ns.cnvkit_pipeline.
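
To make the expected layout of the metafile concrete, here is a minimal sketch that parses one and separates tumor from normal samples. The column name `SampleType`, the labels `Tumor` and `Normal`, the tab delimiter, the extra `Sample` column, and the BAM paths are assumptions for illustration; the real values are whatever `envs.type_col`, `envs.type_tumor` and `envs.type_normal` are set to. The helper `split_samples` is hypothetical and not part of biopipen.

```python
import csv
import io

# Assumed stand-ins for envs.type_col, envs.type_tumor and envs.type_normal
TYPE_COL = "SampleType"
TYPE_TUMOR = "Tumor"
TYPE_NORMAL = "Normal"

# Hypothetical metafile content (tab-delimited is an assumption)
METAFILE = (
    "Sample\tBamFile\tSampleType\n"
    "P1_T\t/data/P1_T.bam\tTumor\n"
    "P1_N\t/data/P1_N.bam\tNormal\n"
    "P2_T\t/data/P2_T.bam\tTumor\n"
)


def split_samples(handle):
    """Return (tumor_bams, normal_bams) read from a metafile handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"BamFile", TYPE_COL} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    tumors, normals = [], []
    for row in reader:
        if row[TYPE_COL] == TYPE_TUMOR:
            tumors.append(row["BamFile"])
        elif row[TYPE_COL] == TYPE_NORMAL:
            normals.append(row["BamFile"])
    return tumors, normals


tumors, normals = split_samples(io.StringIO(METAFILE))
# An empty list of normals corresponds to the flat-reference case
use_flat_reference = not normals
print(tumors, normals, use_flat_reference)
```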

Output

outdir — The output directory

Envs

access — Regions of accessible sequence on chromosomes (.bed), as output by the 'access' command.
access_excludes — Exclude these regions from the accessible genome. Used when envs.access is not specified.
access_min_gap_size — Minimum gap size between accessible sequence regions if envs.access is not specified.
annotate — Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar.
antitarget_avg_size — Average size of antitarget bins (results are approximate).
antitarget_min_size — Minimum size of antitarget bins (smaller regions are dropped).
antitargets — Anti-target intervals (.bed or .list) (optional for wgs)
cluster — Calculate and use cluster-specific summary stats in the reference pool to normalize samples.
cnvkit — Path to cnvkit.py
count_reads — Get read depths by counting read midpoints within each bin (an alternative algorithm).
diagram — Create an ideogram of copy ratios on chromosomes as a PDF.
drop_low_coverage — Drop very-low-coverage bins before segmentation to avoid false-positive deletions in poor-quality tumor samples.
male_reference — Use or assume a male reference (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
method — Sequencing assay type: hybridization capture ('hybrid'), targeted amplicon sequencing ('amplicon'), or whole genome sequencing ('wgs'). Determines whether and how to use antitarget bins.
ncores — Number of subprocesses used to process the BAM files in parallel
ref — Path to a FASTA file containing the reference genome.
reference — Copy number reference file (.cnn) to reuse
rscript — Path to the Rscript executable to use for running R code. Use this option to specify a non-default R installation.
scatter — Create a whole-genome copy ratio profile as a PDF scatter plot.
segment_method — Method used in the 'segment' step. One of cbs, flasso, haar, none, hmm, hmm-tumor, hmm-germline.
short_names — Reduce multi-accession bait labels to be short and consistent.
target_avg_size — Average size of split target bins (results are approximate).
targets — Target intervals (.bed or .list) (optional for wgs)
type_col — The column name in the metafile that indicates the sample type.
type_normal — The type of normal samples in the envs.type_col column of in.metafile
type_tumor — The type of tumor samples in the envs.type_col column of in.metafile
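
The +1 and -1 values quoted for `male_reference` follow from simple log2 arithmetic on chrX copy numbers. The sketch below works through it under idealized assumptions: the log-CNR is a log2 ratio, a female genome carries two X copies and a male genome one, and there are no other copy number changes. The function name `expected_chrx_log2` is hypothetical.

```python
import math


def expected_chrx_log2(sample_x_copies, male_reference):
    """Expected chrX log2 copy ratio of a sample relative to the reference."""
    # A male reference carries one X copy, a female reference two
    reference_x_copies = 1 if male_reference else 2
    return math.log2(sample_x_copies / reference_x_copies)


# Female sample (two X copies) against a male reference
print(expected_chrx_log2(2, male_reference=True))   # 1.0
# Male sample (one X copy) against a female reference
print(expected_chrx_log2(1, male_reference=False))  # -1.0
# Sample sex matching the reference
print(expected_chrx_log2(1, male_reference=True))   # 0.0
```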

Requires

cnvkit —
- check: {{proc.envs.cnvkit}} version
r-DNAcopy —
- check: {{proc.envs.rscript}} <(echo "library(DNAcopy)")

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Infer the requirements, since they are needed to build up the process relationships </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name, bases, namespace, **kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance
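
The singleton behaviour described for `__call__` is a common metaclass pattern. The following is a generic sketch of that pattern, not pipen's actual `ProcMeta` code; the names `SingletonMeta`, `DemoProc` and `OtherProc`, as well as the `__repr__` format, are made up for the example.

```python
class SingletonMeta(type):
    """Metaclass that returns the same instance for every call of a class."""

    _instances = {}

    def __call__(cls, *args, **kwds):
        # Build the instance on the first call only; later calls reuse it
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwds)
        return cls._instances[cls]

    def __repr__(cls):
        # Hypothetical representation of the class itself
        return f"<singleton class {cls.__name__}>"


class DemoProc(metaclass=SingletonMeta):
    def __init__(self, label="demo"):
        self.label = label


class OtherProc(DemoProc):
    pass


first = DemoProc("a")
second = DemoProc("b")  # arguments are ignored; the first instance is reused
print(first is second, first.label)
print(OtherProc() is first)
print(repr(DemoProc))
```

Each subclass gets its own single instance because the cache is keyed by class.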

classmethod

`from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, output_flatten=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which will overwrite those of the parent. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes, and make sure the output file names of different jobs won't conflict with each other when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class
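
The interplay of `envs` and `envs_depth` can be pictured as a depth-limited dictionary merge. The sketch below is an assumption about the general idea, not pipen's implementation: the helper `update_envs`, the meaning given to `depth` (1 replaces top-level keys wholesale, 2 also merges one level of nested dicts), and the `plot` option are all hypothetical.

```python
from copy import deepcopy


def update_envs(parent, new, depth=1):
    """Merge new into a copy of parent, descending at most `depth` levels."""
    merged = deepcopy(parent)
    for key, value in (new or {}).items():
        if (
            depth > 1
            and isinstance(value, dict)
            and isinstance(merged.get(key), dict)
        ):
            # Still within the allowed depth: merge the nested dict
            merged[key] = update_envs(merged[key], value, depth - 1)
        else:
            # Depth exhausted or not a dict: replace the value entirely
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parent envs and override
parent_envs = {"ncores": 1, "plot": {"dpi": 100, "format": "pdf"}}
override = {"plot": {"dpi": 300}}

shallow = update_envs(parent_envs, override, depth=1)
deep = update_envs(parent_envs, override, depth=2)
print(shallow)
print(deep)
```

In both cases the unspecified `ncores` item is inherited from the parent; the depth only decides whether the nested `format` item survives.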

classmethod

`__init_subclass__()`

</>

Infer the requirements, since they are needed to build up the process relationships

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger

class

`biopipen.ns.cnvkit.CNVkitGuessBaits(*args, **kwds)` → Proc

</>

Bases

biopipen.core.proc.Proc pipen.proc.Proc

Guess the bait intervals from the bam files

It runs scripts/guess_baits.py from the cnvkit repo.

Attributes

cache — Should we detect whether the jobs are cached?
desc — The description of the process. Will use the summary from the docstring by default.
dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This is sometimes time-consuming if the directory is big.
envs — The arguments that are job-independent, useful for common options across jobs.
envs_depth — How deep to update the envs when subclassed.
error_strategy —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
export — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
forks — How many jobs to run simultaneously?
input — The keys for the input channel
input_data — The input data (will be computed for dependent processes)
lang — The language for the script to run. Should be the path to the interpreter if lang is not in $PATH.
name — The name of the process. Will use the class name by default.
nexts — Computed from requires to build the process relationships
num_retries — How many times to retry the jobs once an error occurs
order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
output — The output keys for the output channel (the data will be computed)
output_data — The output data (to pass to the next processes)
output_flatten —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes, and make sure the output file names of different jobs won't conflict with each other when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
plugin_opts — Options for process-level plugins
requires — The dependency processes
scheduler — The scheduler to run the jobs
scheduler_opts — The options for the scheduler
script — The script template for the process
submission_batch — How many jobs to be submitted simultaneously. The entry point of some schedulers may consume too many resources when submitting a job or checking the job status, so a smaller number can be used here to limit the simultaneous submissions.
template — Define the template engine to use. This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.

Input

atfile — The potential target file or access file, e.g. all known exons in the reference genome, or the output of cnvkit.py access
bamfiles — The bam files

Output

targetfile — The target file

Envs

cnvkit — Path to cnvkit.py
guided (flag) — in.atfile is a potential target file when True, otherwise it is an access file.
min_depth (type=int) — Minimum sequencing read depth to accept as captured. For guided only.
min_gap (type=int) — Merge regions separated by gaps smaller than this.
min_length (type=int) — Minimum region length to accept as captured. min_gap and min_length are for unguided only.
ncores (type=int) — Number of subprocesses to segment in parallel. 0 to use the maximum number of available CPUs.
ref — Path to a FASTA file containing the reference genome.
samtools — Path to samtools executable
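
Which of these options take effect depends on `guided`. The sketch below restates that rule in code, along with the `ncores = 0` convention. It does not build or run the actual `guess_baits.py` command; the helper `effective_options` and the numeric values are hypothetical and chosen only for illustration.

```python
import os


def effective_options(envs):
    """Return the options that apply to the chosen mode, per the Envs list."""
    if envs.get("guided"):
        # in.atfile is a potential target file; only min_depth applies
        keys = ("min_depth",)
    else:
        # in.atfile is an access file; min_gap and min_length apply
        keys = ("min_gap", "min_length")
    options = {key: envs[key] for key in keys if envs.get(key) is not None}
    ncores = envs.get("ncores", 1)
    # ncores of 0 means "use all available CPUs"
    options["ncores"] = (os.cpu_count() or 1) if ncores == 0 else ncores
    return options


# Hypothetical values chosen for the example
envs = {"guided": True, "min_depth": 5, "min_gap": 25, "min_length": 50, "ncores": 2}
print(effective_options(envs))
print(effective_options({**envs, "guided": False}))
```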

Classes

ProcMeta — Meta class for Proc</>

Methods

__init_subclass__() — Infer the requirements, since they are needed to build up the process relationships </>
from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself</>
gc() — GC process for the process to save memory after it's done</>
log(level, msg, *args, logger) — Log message for the process</>
run() — Init all other properties and jobs</>

class

`pipen.proc.ProcMeta(name, bases, namespace, **kwargs)`

</>

Bases

abc.ABCMeta

Meta class for Proc

Methods

__call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons</>
__repr__(cls) (str) — Representation for the Proc subclasses</>

staticmethod

__repr__(cls) → str

</>

Representation for the Proc subclasses

staticmethod

__call__(cls, *args, **kwds)

</>

Make sure Proc subclasses are singletons

Parameters

*args (Any) — Positional arguments for the constructor
**kwds (Any) — Keyword arguments for the constructor

Returns (Proc)

The Proc instance

classmethod

`from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, output_flatten=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)`

</>

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters

proc (Type) — The Proc subclass
name (str, optional) — The new name of the process
desc (str, optional) — The new description of the process
envs (Mapping, optional) — The arguments of the process, which will overwrite those of the parent. Items that are not specified will be inherited.
envs_depth (int, optional) — How deep to update the envs when subclassed.
cache (bool, optional) — Whether we should check the cache for the jobs
export (bool, optional) — When True, the results will be exported to <pipeline.outdir>. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
output_flatten (bool | None, optional) —
Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. <outdir>/0, <outdir>/1, etc.). If output_flatten is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes, and make sure the output file names of different jobs won't conflict with each other when flattening. It takes 3 possible values:
- - None (default): flatten the output for single-job processes only
- - True: flatten the output for all processes
- - False: never flatten the output
error_strategy (str, optional) —
How to deal with the errors
- - retry, ignore, halt
- - halt to halt the whole pipeline, no submitting new jobs
- - terminate to just terminate the job itself
num_retries (int, optional) — How many times to retry the jobs once an error occurs
forks (int, optional) — New forks for the new process
input_data (Any, optional) — The input data for the process. Only when this process is a start process
order (int, optional) — The order to execute the new process
plugin_opts (Mapping, optional) — The new plugin options, unspecified items will be inherited.
requires (Sequence, optional) — The required processes for the new process
scheduler (str, optional) — The new scheduler to run the new process
scheduler_opts (Mapping, optional) — The new scheduler options, unspecified items will be inherited.
submission_batch (int, optional) — How many jobs to be submitted simultaneously.

Returns (Type)

The new process class

classmethod

`__init_subclass__()`

</>

Infer the requirements, since they are needed to build up the process relationships

method

`run()`

</>

Init all other properties and jobs

method

`gc()`

</>

GC process for the process to save memory after it's done

method

`log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)`

</>

Log message for the process

Parameters

level (int | str) — The log level of the record
msg (str) — The message to log
*args — The arguments to format the message
logger (LoggerAdapter, optional) — The logging logger
