module

biopipen.ns.cnvkit

CNVkit commnads

Classes
class

biopipen.ns.cnvkit.CNVkitAccess(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Calculate the sequence-accessible coordinates in chromosomes from thegiven reference genome using cnvkit.py access

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • excfiles Additional regions to exclude, in BED format
Output
  • outfile The output file
Envs
  • cnvkit Path to cnvkit.py
  • min_gap_size (type=int) Minimum gap size between accessible sequenceregions
  • ref The reference genome fasta file
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitAutobin(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Quickly estimate read counts or depths in a BAM file to estimatereasonable on- and (if relevant) off-target bin sizes.

Using cnvkit.py autobin.

If multiple BAMs are given, use the BAM with median file size.

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • accfile The access file
  • baitfile Potentially targeted genomic regions.E.g. all possible exons for the reference genome. Format - BED, interval list, etc.
  • bamfiles The bamfiles
Output
  • antitarget_file The antitarget BED output
  • target_file The target BED output
Envs
  • annotate Use gene models from this file to assign names to the targetregions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar.
  • antitarget_max_size (type=int) Maximum size of antitarget bins.
  • antitarget_min_size (type=int) Minimum size of antitarget bins.
  • bp_per_bin (type=int) Desired average number of sequencing read basesmapped to each bin.
  • cnvkit Path to cnvkit.py
  • method (choice) Sequencing protocol. Determines whether and how to useantitarget bins.
    • - hybrid: Hybridization capture
    • - amplicon: Targeted amplicon sequencing
    • - wgs: Whole genome sequencing
  • ref The reference genome fasta file
  • short_names (flag) Reduce multi-accession bait labels tobe short and consistent.
  • target_max_size (type=int) Maximum size of target bins.
  • target_min_size (type=int) Minimum size of target bins.
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitCoverage(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit coverage

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • bamfile The bamfile
  • target_file The target file or anti-target file
Output
  • outfile The output coverage file
Envs
  • cnvkit Path to cnvkit.py
  • count (flag) Get read depths by counting read midpointswithin each bin. (An alternative algorithm).
  • min_mapq (type=int) Minimum mapping quality to include a read.
  • ncores (type=int) Number of subprocesses to calculate coveragein parallel
  • ref The reference genome fasta file
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitReference(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit reference

To genearte a reference file from normal samples, provide the cnn coverage files from the normal samples. To generate a flat reference file, provide the target/antitarget file.

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • antitarget_file Antitarget intervals (.bed or .list)
  • covfiles The coverage files from normal samples
  • sample_sex Specify the chromosomal sex of all given samples as male orfemale. Guess each sample from coverage of X and Y chromosomes if not given.
  • target_file Target intervals (.bed or .list)
Output
  • outfile The reference cnn file
Envs
  • cluster (flag) Calculate and store summary stats forclustered subsets of the normal samples with similar coverage profiles.
  • cnvkit Path to cnvkit.py
  • male_reference (flag) Create a male reference: shiftfemale samples chrX log-coverage by -1, so the reference chrX average is -1. Otherwise, shift male samples chrX by +1, so the reference chrX average is 0.
  • min_cluster_size (type=int) Minimum cluster size to keep in referenceprofiles.
  • no_edge (flag) Skip edge-effect correction.
  • no_gc (flag) Skip GC correction.
  • no_rmask (flag) Skip RepeatMasker correction.
  • ref The reference genome fasta file
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitFix(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py fix

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • antitarget_file The antitarget file
  • reference The refence cnn file
  • sample_id Sample ID for target/antitarget files.Otherwise inferred from file names.
  • target_file The target file
Output
  • outfile The fixed coverage files (.cnr)
Envs
  • cluster (flag) Compare and use cluster-specific valuespresent in the reference profile. (requires envs.cluster=True for CNVkitReference).
  • cnvkit Path to cnvkit.py
  • no_edge (flag) Skip edge-effect correction.
  • no_gc (flag) Skip GC correction.
  • no_rmask (flag) Skip RepeatMasker correction.
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitSegment(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py segment

For segmentation methods, see https://cnvkit.readthedocs.io/en/stable/pipeline.html#segmentation-methods

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • cnrfile The fixed coverage files (.cnr)
  • normal_id Corresponding normal sample ID in the input VCF.This sample is used to select only germline SNVs to plot b-allele frequencies.
  • sample_id Specify the name of the sample in the VCF to use for b-allelefrequency extraction and as the default plot title.
  • vcf VCF file name containing variants for segmentationby allele frequencies (optional).
Output
  • outfile The segmentation file (.cns)
Envs
  • cnvkit Path to cnvkit.py
  • drop_low_coverage (flag) Drop very-low-coverage binsbefore segmentation to avoid false-positive deletions in poor-quality tumor samples.
  • drop_outliers (type=int) Drop outlier bins more than this manymultiples of the 95th quantile away from the average within a rolling window. Set to 0 for no outlier filtering.
  • method Method to use for segmentation.Candidates - cbs, flasso, haar, none, hmm, hmm-tumor, hmm-germline
  • min_variant_depth (type=int) Minimum read depth for a SNV to bedisplayed in the b-allele frequency plot.
  • ncores (type=int) Number of subprocesses to segment in parallel.0 or negative for all available cores
  • rscript Path to Rscript
  • smooth_cbs (flag) Perform an additional smoothing beforeCBS segmentation, which in some cases may increase the sensitivity. Used only for CBS method.
  • threshold Significance threshold (p-value or FDR, depending on method)to accept breakpoints during segmentation. For HMM methods, this is the smoothing window size.
  • zygosity_freq (type=float) Ignore VCF's genotypes (GT field) andinstead infer zygosity from allele frequencies.
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
  • r-DNAcopy
    • check: {{proc.envs.rscript}} <(echo "library(DNAcopy)")
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitScatter(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py scatter

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • cnrfile The fixed cnr file (.cnr)
  • cnsfile The segmentation file (.cns)
  • normal_id Corresponding normal sample ID in the input VCF.This sample is used to select only germline SNVs to plot b-allele frequencies.
  • sample_id Specify the name of the sample in the VCF to use for b-allelefrequency extraction and as the default plot title.
  • vcf VCF file name containing variants for segmentationby allele frequencies (optional).
Output
  • outdir Output directory with plots for multiple cases
Envs
  • antitarget_marker (flag) Plot antitargets using thissymbol when plotting in a selected chromosomal region (-g/--gene or -c/--chromosome).
  • by_bin (flag) Plot data x-coordinates by bin indicesinstead of genomic coordinates. All bins will be shown with equal width, no blank regions will be shown, and x-axis values indicate bin number (within chromosome) instead of genomic position.
  • cases (type=json) The cases for different plots with keys as case namesand values to overwrite the default args given by envs.<args>, including convert_args, by_bin, chromosome, gene, width antitarget_marker, segment_color, trend, y_max, y_min, min_variant_depth, zygosity_freq and title. By default, anall` case will be created with default arguments if no case specified
  • chromosome Chromosome or chromosomal range,e.g. 'chr1' or 'chr1:2333000-2444000', to display. If a range is given, all targeted genes in this range will be shown, unless -g/--gene is also given.
  • cnvkit Path to cnvkit.py
  • convert Path to convert to convert pdf to png file
  • convert_args (ns) The arguments for convert
    • - density (type=int): Horizontal and vertical density of the image
    • - quality (type=int): JPEG/MIFF/PNG compression level
    • - background: Background color
    • - alpha: Activate, deactivate, reset, or set the alpha channel
    • - : See convert -help and also:
        https://linux.die.net/man/1/convert
  • gene Name of gene or genes (comma-separated) to display.
  • min_variant_depth (type=int) Minimum read depth for a SNV to bedisplayed in the b-allele frequency plot.
  • segment_color Plot segment lines in this color. Value can beany string accepted by matplotlib, e.g. 'red' or '#CC0000'.
  • title Plot title. Sample ID if not provided.
  • trend (flag) Draw a smoothed local trendline on thescatter plot.
  • width (type=int) Width of margin to show around the selected gene(s)(-g/--gene) or small chromosomal region (-c/--chromosome).
  • y_max (type=int) y-axis upper limit.
  • y_min (tyoe=int) y-axis lower limit.
  • zygosity_freq (typ=float) Ignore VCF's genotypes (GT field) andinstead infer zygosity from allele frequencies.
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
  • convert
    • check: {{proc.envs.convert}} -version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitDiagram(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py diagram

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • cnrfile The fixed cnr file (.cnr)
  • cnsfile The segmentation file (.cns)
  • sample_sex Specify the sample's chromosomal sex as male or female.(Otherwise guessed from X and Y coverage).
Output
  • outdir Output directory with the scatter plots
Envs
  • cases (type=json) The cases with keys as names and values as differentconfigs, including threshold, min_probes, male_reference, no_shift_xy and title
  • cnvkit Path to cnvkit.py
  • convert Path to convert to convert pdf to png file
  • convert_args (ns) The arguments for convert
    • - density (type=int): Horizontal and vertical density of the image
    • - quality (type=int): JPEG/MIFF/PNG compression level
    • - background: Background color
    • - alpha: Activate, deactivate, reset, or set the alpha channel
    • - : See convert -help and also:
        https://linux.die.net/man/1/convert
  • male_reference (flag) Assume inputs were normalized to amale reference (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
  • min_probes (type=int) Minimum number of covered probes to label a gene.
  • no_shift_xy (flag) Don't adjust the X and Y chromosomesaccording to sample sex.
  • threshold (type=float) Copy number change threshold to label genes.
  • title Plot title. Sample ID if not provided.
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
  • convert
    • check: {{proc.envs.convert}} -version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitHeatmap(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py heatmap for multiple cases

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • sample_sex Specify the chromosomal sex of all given samples as maleor female. Separated by comma. (Default: guess each sample from coverage of X and Y chromosomes).
  • segfiles Sample coverages as raw probes (.cnr) or segments (.cns).
Output
  • outdir Output directory with heatmaps of multiple cases
Envs
  • by_bin (flag) Plot data x-coordinates by bin indicesinstead of genomic coordinates. All bins will be shown with equal width, no blank regions will be shown, and x-axis values indicate bin number (within chromosome) instead of genomic position.
  • cases (type=json) The cases for different plots with keys as case namesand values to overwrite the default args given by envs.<args>, including convert_args, by_bin, chromosome, desaturate, male_reference, and, no_shift_xy. By default, an all case will be created with default arguments if no case specified
  • chromosome Chromosome (e.g. 'chr1') or chromosomal range(e.g. 'chr1:2333000-2444000') to display.
  • cnvkit Path to cnvkit.py
  • convert Path to convert to convert pdf to png file
  • convert_args (ns) The arguments for convert
    • - density (type=int): Horizontal and vertical density of the image
    • - quality (type=int): JPEG/MIFF/PNG compression level
    • - background: Background color
    • - alpha: Activate, deactivate, reset, or set the alpha channel
    • - : See convert -help and also:
        https://linux.die.net/man/1/convert
  • desaturate (flag) Tweak color saturation to focus onsignificant changes.
  • male_reference (flag) Assume inputs were normalized toa male reference. (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
  • no_shift_xy (flag) Don't adjust the X and Y chromosomesaccording to sample sex.
  • order A file with sample names in the desired order.
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
  • convert
    • check: {{proc.envs.convert}} -version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitCall(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit.py call

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • cnrfile The fixed cnr file (.cnr), used to generate VCF file
  • cnsfile The segmentation file (.cns)
  • normal_id Corresponding normal sample ID in the input VCF.This sample is used to select only germline SNVs to plot b-allele frequencies.
  • purity Estimated tumor cell fraction, a.k.a. purity or cellularity.
  • sample_id Specify the name of the sample in the VCF to use for b-allelefrequency extraction and as the default plot title.
  • sample_sex Specify the sample's chromosomal sex as male or female.(Otherwise guessed from X and Y coverage).
  • vcf VCF file name containing variants for segmentationby allele frequencies (optional).
Output
  • outdir The output directory including the call file (.call.cns)bed file, and the vcf file
Envs
  • center Re-center the log2 ratio values using this estimator ofthe center or average value.
  • center_at (type=float) Subtract a constant number from all log2 ratios.For "manual" re-centering, in case the --center option gives unsatisfactory results.)
  • cnvkit Path to cnvkit.py
  • drop_low_coverage (flag) Drop very-low-coverage binsbefore segmentation to avoid false-positive deletions in poor-quality tumor samples.
  • filter Merge segments flagged by the specifiedfilter(s) with the adjacent segment(s).
  • male_reference (flag) Assume inputs were normalized to amale reference. (i.e. female samples will have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
  • method (choice) Calling method (threshold, clonal or none).
    • - threshold: Using hard thresholds for calling each integer copy
        number.
        Use thresholds to set a list of threshold log2 values for
        each copy number state
    • - clonal: Rescaling and rounding.
        For a given known tumor cell fraction and normal ploidy,
        then simple rounding to the nearest integer copy number
    • - none: Do not add a “cn” column or allele copy numbers.
        But still performs rescaling, re-centering, and extracting
        b-allele frequencies from a VCF (if requested).
  • min_variant_depth (type=int) Minimum read depth for a SNV to bedisplayed in the b-allele frequency plot.
  • ploidy (type=float) Ploidy of the sample cells.
  • thresholds Hard thresholds for calling each integer copy number,separated by commas.
  • zygosity_freq (type=float) Ignore VCF's genotypes (GT field) andinstead infer zygosity from allele frequencies.
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitBatch(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Run cnvkit batch

If you need in-depth control of the parameters, for example, multiple scatter plots in different regions, or you need to specify sample-sex for different samples, take a look at biopipen.ns.cnvkit_pipeline

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • metafile The meta data file containing the sample informationTwo columns BamFile and envs.type_col are required. The tumor samples should be labeled as envs.type_tumor and the normal samples should be labeled as envs.type_normal in the envs.type_col column. If normal samples are not found, a flat reference will be used. The could be other columns in the meta file, but they could be used in biopipen.ns.cnvkit_pipeline.
Output
  • outdir The output directory
Envs
  • access Regions of accessible sequence on chromosomes (.bed),as output by the 'access' command.
  • access_excludes Exclude these regions from the accessible genomeUsed when envs.access is not specified.
  • access_min_gap_size Minimum gap size between accessiblesequence regions if envs.access is not specified.
  • annotate Use gene models from this file to assign names to thetarget regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar.
  • antitarget_avg_size Average size of antitarget bins(results are approximate).
  • antitarget_min_size Minimum size of antitarget bins(smaller regions are dropped).
  • antitargets Anti-target intervals (.bed or .list) (optional for wgs)
  • cluster Calculate and use cluster-specific summary stats in thereference pool to normalize samples.
  • cnvkit Path to cnvkit.py
  • count_reads Get read depths by counting read midpoints within each bin.(An alternative algorithm).
  • diagram Create an ideogram of copy ratios on chromosomes as a PDF.
  • drop_low_coverage Drop very-low-coverage bins before segmentation toavoid false-positive deletions in poor-quality tumor samples.
  • male_reference Use or assume a male reference (i.e. female sampleswill have +1 log-CNR of chrX; otherwise male samples would have -1 chrX).
  • method Sequencing assay type: hybridization capture ('hybrid'),targeted amplicon sequencing ('amplicon'), or whole genome sequencing ('wgs'). Determines whether and how to use antitarget bins.
  • ncores Number of subprocesses used to running each of the BAM filesin parallel
  • ref Path to a FASTA file containing the reference genome.
  • reference Copy number reference file (.cnn) to reuse
  • rscript Path to the Rscript excecutable to use for running R code.Use this option to specify a non-default R installation.
  • scatter Create a whole-genome copy ratio profile as a PDF scatter plot.
  • segment_method cbs,flasso,haar,none,hmm,hmm-tumor,hmm-germlineMethod used in the 'segment' step.
  • short_names Reduce multi-accession bait labels to be shortand consistent.
  • target_avg_size Average size of split target bins(results are approximate).
  • targets Target intervals (.bed or .list) (optional for wgs)
  • type_col type_col: The column name in the metafile thatindicates the sample type.
  • type_normal The type of normal samples in envs.type_col column ofin.metafile
  • type_tumor The type of tumor samples in envs.type_col column ofin.metafile
Requires
  • cnvkit
    • check: {{proc.envs.cnvkit}} version
  • r-DNAcopy
    • check: {{proc.envs.rscript}} <(echo "library(DNAcopy)")
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process

class

biopipen.ns.cnvkit.CNVkitGuessBaits(*args, **kwds)Proc

Bases
biopipen.core.proc.Proc pipen.proc.Proc

Guess the bait intervals from the bam files

It runs scripts/guess_baits.py from the cnvkit repo.

Attributes
  • cache Should we detect whether the jobs are cached?
  • desc The description of the process. Will use the summary fromthe docstring by default.
  • dirsig When checking the signature for caching, whether should we walkthrough the content of the directory? This is sometimes time-consuming if the directory is big.
  • envs The arguments that are job-independent, useful for common optionsacross jobs.
  • envs_depth How deep to update the envs when subclassed.
  • error_strategy How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • export When True, the results will be exported to <pipeline.outdir>Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • forks How many jobs to run simultaneously?
  • input The keys for the input channel
  • input_data The input data (will be computed for dependent processes)
  • lang The language for the script to run. Should be the path to theinterpreter if lang is not in $PATH.
  • name The name of the process. Will use the class name by default.
  • nexts Computed from requires to build the process relationships
  • num_retries How many times to retry to jobs once error occurs
  • order The execution order for this process. The bigger the numberis, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by Pipen.set_starts()
  • output The output keys for the output channel(the data will be computed)
  • output_data The output data (to pass to the next processes)
  • plugin_opts Options for process-level plugins
  • requires The dependency processes
  • scheduler The scheduler to run the jobs
  • scheduler_opts The options for the scheduler
  • script The script template for the process
  • submission_batch How many jobs to be submited simultaneously
  • template Define the template engine to use.This could be either a template engine or a dict with key engine indicating the template engine and the rest the arguments passed to the constructor of the pipen.template.Template object. The template engine could be either the name of the engine, currently jinja2 and liquidpy are supported, or a subclass of pipen.template.Template. You can subclass pipen.template.Template to use your own template engine.
Input
  • atfile The potential target file or access filee.g. all known exons in the reference genome or from cnvkit.py access
  • bamfiles The bam files
Output
  • targetfile The target file
Envs
  • cnvkit Path to cnvkit.py
  • guided (flag) in.atfile is a potential target file whenTrue, otherwise it is an access file.
  • min_depth (type=int) Minimum sequencing read depth to accept ascaptured. For guided only.
  • min_gap (type=int) Merge regions separated by gaps smaller than this.
  • min_length (type=int) Minimum region length to accept as captured.min_gap and min_length are for unguided only.
  • ncores (type=int) Number of subprocesses to segment in parallel0 to use the maximum number of available CPUs.
  • ref Path to a FASTA file containing the reference genome.
  • samtools Path to samtools executable
Classes
Methods
  • __init_subclass__() Do the requirements inferring since we need them to build up theprocess relationship </>
  • from_proc(proc, name, desc, envs, envs_depth, cache, export, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) Create a subclass of Proc using another Proc subclass or Proc itself</>
  • gc() GC process for the process to save memory after it's done</>
  • init() Init all other properties and jobs</>
  • log(level, msg, *args, logger) Log message for the process</>
  • run() Run the process</>
class

pipen.proc.ProcMeta(name, bases, namespace, **kwargs)

Bases
abc.ABCMeta

Meta class for Proc

Methods
  • __call__(cls, *args, **kwds) (Proc) Make sure Proc subclasses are singletons</>
  • __instancecheck__(cls, instance) Override for isinstance(instance, cls).</>
  • __repr__(cls) (str) Representation for the Proc subclasses</>
  • __subclasscheck__(cls, subclass) Override for issubclass(subclass, cls).</>
  • register(cls, subclass) Register a virtual subclass of an ABC.</>
staticmethod
register(cls, subclass)

Register a virtual subclass of an ABC.

Returns the subclass, to allow usage as a class decorator.

staticmethod
__instancecheck__(cls, instance)

Override for isinstance(instance, cls).

staticmethod
__subclasscheck__(cls, subclass)

Override for issubclass(subclass, cls).

staticmethod
__repr__(cls) → str

Representation for the Proc subclasses

staticmethod
__call__(cls, *args, **kwds)

Make sure Proc subclasses are singletons

Parameters
  • *args (Any) and
  • **kwds (Any) Arguments for the constructor
Returns (Proc)

The Proc instance

classmethod

from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)

Create a subclass of Proc using another Proc subclass or Proc itself

Parameters
  • proc (Type) The Proc subclass
  • name (str, optional) The new name of the process
  • desc (str, optional) The new description of the process
  • envs (Mapping, optional) The arguments of the process, will overwrite parent oneThe items that are specified will be inherited
  • envs_depth (int, optional) How deep to update the envs when subclassed.
  • cache (bool, optional) Whether we should check the cache for the jobs
  • export (bool, optional) When True, the results will be exported to<pipeline.outdir> Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes
  • error_strategy (str, optional) How to deal with the errors
    • - retry, ignore, halt
    • - halt to halt the whole pipeline, no submitting new jobs
    • - terminate to just terminate the job itself
  • num_retries (int, optional) How many times to retry to jobs once error occurs
  • forks (int, optional) New forks for the new process
  • input_data (Any, optional) The input data for the process. Only when this processis a start process
  • order (int, optional) The order to execute the new process
  • plugin_opts (Mapping, optional) The new plugin options, unspecified items will beinherited.
  • requires (Sequence, optional) The required processes for the new process
  • scheduler (str, optional) The new shedular to run the new process
  • scheduler_opts (Mapping, optional) The new scheduler options, unspecified items willbe inherited.
  • submission_batch (int, optional) How many jobs to be submited simultaneously
Returns (Type)

The new process class

classmethod

__init_subclass__()

Do the requirements inferring since we need them to build up theprocess relationship

method

init()

Init all other properties and jobs

method

gc()

GC process for the process to save memory after it's done

method

log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)

Log message for the process

Parameters
  • level (int | str) The log level of the record
  • msg (str) The message to log
  • *args The arguments to format the message
  • logger (LoggerAdapter, optional) The logging logger
method

run()

Run the process