biopipen.ns.pipseeker
Pipseeker processes
- PipseekerFull (Proc) — Run pipseeker full command
- PipseekerSummary (Proc) — Summarize the output of pipseeker full command
- PipseekerPipeline — The pipseeker pipeline
biopipen.ns.pipseeker.PipseekerFull(*args, **kwds) → Proc
Run pipseeker full command
Tested with pipseeker v3.3.0
- cache — Should we detect whether the jobs are cached?
- desc — The description of the process. Will use the summary from the docstring by default.
- dirsig — When checking the signature for caching, whether we should walk through the content of the directory. This can be time-consuming if the directory is big.
- envs — The arguments that are job-independent, useful for common options across jobs.
- envs_depth — How deep to update the envs when subclassed.
- error_strategy — How to deal with the errors: retry, ignore, halt
  - halt to halt the whole pipeline, no submitting new jobs
  - terminate to just terminate the job itself
- export — When True, the results will be exported to `<pipeline.outdir>`. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
- forks — How many jobs to run simultaneously?
- input — The keys for the input channel
- input_data — The input data (will be computed for dependent processes)
- lang — The language for the script to run. Should be the path to the interpreter if `lang` is not in `$PATH`.
- name — The name of the process. Will use the class name by default.
- nexts — Computed from `requires` to build the process relationships
- num_retries — How many times to retry the jobs once an error occurs
- order — The execution order for this process. The bigger the number is, the later the process will be executed. Default: 0. Note that the dependent processes will always be executed first. This doesn't work for start processes either, whose orders are determined by `Pipen.set_starts()`.
- output — The output keys for the output channel (the data will be computed)
- output_data — The output data (to pass to the next processes)
- output_flatten — Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. `<outdir>/0`, `<outdir>/1`, etc.). If `output_flatten` is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes, and make sure the names of the output files won't conflict between jobs when flattening. It takes 3 possible values:
  - None (default): flatten the output for single-job processes only
  - True: flatten the output for all processes
  - False: never flatten the output
- plugin_opts — Options for process-level plugins
- requires — The dependency processes
- scheduler — The scheduler to run the jobs
- scheduler_opts — The options for the scheduler
- script — The script template for the process
- submission_batch — How many jobs to be submitted simultaneously. The program entrance for some schedulers may take too many resources when submitting a job or checking the job status, so we may use a smaller number here to limit simultaneous submissions.
- template — Define the template engine to use. This could be either a template engine or a dict with key `engine` indicating the template engine and the rest the arguments passed to the constructor of the `pipen.template.Template` object. The template engine could be either the name of the engine (currently jinja2 and liquidpy are supported) or a subclass of `pipen.template.Template`. You can subclass `pipen.template.Template` to use your own template engine.
- fastqs — The input fastq files
- outdir — The output directory
Other envs arguments are passed to the pipseeker full command. See https://www.fluentbio.com/wp-content/uploads/2024/06/PIPseeker-v3.3-User-Guide.pdf for more details.
- chemistry — Version of the PIPseq assay (v3, v4, or V).
- ncores — Number of cores to use. Will be passed to pipseeker with `--threads`.
- pipseeker — Path to the pipseeker executable
- ref — Path of the folder containing the STAR-compatible transcriptome reference. Will be passed to pipseeker with `--star-index-path`.
- remove_bam (flag) — Whether to remove the BAM file generated by pipseeker.
- skip_version_check (flag) — Whether to skip the newer-version check of pipseeker.
- tmpdir — Path to the temporary directory, used to save the soft-linked fastq files to pass to pipseeker.
- verbosity — The verbosity level of pipseeker.
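As a sketch of how these envs could translate into the underlying command line: `--threads` and `--star-index-path` are stated in the descriptions above, while `--fastq`, `--output-path`, and `--chemistry` are assumed flag names for the pipseeker CLI (check the user guide linked above).

```python
def build_pipseeker_cmd(fastqs: str, outdir: str, envs: dict) -> list:
    """Sketch: assemble a `pipseeker full` command from the envs above.

    --threads and --star-index-path are documented above; --fastq,
    --output-path and --chemistry are assumed flag names.
    """
    cmd = [
        envs.get("pipseeker", "pipseeker"), "full",
        "--fastq", fastqs,
        "--output-path", outdir,
        "--star-index-path", envs["ref"],
        "--threads", str(envs.get("ncores", 1)),
    ]
    if envs.get("chemistry"):
        cmd += ["--chemistry", envs["chemistry"]]
    return cmd


cmd = build_pipseeker_cmd(
    "sample1_R1.fastq.gz",
    "out/sample1",
    {"ref": "star-index", "ncores": 8, "chemistry": "v4"},
)
print(" ".join(cmd))
```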
- __init_subclass__() — Do the requirements inferring since we need them to build up the process relationship
- from_proc(proc, name, desc, envs, envs_depth, cache, export, output_flatten, error_strategy, num_retries, forks, input_data, order, plugin_opts, requires, scheduler, scheduler_opts, submission_batch) (Type) — Create a subclass of Proc using another Proc subclass or Proc itself
- gc() — GC process for the process to save memory after it's done
- log(level, msg, *args, logger) — Log message for the process
- run() — Init all other properties and jobs
pipen.proc.ProcMeta(name, bases, namespace, **kwargs)
Meta class for Proc
- __call__(cls, *args, **kwds) (Proc) — Make sure Proc subclasses are singletons
- __instancecheck__(cls, instance) — Override for isinstance(instance, cls).
- __repr__(cls) (str) — Representation for the Proc subclasses
- __subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).
- register(cls, subclass) — Register a virtual subclass of an ABC.
register(cls, subclass) — Register a virtual subclass of an ABC.
Returns the subclass, to allow usage as a class decorator.
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).
__repr__(cls) → str — Representation for the Proc subclasses
__call__(cls, *args, **kwds) — Make sure Proc subclasses are singletons
*args (Any) and **kwds (Any) — Arguments for the constructor
The Proc instance
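The singleton behavior of `__call__` can be illustrated with a minimal stand-in metaclass (a sketch, not pipen's actual code): constructing the same class a second time returns the cached instance.

```python
class SingletonMeta(type):
    """Minimal sketch of a ProcMeta-like singleton metaclass."""

    _instances: dict = {}

    def __call__(cls, *args, **kwds):
        # Construct the instance only once per class, then reuse it
        if cls not in SingletonMeta._instances:
            SingletonMeta._instances[cls] = super().__call__(*args, **kwds)
        return SingletonMeta._instances[cls]


class MyProc(metaclass=SingletonMeta):
    pass


a, b = MyProc(), MyProc()
print(a is b)
```

This is why calling a Proc subclass repeatedly in a pipeline definition does not create duplicate processes.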
from_proc(proc, name=None, desc=None, envs=None, envs_depth=None, cache=None, export=None, output_flatten=None, error_strategy=None, num_retries=None, forks=None, input_data=None, order=None, plugin_opts=None, requires=None, scheduler=None, scheduler_opts=None, submission_batch=None)
Create a subclass of Proc using another Proc subclass or Proc itself
- proc (Type) — The Proc subclass
- name (str, optional) — The new name of the process
- desc (str, optional) — The new description of the process
- envs (Mapping, optional) — The arguments of the process, will overwrite the parent's; unspecified items will be inherited
- envs_depth (int, optional) — How deep to update the envs when subclassed.
- cache (bool, optional) — Whether we should check the cache for the jobs
- export (bool, optional) — When True, the results will be exported to `<pipeline.outdir>`. Defaults to None, meaning only end processes will export. You can set it to True/False to enable or disable exporting for processes.
- output_flatten (bool | None, optional) — Whether to flatten the output when saving to the output directory. Normally, the output will be saved in a subdirectory named after the job index (e.g. `<outdir>/0`, `<outdir>/1`, etc.). If `output_flatten` is True, the output will be saved directly in the output directory without the subdirectories. This is useful when you want the job outputs to be directly revealed in the output directory. Note that this only works for processes with export=True or end processes, and make sure the names of the output files won't conflict between jobs when flattening. It takes 3 possible values:
  - None (default): flatten the output for single-job processes only
  - True: flatten the output for all processes
  - False: never flatten the output
- error_strategy (str, optional) — How to deal with the errors: retry, ignore, halt
  - halt to halt the whole pipeline, no submitting new jobs
  - terminate to just terminate the job itself
- num_retries (int, optional) — How many times to retry the jobs once an error occurs
- forks (int, optional) — New forks for the new process
- input_data (Any, optional) — The input data for the process. Only effective when this process is a start process.
- order (int, optional) — The order to execute the new process
- plugin_opts (Mapping, optional) — The new plugin options; unspecified items will be inherited.
- requires (Sequence, optional) — The required processes for the new process
- scheduler (str, optional) — The new scheduler to run the new process
- scheduler_opts (Mapping, optional) — The new scheduler options; unspecified items will be inherited.
- submission_batch (int, optional) — How many jobs to be submitted simultaneously.
The new process class
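The derivation pattern behind `from_proc` can be sketched with a hypothetical stand-in helper (not pipen's implementation): build a subclass of the given process class, overriding only the attributes that were actually passed, so everything left as None is inherited.

```python
def from_proc_like(proc: type, name: str = None, **overrides) -> type:
    """Sketch of the from_proc pattern: derive a new class from `proc`,
    inheriting every attribute that was not explicitly overridden."""
    attrs = {k: v for k, v in overrides.items() if v is not None}
    return type(name or proc.__name__ + "Derived", (proc,), attrs)


class Base:
    # stand-in attributes mimicking a process definition
    envs = {"ncores": 1}
    forks = 1


Derived = from_proc_like(Base, name="Derived", forks=4)
print(Derived.__name__, Derived.forks, Derived.envs)
```

The real `from_proc` does more (it merges `envs` recursively up to `envs_depth` and wires up `requires`), but the subclass-with-overrides shape is the same.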
__init_subclass__()
Do the requirements inferring since we need them to build up the process relationship
run()
Init all other properties and jobs
gc()
GC process for the process to save memory after it's done
log(level, msg, *args, logger=<LoggerAdapter pipen.core (WARNING)>)
Log message for the process
- level (int | str) — The log level of the record
- msg (str) — The message to log
- *args — The arguments to format the message
- logger (LoggerAdapter, optional) — The logging logger
biopipen.ns.pipseeker.PipseekerSummary(*args, **kwds) → Proc
Summarize the output of pipseeker full command
- indirs — The input directories containing the output of the pipseeker full command
- outdir — The summarized output directory
- group (type=auto) — The group of the samples for boxplots. If None, don't do boxplots. It can be a dict of group names and sample names, e.g. `{"group1": ["sample1", "sample2"], "group2": ["sample3"]}`, or a file containing the group information, with the first column being the sample names and the second column being the group names. The file should be tab-delimited with no header.
- sensitivity (type=list) — A list of sensitivity levels to use for summarization. pipseeker usually outputs 5 levels of sensitivity; choose from them to summarize the results.
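The two accepted `group` formats described above are interchangeable. A sketch with made-up sample names that writes the tab-delimited, headerless file form and reads it back into the dict form:

```python
import csv
from collections import defaultdict

# Dict form: {group name: [sample names]}
group = {"group1": ["sample1", "sample2"], "group2": ["sample3"]}

# File form: tab-delimited, no header,
# first column sample name, second column group name
with open("groups.tsv", "w", newline="") as fh:
    writer = csv.writer(fh, delimiter="\t")
    for grp, samples in group.items():
        for sample in samples:
            writer.writerow([sample, grp])

# Reading the file back recovers the dict form
recovered = defaultdict(list)
with open("groups.tsv") as fh:
    for sample, grp in csv.reader(fh, delimiter="\t"):
        recovered[grp].append(sample)

print(dict(recovered))
```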
biopipen.ns.pipseeker.PipseekerPipeline(*args, **kwds)
The pipseeker pipeline
Run pipseeker full for multiple samples and summarize the metrics.
- parser — Pass arguments to initialize the parser. The parser is a singleton and by default is initialized at the `plugin.on_init()` hook, which usually happens after the initialization of a process group.
- ProcGropuMeta — Meta class for ProcGroup
- __init_subclass__() — This method is called when a class is subclassed.
- add_proc(self_or_method, proc) (Union) — Add a process to the proc group
- as_pipen(name, desc, outdir, **kwargs) (Pipen) — Convert the pipeline to a Pipen instance
- post_init() — Check if the input is a list of fastq files
pipen.procgroup.ProcGropuMeta(name, bases, namespace, **kwargs)
Meta class for ProcGroup
- __call__(cls, *args, **kwds) — Make sure Proc subclasses are singletons
- __instancecheck__(cls, instance) — Override for isinstance(instance, cls).
- __subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).
- register(cls, subclass) — Register a virtual subclass of an ABC.
register(cls, subclass) — Register a virtual subclass of an ABC.
Returns the subclass, to allow usage as a class decorator.
__instancecheck__(cls, instance) — Override for isinstance(instance, cls).
__subclasscheck__(cls, subclass) — Override for issubclass(subclass, cls).
__call__(cls, *args, **kwds) — Make sure Proc subclasses are singletons
*args and **kwds — Arguments for the constructor
The Proc instance
__init_subclass__()
This method is called when a class is subclassed.
The default implementation does nothing. It may be overridden to extend subclasses.
add_proc(self_or_method, proc=None)
Add a process to the proc group
It works either as a decorator to the process directly or as a decorator to a method that returns the process.
- self_or_method (Union) — The proc group instance, or a method that returns the process
- proc (Optional, optional) — The process class if `self_or_method` is the proc group
The process class if `self_or_method` is the proc group, or a cached property that returns the process class
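The dual decorator/method behavior can be sketched with a hypothetical stand-in (not pipen's actual code): given a group method, it returns a cached property that registers the process on first access; called on the group with a process class, it registers immediately.

```python
from functools import cached_property


def add_proc(self_or_method, proc=None):
    """Sketch of a dual-use register helper (not pipen's implementation)."""
    if proc is None:
        # used as a decorator on a group method: defer to first access
        method = self_or_method

        def getter(self):
            p = method(self)
            self.procs[p.__name__] = p
            return p

        getter.__name__ = method.__name__
        return cached_property(getter)
    # called as add_proc(group, SomeProc): register immediately
    self_or_method.procs[proc.__name__] = proc
    return proc


class MyGroup:
    def __init__(self):
        self.procs = {}

    @add_proc
    def proc_a(self):
        # stand-in for a real Proc subclass
        return type("ProcA", (), {})


g = MyGroup()
print(g.proc_a.__name__)   # accessing the property registers the process
add_proc(g, type("ProcB", (), {}))
print(sorted(g.procs))
```

The cached property means each process class is built once per group instance, mirroring the singleton behavior described elsewhere on this page.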
as_pipen(name=None, desc=None, outdir=None, **kwargs)
Convert the pipeline to a Pipen instance
- name (str | None, optional) — The name of the pipeline
- desc (str | None, optional) — The description of the pipeline
- outdir (str | pathlib.Path | None, optional) — The output directory of the pipeline
- **kwargs — The keyword arguments to pass to Pipen
The Pipen instance
post_init()
Check if the input is a list of fastq files