pipen
package
pipen
A pipeline framework for Python
module
pipen.scheduler
Provide builtin schedulers
Classes
LocalJob
— Job class for local scheduler
LocalScheduler
— Local scheduler
SgeJob
— Job class for SGE scheduler
SgeScheduler
— SGE scheduler
SlurmJob
— Job class for Slurm scheduler
SlurmScheduler
— Slurm scheduler
SshJob
— Job class for SSH scheduler
SshScheduler
— SSH scheduler
Functions
get_scheduler(scheduler)
(Type) — Get the scheduler by name or the scheduler class itself
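The "by name or the class itself" lookup can be illustrated with a small standalone sketch. It does not import pipen: `REGISTRY`, `resolve_scheduler` and the stand-in classes are hypothetical names for this example only, and the real library may discover schedulers and report errors differently.

```python
from typing import Dict, Type, Union


class Scheduler:
    """Hypothetical stand-in for a scheduler base class."""


class LocalScheduler(Scheduler):
    """Hypothetical stand-in; only its identity matters here."""


class NoSuchSchedulerError(Exception):
    """Raised by this sketch when a name is not registered."""


class WrongSchedulerTypeError(Exception):
    """Raised by this sketch when a value is not a Scheduler subclass."""


# Assumed registry of names; not how the real library stores schedulers.
REGISTRY: Dict[str, Type[Scheduler]] = {"local": LocalScheduler}


def resolve_scheduler(scheduler: Union[str, Type[Scheduler]]) -> Type[Scheduler]:
    """Return a scheduler class, given either its name or the class itself."""
    if isinstance(scheduler, str):
        try:
            return REGISTRY[scheduler]
        except KeyError:
            raise NoSuchSchedulerError(scheduler) from None
    # Anything that is not a string must already be a Scheduler subclass.
    if isinstance(scheduler, type) and issubclass(scheduler, Scheduler):
        return scheduler
    raise WrongSchedulerTypeError(repr(scheduler))
```

Accepting either form lets a configuration file name a scheduler as a string while Python code passes a custom class directly.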
module
pipen.exceptions
Provide exception classes
Classes
PipenException
— Base exception class for pipen
PipenSetDataError
— When Pipen.set_data() is used to set input data on processes whose input_data is already set
ProcInputTypeError
— When an unsupported input type is provided
ProcInputKeyError
— When an unsupported input value is provided
ProcScriptFileNotFound
— When script file specified as 'file://' cannot be found
ProcOutputNameError
— When no name or a malformed output is provided
ProcOutputTypeError
— When an unsupported output type is provided
ProcOutputValueError
— When a malformed output value is provided
ProcDependencyError
— When there is something wrong with the process dependencies
NoSuchSchedulerError
— When the specified scheduler cannot be found
WrongSchedulerTypeError
— When the specified scheduler is not a subclass of Scheduler
NoSuchTemplateEngineError
— When the specified template engine cannot be found
WrongTemplateEnginTypeError
— When the specified template engine is not a subclass of Template
TemplateRenderingError
— Failed to render a template
ConfigurationError
— When something is set incorrectly in the configuration
PipenOrProcNameError
— When more than one process shares the same workdir
module
pipen.channel
Provide some functions for creating and modifying channels (dataframes)
Functions
collapse_files(data, col)
(DataFrame) — Collapse a Channel according to the files in col; other cols will use the values in row 0.
expand_dir(data, col, pattern, ftype, sortby, reverse)
(DataFrame) — Expand a Channel according to the files in col; other cols will keep the same values.
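As a rough illustration of what "expanding" a channel means, the sketch below uses plain lists of dicts instead of a DataFrame so that it runs with the standard library alone. `expand_dir_rows` is a hypothetical helper, not the library function; the single-row restriction and sorting by file name are assumptions of this sketch.

```python
from pathlib import Path
from typing import Any, Dict, List


def expand_dir_rows(
    rows: List[Dict[str, Any]],
    col: str,
    pattern: str = "*",
    reverse: bool = False,
) -> List[Dict[str, Any]]:
    """Return one row per file matching `pattern` in the directory rows[0][col]."""
    if len(rows) != 1:
        raise ValueError("this sketch expects exactly one input row")
    base = rows[0]
    files = sorted(
        Path(base[col]).glob(pattern),
        key=lambda path: path.name,
        reverse=reverse,
    )
    # Every other column is copied unchanged into each expanded row.
    return [{**base, col: str(path)} for path in files]
```

A row such as `{"indir": "/data/run1", "tag": "x"}` would become one row per matching file, each still carrying `tag == "x"`.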
module
pipen.template
Template adaptor for pipen
Classes
Template(source, **kwargs)
— Base class wrapper to wrap template for pipen
TemplateLiquid
— Liquidpy template wrapper.
TemplateJinja2
— Jinja2 template wrapper
Functions
get_template_engine(template)
(Type) — Get the template engine by name or the template engine itself
module
pipen.pluginmgr
Define hook specifications and provide the plugin manager
Classes
PipenMainPlugin
— The builtin core plugin, used to update the progress bar and cache the job
XqutePipenPlugin
— The plugin for xqute working as proxy for pipen plugin hooks
Functions
on_complete(pipen, succeeded)
— When the pipeline is completed.
on_init(pipen)
— When the pipeline is initialized, and default configs are loaded
on_job_cached(proc, job)
— When a job is cached.
on_job_failed(proc, job)
— When a job is done but failed.
on_job_init(proc, job)
— When a job is initialized
on_job_killed(proc, job)
— When a job is killed
on_job_killing(proc, job)
(bool) — When a job is being killed.
on_job_polling(proc, job)
— When status of a job is being polled.
on_job_queued(proc, job)
— When a job is queued in xqute. Note it might not be queued yet in the scheduler system.
on_job_started(proc, job)
— When a job starts to run in the scheduler system.
on_job_submitted(proc, job)
— When a job is submitted in the scheduler system.
on_job_submitting(proc, job)
(bool) — When a job is being submitted.
on_job_succeeded(proc, job)
— When a job completes successfully.
on_proc_create(proc)
— Called in the Proc constructor when a process is created.
on_proc_done(proc, succeeded)
— When a process is done
on_proc_init(proc)
— Called when a process is initialized.
on_proc_input_computed(proc)
— Called after process input data is computed.
on_proc_script_computed(proc)
— Called after process script is computed.
on_proc_shutdown(proc, sig)
— When pipeline is shutting down, by Ctrl-c for example.
on_proc_start(proc)
— When a process is starting
on_setup(config)
— Setup for plugins, primarily used for the plugins to set up some default configurations.
on_start(pipen)
— Right before the pipeline starts running.
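The hook names above follow a common plugin pattern: a manager calls the same-named method on every registered plugin. The sketch below shows that pattern with the standard library only. `call_hook`, `may_submit` and both plugin classes are hypothetical, and the rule that a `False` return from `on_job_submitting` blocks the submission is an assumption suggested by the `(bool)` return type, not something this page states.

```python
from typing import Any, List


class LoggingPlugin:
    """Hypothetical plugin that records the hooks it receives."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_job_submitting(self, proc: Any, job: Any) -> None:
        self.events.append(f"submitting {job}")

    def on_job_succeeded(self, proc: Any, job: Any) -> None:
        self.events.append(f"succeeded {job}")


class VetoPlugin:
    """Hypothetical plugin that refuses to submit one particular job."""

    def on_job_submitting(self, proc: Any, job: Any) -> bool:
        return job != "job-1"


def call_hook(plugins: List[Any], name: str, *args: Any) -> List[Any]:
    """Call hook `name` on every plugin that implements it; collect the results."""
    results = []
    for plugin in plugins:
        hook = getattr(plugin, name, None)
        if callable(hook):
            results.append(hook(*args))
    return results


def may_submit(plugins: List[Any], proc: Any, job: Any) -> bool:
    """Assumed veto rule: any plugin returning False blocks the submission."""
    results = call_hook(plugins, "on_job_submitting", proc, job)
    return all(result is not False for result in results)
```

Plugins that do not define a hook are skipped, so each plugin only implements the events it cares about.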
module
pipen.progressbar
Provide the PipelinePBar and ProcPBar classes
Classes
ProcPBar
— The progress bar for processes
PipelinePBar
— Progress bar for the pipeline
module
pipen.defaults
Provide some default values/objects
Classes
ProcInputType
— Types for process inputs
ProcOutputType
— Types for process outputs
module
pipen.procgroup
Process group that contains a set of processes.
It can be easily used to create a pipeline that runs independently or integrated into a larger pipeline.
Run directly:
>>> proc_group = ProcGroup(<options>)
>>> proc_group.as_pipen(<pipeline options>).set_data(<data>).run()
Integrate into a larger pipeline:
>>> proc_group = ProcGroup(<options>)
>>> # proc could be a process within the larger pipeline
>>> proc.requires = proc_group.<proc>
To add a process to the proc group, use the add_proc method:
>>> class MyProcGroup(ProcGroup):
>>>     ...
>>>
>>> proc_group = MyProcGroup(...)
>>> @proc_group.add_proc
>>> class MyProc(Proc):
>>>     ...
Or add a process at runtime:
>>> class MyProcGroup(ProcGroup):
>>>     ...
>>>
>>>     @ProcGroup.add_proc
>>>     def my_proc(self):
>>>         class MyProc(Proc):
>>>             # You may use self.options here
>>>             ...
>>>         return MyProc
>>> proc_group = MyProcGroup(...)
Classes
ProcGropuMeta
— Meta class for ProcGroup
ProcGroup
— A group of processes that can be run independently or integrated into a larger pipeline.
module
pipen.utils
Provide some utilities
Classes
RichHandler
— Subclass of rich.logging.RichHandler, showing log levels as a single character
RichConsole
— A high level console interface.
Functions
brief_list(blist)
(str) — Briefly show an integer list, combining consecutive numbers.
copy_dict(dic, depth)
(Mapping) — Deep copy a dict
desc_from_docstring(obj, base)
(str) — Get the description from docstring
get_base(klass, abc_base, value, value_getter)
(Type) — Get the base class where the value was first defined
get_logger(name, level)
(LoggerAdapter) — Get the logger by given plugin name
get_logpanel_width()
(int) — Get the width of the log content
get_marked(cls, mark_name, default)
(Any) — Get the marked value from a proc
get_mtime(path, dir_depth)
(float) — Get the modification time of a path. If path is a directory, try to get the last modification time of the contents in the directory at given dir_depth
get_shebang(script)
(str) — Get the shebang of the script
ignore_firstline_dedent(text)
(str) — Like textwrap.dedent(), but ignore first empty lines
is_loading_pipeline(*flags, argv)
(bool) — Check if we are loading the pipeline. Works only when argv0 is "@pipen" while loading the pipeline.
is_subclass(obj, cls)
(bool) — Tell if obj is a subclass of cls. The difference from issubclass is that it does not raise TypeError if obj is not a class
is_valid_name(name)
(bool) — Check if a name is valid for a proc or pipen
load_entrypoints(group)
(Iterable) — Load objects from setuptools entrypoints by given group name
load_pipeline(obj, argv0, argv1p, **kwargs)
(Pipen) — Load a pipeline from a Pipen, Proc or ProcGroup object
log_rich_renderable(renderable, color, logfunc, *args, **kwargs)
— Log a rich renderable to logger
make_df_colnames_unique_inplace(thedf)
— Make the columns of a data frame unique
mark(**kwargs)
(Callable) — Mark a class (e.g. Proc) with given kwargs as metadata
pipen_banner()
(RenderableType) — The banner for pipen
strsplit(string, sep, maxsplit, trim)
(List) — Split the string, with the ability to trim each part.
truncate_text(text, width, end)
(str) — Truncate text without regard to word or whitespace boundaries (otherwise textwrap.shorten could be used).
update_dict(parent, new, depth)
(Mapping) — Update the parent with the new dict, making sure parent itself does not change
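Three of the utilities above have behavior that is easy to show in a few lines. The functions below are standalone sketches written from the one-line descriptions only; `is_subclass_safe`, `truncate_chars` and `brief_ints` are hypothetical names, and details such as the default width, the `"..."` suffix and the `"1-3, 5"` output format are assumptions that may differ from the library.

```python
from typing import Any, List


def is_subclass_safe(obj: Any, cls: type) -> bool:
    """Like issubclass(), but return False instead of raising for non-classes."""
    try:
        return issubclass(obj, cls)
    except TypeError:
        return False


def truncate_chars(text: str, width: int = 80, end: str = "...") -> str:
    """Cut text to at most `width` characters, ignoring word boundaries."""
    if len(text) <= width:
        return text
    keep = max(width - len(end), 0)
    return text[:keep] + end


def brief_ints(numbers: List[int]) -> str:
    """Join integers, collapsing consecutive runs such as 1, 2, 3 into "1-3"."""
    parts: List[str] = []
    ordered = sorted(numbers)
    start = 0
    for i in range(1, len(ordered) + 1):
        # A run ends at the last element or where the next value is not +1.
        if i == len(ordered) or ordered[i] != ordered[i - 1] + 1:
            first, last = ordered[start], ordered[i - 1]
            parts.append(str(first) if first == last else f"{first}-{last}")
            start = i
    return ", ".join(parts)
```

For instance, `brief_ints([1, 2, 3, 5, 7, 8])` gives `"1-3, 5, 7-8"`, and `is_subclass_safe(1, int)` gives `False` where the builtin `issubclass(1, int)` would raise `TypeError`.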