Configurations

Configuration items

There are two levels of configuration items in pipen: pipeline level and process level.

There are only 3 configuration items at pipeline level:

loglevel: The logging level for the logger (Default: "info")
workdir: Where the metadata and intermediate files are saved for the pipeline (Default: ./.pipen)
plugins: The plugins to be enabled or disabled for the pipeline

These items cannot be set or changed at process level.

The following items are at process level. They can be set or changed at process level so that they can be process-specific. You may also see some of the configuration items introduced here

cache: Should we detect whether the jobs are cached? See also here
dirsig: When checking the signature for caching, should we walk through the contents of the directory? This can be time-consuming if the directory is big.
error_strategy: How to deal with the errors: retry, ignore or halt. See also here
num_retries: How many times to retry the jobs once an error occurs.
template: Define the template engine to use. See also here
template_opts: Options to initialize the template engine (will inherit from pipeline level)
forks: How many jobs to run simultaneously?
lang: The language for the script to run. See also here
plugin_opts: Options for process-level plugins, will inherit from pipeline level
scheduler: The scheduler to run the jobs
scheduler_opts: The options for the scheduler, will inherit from pipeline level
submission_batch: How many jobs to be submitted simultaneously

Configuration priorities

There are different places to set values for the configuration items (priorities from low to high):

The configuration files (priorities from low to high):
~/.pipen.toml
./.pipen.toml
PIPEN.osenv

See here for how the configuration files are loaded. pipen uses TOML as its configuration language; see here for more information about the TOML format.

The arguments of Pipen constructor
The process definition
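
The ordering above can be pictured as a chain of lookups in which the first place that defines an item wins. The following is a minimal sketch using only the Python standard library; it is not pipen's actual loader, and the three dictionaries and their values are hypothetical stand-ins for the three places listed above. It covers top-level items only and does not model how nested options such as scheduler_opts are inherited.

```python
from collections import ChainMap

# Hypothetical values for each place a configuration item can be set.
from_config_files = {"loglevel": "info", "forks": 1, "scheduler": "local"}
from_constructor = {"forks": 4}        # e.g. arguments of the Pipen constructor
from_process = {"scheduler": "sge"}    # e.g. items set in one process definition

# ChainMap searches its maps from left to right, so the highest-priority
# source is listed first and the configuration files come last.
effective = ChainMap(from_process, from_constructor, from_config_files)

# Flatten the chain into a plain dict of the values that take effect.
resolved = dict(effective)
print(resolved)
```

Here forks comes from the constructor arguments, scheduler from the process definition, and loglevel falls through to the configuration files.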

Note

Configurations in the configuration files are grouped into profiles. If the same profile name appears in multiple configuration files, the higher-priority file inherits any items it does not set from the lower-priority files.

Note

Special note for lang.

If lang is not set at process level and the script contains a shebang, whatever is specified at pipeline level (including in the configuration files) is ignored and the interpreter in the shebang is used.
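
The rule in this note can be written as a small decision function. This is only a model of the behaviour described here, not code from pipen: pick_interpreter is a hypothetical helper, and it assumes that a lang set at process level always wins and that leading blank lines before the shebang are tolerated.

```python
from typing import Optional


def pick_interpreter(
    script: str,
    process_lang: Optional[str],
    pipeline_lang: Optional[str],
) -> Optional[str]:
    """Hypothetical helper modelling the lang/shebang rule; not part of pipen."""
    if process_lang is not None:
        # Assumption: an explicit process-level lang takes precedence.
        return process_lang
    # Assumption: leading whitespace before the shebang is ignored.
    stripped = script.lstrip()
    first_line = stripped.splitlines()[0] if stripped else ""
    if first_line.startswith("#!"):
        # The shebang overrides anything set at pipeline level.
        return first_line[2:].strip()
    return pipeline_lang


r_script = "#!/usr/bin/env Rscript\ncat('hello')\n"
print(pick_interpreter(r_script, None, "bash"))
print(pick_interpreter("echo hello", None, "bash"))
print(pick_interpreter(r_script, "python", "bash"))
```

The first call returns the interpreter from the shebang, the second falls back to the pipeline-level value, and the third uses the process-level value.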

Profiles

You can have different profiles in configuration files:

~/.pipen.toml

[default]
scheduler = "local"

[sge]
scheduler = "sge"

[sge.scheduler_opts]
sge_q = "1-day"

To use the sge profile:

Pipen().run(P1, profile="sge")

You can also have a configuration file in the current directory:

./.pipen.toml

[sge.scheduler_opts]
sge_q = "7-days"

Then the queue to run the jobs will be 7-days. Note that we didn't specify the scheduler in ./.pipen.toml, which is inherited from ~/.pipen.toml.
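
The inheritance in this example can be reproduced with a recursive dictionary merge. The sketch below is a simplified model rather than pipen's implementation: merge is a hypothetical helper, the two files are written out by hand as dictionaries instead of being parsed from TOML, and the section is assumed to be spelled scheduler_opts in both files. How the default profile interacts with other profiles is not modelled.

```python
from copy import deepcopy


def merge(low: dict, high: dict) -> dict:
    """Recursively overlay `high` on `low` and return a new dict."""
    out = deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            # Both sides hold a table: merge them key by key.
            out[key] = merge(out[key], value)
        else:
            # Otherwise the higher-priority value replaces the lower one.
            out[key] = deepcopy(value)
    return out


# Contents of ~/.pipen.toml (lower priority).
home_file = {
    "default": {"scheduler": "local"},
    "sge": {"scheduler": "sge", "scheduler_opts": {"sge_q": "1-day"}},
}
# Contents of ./.pipen.toml (higher priority).
cwd_file = {"sge": {"scheduler_opts": {"sge_q": "7-days"}}}

profile = "sge"
sge = merge(home_file.get(profile, {}), cwd_file.get(profile, {}))
print(sge)  # {'scheduler': 'sge', 'scheduler_opts': {'sge_q': '7-days'}}
```

The scheduler value survives from the lower-priority file, while sge_q is replaced by the value from the current directory.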