Configurations
Configuration items
There are two levels of configuration items in pipen
: pipeline level and process level.
There are only 3 configuration items at pipeline level:
loglevel
: The logging level for the logger (Default:"info"
)workdir
: Where the metadata and intermediate files are saved for the pipeline (Default:./.pipen
)plugins
: The plugins to be enabled or disabled for the pipeline
These items cannot be set or changed at process level.
Following items are at process level. They can be set changed at process level so that they can be process-specific. You may also see some of the configuration items introduced here
cache
: Should we detect whether the jobs are cached? See also heredirsig
: When checking the signature for caching, whether should we walk through the content of the directory? This is sometimes time-consuming if the directory is big.error_strategy
: How to deal with the errors: retry, ignore or halt. See also herenum_retries
: How many times to retry to jobs once error occurs.template
: efine the template engine to use. See also heretemplate_opts
: Options to initialize the template engine (will inherit from pipeline level)forks
: How many jobs to run simultaneously?lang
: The language for the script to run. See also hereplugin_opts
: Options for process-level plugins, will inherit from pipeline levelscheduler
: The scheduler to run the jobsscheduler_opts
: The options for the scheduler, will inherit from pipeline levelsubmission_batch
: How many jobs to be submited simultaneously
Configuration priorities
There are different places to set values for the configuration items (priorities from low to high):
-
The configuration files (priorities from low to high):
-
~/.pipen.toml
./.pipen.toml
PIPEN.osenv
See here for how the configuration files are loaded.
pipen
uses TOML
as configuration language, see here for more information about toml
format.
- The arguments of
Pipen
constructor - The process definition
Note
The configurations from configuration files are with profiles. If the same profile name appears in multiple configuration files, the items will be inherited from the lower-priority files.
Note
Special note for lang
.
If it is not set at process level, and there are shebang in the script, whatever you specified at pipeline level (including in the configuration files), it will be ignored and the interpreter in the shebang will be used.
See also script
Tip
If you have nothing set at Pipen
constructor or process definition for a configuration item, the PIPEN.osenv
is useful to use a different value than the one set in other configuration files. For example, to disable cache for all processes:
PIPEN_DEFAULT_cache=0 python ./pipeline.py ...
Profiles
You can have different profiles in configuration files:
~/.pipen.toml
[default]
scheduler = "local"
[sge]
scheduler = "sge"
[sge.schduler_opts]
sge_q = "1-day"
To use the sge
profile:
Pipen().run(P1, profile="sge")
You can also have a configuration in current directory:
./.pipen.toml
[sge.scheduler_opts]
sge_q = "7-days"
Then the queue to run the jobs will be 7-days
. Note that we didn't specify the scheduler
in ./.pipen.toml
, which is inherited from ~/.pipen.toml
.