Templating
Templates are used in the output and script of a process definition.
Template engines
By default, pipen uses the liquid template engine to render the output and script. You can also switch the template engine to jinja2 by specifying:
template = "jinja2"
in one of the configuration files, or in the Pipen constructor:
pipeline = Pipen(..., template="jinja2", ...)
or in the process definition:
class MyProcess(Proc):
    ...
    template = "jinja2"  # overwrite the global template engine
Besides specifying the name of a template engine, you can also specify a subclass of pipen.template.Template as the template engine. This lets you use your own template engine: you just have to wrap it in a subclass of pipen.template.Template. For example, if you want to use mako:
from mako.template import Template as MakoTemplate
from pipen.template import Template

class TemplateMako(Template):
    def __init__(self, source, **kwargs):
        super().__init__(source)
        self.engine = MakoTemplate(source, **kwargs)

    def _render(self, data):
        return self.engine.render(**data)

# Use it for a process
from pipen import Proc

class MyProcess(Proc):
    template = TemplateMako
    ...  # other configurations
The template_opts configuration is passed to the TemplateMako constructor as keyword arguments. In the example above, those values are passed on to the MakoTemplate constructor.
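The wrapping pattern above can be tried without installing pipen or mako. The following is a minimal sketch under stated assumptions: BaseTemplate is a hypothetical stand-in for pipen.template.Template (its render method delegating to _render is an assumption about the shape of the real class), DollarTemplate is a hypothetical engine built on the standard library's string.Template, and the safe option is invented for illustration to show how options such as template_opts could reach the wrapped engine.

```python
from string import Template as StdTemplate


class BaseTemplate:
    """Hypothetical stand-in for pipen.template.Template (assumed shape)."""

    def __init__(self, source, **opts):
        self.source = source

    def render(self, data=None):
        # Delegate the actual rendering to the subclass
        return self._render(data or {})

    def _render(self, data):
        raise NotImplementedError


class DollarTemplate(BaseTemplate):
    """Wraps string.Template, which uses $name placeholders."""

    def __init__(self, source, **opts):
        super().__init__(source)
        # 'safe' is a made-up option for this sketch: when true,
        # missing keys are left in place instead of raising KeyError
        self.safe = opts.get("safe", False)
        self.engine = StdTemplate(source)

    def _render(self, data):
        if self.safe:
            return self.engine.safe_substitute(**data)
        return self.engine.substitute(**data)


# Options play the role of template_opts in this sketch
tpl = DollarTemplate("echo $name > $out", safe=True)
rendered = tpl.render({"name": "a"})
print(rendered)
```

The subclass only has to store the wrapped engine in the constructor and implement _render; everything engine-specific stays behind that one method.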
You can also register the template as a plugin of pipen:
In pyproject.toml:
[tool.poetry.plugins.pipen_tpl]
mako = "pipen_mako:pipen_mako"
Or in setup.py:
setup(
    ...,
    # Entry points use the "name = module:attribute" form
    entry_points={"pipen_tpl": ["mako = pipen_mako:pipen_mako"]},
)
Then in pipen_mako.py of your package:
def pipen_mako():
    # TemplateMako is defined as above
    return TemplateMako
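To check that a template plugin is actually registered after installing your package, you can list the entry points in the pipen_tpl group with the standard library. This is only an inspection sketch: find_template_plugins is a hypothetical helper, and how pipen itself loads and calls these entry points is not shown here.

```python
from importlib.metadata import entry_points


def find_template_plugins(group="pipen_tpl"):
    """Return {name: entry_point} for every entry point in the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a select() API
        selected = eps.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


plugins = find_template_plugins()
for name, ep in plugins.items():
    # ep.value is the "module:attribute" string, e.g. "pipen_mako:pipen_mako"
    print(name, "->", ep.value)
```

If the mako entry from the example is installed, it should appear under the name mako; calling ep.load() on it would import the module and return the registered pipen_mako function.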
Rendering data
Some data is shared when rendering both output and script, but there are some differences. The most notable one is that the script template can also use the rendered output data.
output
The data to render the output:
Name | Description |
---|---|
job.index | The index of the job, 0-based |
job.metadir | The directory where job metadata is saved, typically <pipeline-workdir>/<pipeline-name>/<proc-name>/<job.index>/ |
job.outdir | *The output directory of the job: <pipeline-workdir>/<pipeline-name>/<proc-name>/<job.index>/output |
job.stdout_file | The file that saves the stdout of the job |
job.stderr_file | The file that saves the stderr of the job |
job.lock_file | The lock file of the job, which prevents the same job from running simultaneously so that jobs are "thread-safe" |
in | The input data of the job. You can use in.<input-key> to access the data for each input key |
proc | The process object, used to access its properties, such as proc.workdir |
envs | The envs of the process |
*: If the process is an end process, it will be a symbolic link to <pipeline-outdir>/<process-name>/<job.index>. When the process has only a single job, the <job.index> is also omitted.
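The typical directory layout listed for job.metadir and job.outdir can be written out as a small path computation. This is a sketch only: job_dirs is a hypothetical helper, the argument values are made up, and the symbolic link used for end processes is not modelled.

```python
from pathlib import PurePosixPath


def job_dirs(workdir, pipeline_name, proc_name, index):
    """Build the typical metadir and outdir paths for one job."""
    # <pipeline-workdir>/<pipeline-name>/<proc-name>/<job.index>
    metadir = PurePosixPath(workdir) / pipeline_name / proc_name / str(index)
    # The output directory sits inside the job's metadata directory
    return {"metadir": metadir, "outdir": metadir / "output"}


# Hypothetical names, for illustration only
dirs = job_dirs(".pipen", "mypipeline", "MyProcess", 0)
print(dirs["metadir"])
print(dirs["outdir"])
```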
script
All the data used to render output can also be used to render script. Additionally, the rendered output can also be used to render script. For example:
class MyProcess(Proc):
    input = "in"
    output = "outfile:file:{{in.in}}.txt"
    script = "echo {{in.in}} > {{out.outfile}}"
    ...  # other configurations
With input data ["a"], the script is rendered as echo a > <job.outdir>/a.txt
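The two-stage order (output first, then script with the rendered output available as out) can be illustrated with a toy renderer. This is a sketch under stated assumptions, not pipen's implementation: render is a hypothetical regex-based stand-in for the liquid engine that only handles {{a.b}} lookups, render_job is a hypothetical helper, and the output directory is a made-up path.

```python
import re


def render(source, data):
    """Replace {{a.b}} placeholders by walking dotted keys in nested dicts."""

    def lookup(match):
        value = data
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, source)


def render_job(output_tpl, script_tpl, in_data, outdir):
    """Render the output first, then the script with 'out' available."""
    # "outfile:file:{{in.in}}.txt" -> name, type, value template
    name, _type, value_tpl = output_tpl.split(":", 2)
    # Stage 1: the output template only sees the input data;
    # file outputs are placed under the job's output directory
    out_data = {name: f"{outdir}/{render(value_tpl, {'in': in_data})}"}
    # Stage 2: the script template sees both input and rendered output
    script = render(script_tpl, {"in": in_data, "out": out_data})
    return out_data, script


out_data, script = render_job(
    "outfile:file:{{in.in}}.txt",
    "echo {{in.in}} > {{out.outfile}}",
    {"in": "a"},
    "/tmp/job/output",  # stands in for <job.outdir>
)
print(script)
```

Rendering the script before the output would fail in this sketch, because out.outfile would not exist yet; that ordering is the reason the script can see more data than the output.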