# Templating
Templates are used for `output` and `script` in process definitions.
## Template engines
By default, pipen uses the liquid template engine to render the `output` and `script`. You can switch the template engine to jinja2 by specifying:

```toml
template = "jinja2"
```
in one of the configuration files, or in the `Pipen` constructor:

```python
pipeline = Pipen(..., template="jinja2", ...)
```
or in the process definition:

```python
class MyProcess(Proc):
    ...
    template = "jinja2"  # override the global template engine
```
Besides the name of a built-in template engine, you can also specify a subclass of `pipen.template.Template` as the template engine. This lets you use your own engine: just wrap it in a subclass of `pipen.template.Template`. For example, to use mako:
```python
from mako.template import Template as MakoTemplate
from pipen.template import Template

class TemplateMako(Template):

    def __init__(self, source, **kwargs):
        super().__init__(source)
        self.engine = MakoTemplate(source, **kwargs)

    def _render(self, data):
        return self.engine.render(**data)
```
```python
# Use it for a process
from pipen import Proc

class MyProcess(Proc):
    template = TemplateMako
    ...  # other configurations
```
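The same wrapper pattern works for any engine. Below is a minimal, dependency-free sketch of the contract: Python's built-in `string.Template` stands in for mako, and `TemplateStr` is a plain class standing in for a `pipen.template.Template` subclass (the base class is omitted here so the sketch is self-contained):

```python
from string import Template as StringTemplate

class TemplateStr:
    """Stand-in for a pipen.template.Template subclass: __init__ takes
    the template source; _render takes the data dict and returns text."""

    def __init__(self, source, **kwargs):
        self.engine = StringTemplate(source)

    def _render(self, data):
        # safe_substitute fills $name-style placeholders from the data dict
        return self.engine.safe_substitute(**data)

tpl = TemplateStr("echo $name > $outfile")
print(tpl._render({"name": "a", "outfile": "a.txt"}))
# -> echo a > a.txt
```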
The `template_opts` configuration is passed to the `TemplateMako` constructor; its values are then passed on to the `MakoTemplate` constructor.
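For example, a sketch of passing options through, assuming the `TemplateMako` class above; `strict_undefined` is a genuine `MakoTemplate` keyword, shown only as an illustration:

```python
class MyProcess(Proc):
    template = TemplateMako
    # forwarded to TemplateMako.__init__, which passes the values
    # on to the MakoTemplate constructor
    template_opts = {"strict_undefined": True}
```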
You can also register the template as a plugin of pipen:
In `pyproject.toml`:

```toml
[tool.poetry.plugins.pipen_tpl]
mako = "pipen_mako:pipen_mako"
```
Or in `setup.py`:

```python
setup(
    ...,
    entry_points={"pipen_tpl": ["mako = pipen_mako:pipen_mako"]},
)
```
Then in `pipen_mako.py` of your package:

```python
def pipen_mako():
    # TemplateMako is defined as above
    return TemplateMako
```
## Rendering data
Some data are shared when rendering both `output` and `script`; however, there are differences. Most obviously, the `script` template can additionally use the rendered `output` data.
### output
The data available to render the `output`:
| Name | Description |
|---|---|
| `job.index` | The index of the job, 0-based |
| `job.metadir`¹ | The directory where job metadata is saved, typically `<pipeline-workdir>/<pipeline-name>/<proc-name>/<job.index>/` |
| `job.outdir`¹ | The output directory of the job\*: `<pipeline-workdir>/<pipeline-name>/<proc-name>/<job.index>/output` |
| `job.stdout_file`¹ | The file that saves the stdout of the job |
| `job.stderr_file`¹ | The file that saves the stderr of the job |
| `in` | The input data of the job. You can use `in.<input-key>`¹ to access the data for each input key |
| `proc` | The process object, used to access its properties, such as `proc.workdir` |
| `envs` | The `envs` of the process |

\*: If the process is an end process, this will be a symbolic link to `<pipeline-outdir>/<process-name>/<job.index>`. When the process has only a single job, the `<job.index>` part is also omitted.
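To make the table concrete, here is a toy illustration of how a dotted name such as `{{job.index}}` resolves against the render data. All values below are hypothetical, and real engines (liquid, jinja2) perform this resolution for you:

```python
# hypothetical render data for job 0 of a process
render_data = {
    "job": {
        "index": 0,
        "metadir": "/workdir/pipeline/MyProcess/0",
        "outdir": "/workdir/pipeline/MyProcess/0/output",
    },
    "in": {"infile": "/data/a.txt"},
    "envs": {"ncores": 4},
}

def lookup(data, dotted):
    """Resolve a dotted name like 'job.index' the way {{job.index}} does."""
    obj = data
    for part in dotted.split("."):
        obj = obj[part]
    return obj

print(lookup(render_data, "job.index"))  # -> 0
print(lookup(render_data, "in.infile"))  # -> /data/a.txt
```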
### script
All the data used to render `output` can also be used to render `script`. Additionally, the rendered `output` can be used to render `script`. For example:
```python
class MyProcess(Proc):
    input = "in"
    output = "outfile:file:{{in.in}}.txt"
    script = "echo {{in.in}} > {{out.outfile}}"
    ...  # other configurations
```
With input data `["a"]`, the script is rendered as `echo a > <job.outdir>/a.txt`.
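The two-stage rendering above can be mimicked with a toy `{{...}}` renderer (a sketch only, not pipen's actual implementation; the `/outdir` path is made up):

```python
import re

def render(tpl, data):
    """Toy {{a.b}} renderer standing in for liquid/jinja2."""
    def repl(match):
        obj = data
        for part in match.group(1).split("."):
            obj = obj[part]
        return str(obj)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", repl, tpl)

data = {"in": {"in": "a"}, "job": {"outdir": "/outdir"}}
# stage 1: render the output (only the shared data is available)
outfile = data["job"]["outdir"] + "/" + render("{{in.in}}.txt", data)
# stage 2: the script additionally sees the rendered output as `out`
data["out"] = {"outfile": outfile}
script = render("echo {{in.in}} > {{out.outfile}}", data)
print(script)  # -> echo a > /outdir/a.txt
```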
¹: The paths are `MountedPath` objects, which represent the paths of jobs. This is useful when a job runs in a remote system (a VM, a container, etc.), where the paths need to be mounted into the remote system. The object has an attribute `spec` to get the specified path; when there are no mountings, it is the same as the path itself.