Skip to content

Scheduler

pipen can send jobs to different scheduler system to run. To specify the scheduler, use scheduler and scheduler_opts configurations.

Default supported schedulers

pipen uses xqute for scheduler backend support. The following schedulers are supported by pipen:

local

This is the default scheduler used by pipen. The jobs will be run on the local machine.

No scheduler-specific options are available.

sge

Send the jobs to run on sge scheduler.

The scheduler_opts will be the ones supported by qsub.

slurm

Send the jobs to run on slurm scheduler.

The scheduler_opts will be the ones supported by sbatch.

ssh

Send the jobs to run on a remote machine via ssh.

The scheduler_opts will be the ones supported by ssh.

See also xqute.

container

Send the jobs to run in a container (Docker/Podman/Apptainer). The scheduler_opts will be used to construct the container command.

They include: - image: The container image to use. - entrypoint: The entrypoint of the container to run the wrapped job script. If not specified, the default entrypoint /bin/sh will be used. - bin: The container command to use. If not specified, it will use docker. - volumes: A list of volumes to mount to the container. The default volumes are: - workdir: The working directory of the pipeline, mounted to /mnt/disks/pipen-pipeline/workdir. - outdir: The output directory of the pipeline, mounted to /mnt/disks/pipen-pipeline/outdir. - envs: A dictionary of environment variables to set in the container. - remove: Whether to remove the container after the job is done. Default is True. Only supported by Docker and Podman. - user: The user to run the container as. Default is the current user. Only supported by Docker and Podman. - bin_args: Additional arguments to pass to the container command. For example, {"bin_args": ["--privileged"]} will run the container in privileged mode. Only supported by Docker and Podman.

gbatch

Send the jobs to run using Google Cloud Batch.

The scheduler_opts will be used to construct the job configuration. This scheduler requires that the pipeline's outdir is a Google Cloud Storage path (e.g., gs://bucket/path).

The scheduler options include: - project: Google Cloud project ID - location: Google Cloud region or zone - mount: GCS path to mount (e.g. gs://my-bucket:/mnt/my-bucket). You can pass a list of mounts. - service_account: GCP service account email (e.g. test-account@example.com) - network: GCP network (e.g. default-network) - subnetwork: GCP subnetwork (e.g. regions/us-central1/subnetworks/default) - no_external_ip_address: Whether to disable external IP address - machine_type: GCP machine type (e.g. e2-standard-4) - provisioning_model: GCP provisioning model (e.g. SPOT) - image_uri: Container image URI (e.g. ubuntu-2004-lts) - entrypoint: Container entrypoint (e.g. /bin/bash) - commands: The command list to run in the container. There are three ways to specify the commands: 1. If no entrypoint is specified, the final command will be [commands, wrapped_script], where the entrypoint is the wrapper script interpreter that is determined by JOBCMD_WRAPPER_LANG (e.g. /bin/bash), commands is the list you provided, and wrapped_script is the path to the wrapped job script. 2. You can specify something like "-c", then the final command will be ["-c", "wrapper_script_interpreter, wrapper_script"] 3. You can use the placeholders {lang} and {script} in the commands list, where {lang} will be replaced with the interpreter (e.g. /bin/bash) and {script} will be replaced with the path to the wrapped job script. For example, you can specify ["{lang} {script}"] and the final command will be ["wrapper_interpreter, wrapper_script"]

Additional keyword arguments can be used for job configuration (e.g. taskGroups). See more details at Google Cloud Batch documentation.

By default, the pipeline's workdir is mounted to /mnt/disks/pipen-pipeline/workdir and the outdir is mounted to /mnt/disks/pipen-pipeline/outdir on the VM.

Writing your own scheduler plugin

To write a scheduler plugin, you need to subclass both xqute.schedulers.scheduler.Scheduler and pipen.scheduler.SchedulerPostInit.

For examples of a scheduler plugin, see local_scheduler, sge_scheduler, slurm_scheduler, ssh_scheduler, and [gbatch_scheduler][6], and also pipen.scheduler.

A scheduler class can be passed to scheduler configuration directly to be used as a scheduler. But you can also register it with entry points:

For setup.py, you will need:

setup(
	# ...
	entry_points={"pipen_sched": ["mysched = pipen_mysched"]},
	# ...
)

For pyproject.toml:

[tool.poetry.plugins.pipen_sched]
mysched = "pipen_mysched"

Then you can switch the scheduler to mysched by scheduler="mysched"