xqute.schedulers.slurm_scheduler

module

xqute.schedulers.slurm_scheduler

The scheduler to run jobs on Slurm

Classes
class

xqute.schedulers.slurm_scheduler.SlurmJob(index, cmd, metadir=PosixPath('.xqute'), error_retry=None, num_retries=None)

Slurm job

Parameters
  • index (int) The index of the job
  • cmd (str | List[str]) The command of the job
  • metadir (PathLike, optional) The meta directory of the Job
  • error_retry (Optional[bool], optional) Whether to retry if an error occurs
  • num_retries (Optional[int], optional) Total number of retries
Attributes
  • CMD_WRAPPER_SHELL The shell to run the wrapped script
  • CMD_WRAPPER_TEMPLATE The template for job wrapping
  • _error_retry Whether to retry if an error occurs
  • _num_retries Total number of retries
  • _rc The return code of the job
  • _status The status of the job
  • _wrapped_cmd The wrapped cmd, used for job submission
  • cmd The command
  • hook_done Marks whether the hooks have already been called. There is no trigger for a job finishing or failing, so its status is polled; this flag avoids calling the hooks repeatedly
  • index The index of the job
  • jid The jid of the job in the scheduler system
  • jid (int | str | None) Get the jid of the job in the scheduler system
  • jid_file (Path) The jid file of the job
  • metadir The metadir of the job
  • rc (int) The return code of the job
  • rc_file (Path) The rc file of the job
  • retry_dir (Path) The retry directory of the job
  • status (int) Query the status of the job
    If the job is submitted, try to query it from the status file. Make sure the status is updated by the trap in the wrapped script.
  • status_file (Path) The status file of the job
  • stderr_file (Path) The stderr file of the job
  • stdout_file (Path) The stdout file of the job
  • strcmd (str) Get the string representation of the command
  • trial_count The count for re-tries
Methods
  • __repr__() (str) repr of the job
  • clean(retry) Clean up the meta files
  • wrap_cmd(scheduler) (str) Wrap the command to enable status, return code, and cleaning when the job exits
  • wrapped_script(scheduler) (PathLike) Get the wrapped script
method

__repr__() → str

repr of the job

method

clean(retry=False)

Clean up the meta files

Parameters
  • retry (optional) Whether to clean up for a retry
method

wrapped_script(scheduler)

Get the wrapped script

Parameters
  • scheduler (Scheduler) The scheduler
Returns (PathLike)

The path of the wrapped script

method

wrap_cmd(scheduler)

Wrap the command to enable status, return code, and cleaning when the job exits

Parameters
  • scheduler (Scheduler) The scheduler
Returns (str)

The wrapped script
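
The wrapping idea (a trap that records the status and return code when the job exits) can be illustrated with a minimal sketch. This is not the actual CMD_WRAPPER_TEMPLATE: the function name `wrap_cmd_sketch`, the status words, and the script layout are assumptions made for illustration only, and the real status values are integers.

```python
import shlex
from pathlib import Path


def wrap_cmd_sketch(cmd: str, status_file: Path, rc_file: Path) -> str:
    """Build a bash script that records status and return code on exit.

    Hypothetical stand-in for a wrapper template; the real template and
    the real status values may differ.
    """
    status = shlex.quote(str(status_file))
    rc = shlex.quote(str(rc_file))
    return "\n".join(
        [
            "#!/usr/bin/env bash",
            "# Runs when the script exits, whatever the reason",
            "cleanup() {",
            "    rc=$?",
            f'    echo "$rc" > {rc}',
            '    if [ "$rc" -eq 0 ]; then',
            f"        echo FINISHED > {status}",
            "    else",
            f"        echo FAILED > {status}",
            "    fi",
            "}",
            "trap cleanup EXIT",
            f"echo RUNNING > {status}",
            cmd,
            "",
        ]
    )
```

Because the status is written by the script itself, a poller only needs to read the status file rather than ask the scheduler about every job.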

class

xqute.schedulers.slurm_scheduler.SlurmScheduler(*args, **kwargs)

The Slurm scheduler

Attributes
  • job_class The job class
  • name The name of the scheduler
Parameters
  • **kwargs Other arguments for the scheduler
Methods

Submit and update the status

  1. Check if the job is already submitted or running
  2. If not, run the hook
  3. If the hook is not cancelled, clean the job
  4. Submit the job, raising an exception if it fails
  5. If the job is submitted successfully, update the status
  6. If the job fails to submit, update the status and write stderr to the job file
Parameters
  • job (Job) The job
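
The six steps above can be sketched as a small synchronous function. Everything here is a simplified assumption: `FakeJob`, `submit_and_update`, the string statuses, and the convention that a hook returning `False` cancels the submission are illustrative stand-ins, not the library's classes or status codes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeJob:
    """Hypothetical stand-in for a Job, with only what the sketch needs."""

    index: int
    status: str = "INIT"
    jid: Optional[str] = None
    stderr: List[str] = field(default_factory=list)

    def clean(self) -> None:
        # Forget anything left over from a previous attempt
        self.jid = None
        self.stderr.clear()


def submit_and_update(
    job: FakeJob,
    submit: Callable[[FakeJob], str],
    on_submitting: Callable[[FakeJob], Optional[bool]] = lambda job: None,
) -> None:
    # 1. Skip jobs that are already submitted or running
    if job.status in ("SUBMITTED", "RUNNING"):
        return
    # 2. Run the hook; in this sketch, returning False cancels
    if on_submitting(job) is False:
        return
    # 3. Clean the job before submitting
    job.clean()
    try:
        # 4. Submit; the callable raises if submission fails
        job.jid = submit(job)
    except Exception as exc:
        # 6. Record the failure and keep the error message
        job.status = "FAILED"
        job.stderr.append(str(exc))
    else:
        # 5. Submission worked, so update the status
        job.status = "SUBMITTED"
```

Passing the submit function in as a callable keeps the flow independent of any particular scheduler backend.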
method

retry_job(job)

Retry a job

Parameters
  • job (Job) The job

Kill a job and update its status

Parameters
  • job (Job) The job
method

polling_jobs(jobs, on, halt_on_error)

Check whether all jobs are done or whether new jobs can be submitted

Parameters
  • jobs (List) The list of jobs
  • on (str) The status to query: can_submit or all_done
  • halt_on_error (bool) Whether we should halt the whole pipeline on error
Returns (bool)

True if the queried condition holds, otherwise False.

method

kill_running_jobs(jobs)

Try to kill all running jobs

Parameters
  • jobs (List) The list of jobs

Check if a job is already submitted or running

Parameters
  • job (Job) The job
Returns (bool)

True if the job is already submitted or running, otherwise False.

method

submit_job(job)

Submit a job to Slurm

Parameters
  • job (Job) The job
Returns (str)

The job id
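
As an illustration of where the returned job id can come from, the sketch below extracts it from the text that `sbatch` prints. By default `sbatch` prints `Submitted batch job <id>`, and with `--parsable` it prints only the id, optionally followed by `;` and the cluster name. The helper name `parse_sbatch_jid` is hypothetical, and whether the scheduler parses the output this way is an assumption.

```python
import re


def parse_sbatch_jid(output: str) -> str:
    """Extract the job id from sbatch's standard output."""
    text = output.strip()
    # Default form: "Submitted batch job 12345"
    match = re.search(r"Submitted batch job (\d+)", text)
    if match:
        return match.group(1)
    # --parsable form: "12345" or "12345;cluster"
    jid = text.split(";", 1)[0]
    if jid.isdigit():
        return jid
    raise ValueError(f"Cannot find a job id in: {output!r}")
```

Raising on unrecognised output matches the documented behaviour of failing loudly when a submission does not go through.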

method

kill_job(job)

Kill a job on Slurm

Parameters
  • job (Job) The job
method

job_is_running(job)

Tell whether a job is really running, rather than only checking that job.jid_file exists

This covers the case where the jid file was not cleaned up when the job finished.

Parameters
  • job (Job) The job
Returns (bool)

True if it is, otherwise False
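
One way to check beyond the jid file is to ask Slurm directly. The sketch below calls `squeue -h -j <jid>` (no header, one job) and treats non-empty output as "still known to the queue". The names `job_is_running_sketch` and `_squeue` are hypothetical, the actual check may use a different command, and any listed job counts as running here regardless of its state.

```python
import subprocess
from typing import Callable, Optional


def _squeue(jid: str) -> subprocess.CompletedProcess:
    # `squeue -h -j <jid>` lists the job without a header line
    return subprocess.run(
        ["squeue", "-h", "-j", jid], capture_output=True, text=True
    )


def job_is_running_sketch(
    jid: Optional[str],
    run: Callable[[str], subprocess.CompletedProcess] = _squeue,
) -> bool:
    if not jid:
        # No jid recorded, so nothing was submitted
        return False
    try:
        proc = run(jid)
    except OSError:
        # squeue is not available on this machine
        return False
    # A stale jid yields an error or empty output
    return proc.returncode == 0 and bool(proc.stdout.strip())
```

The `run` parameter exists only so the check can be exercised without a Slurm installation, for example by passing a function that returns a prepared `CompletedProcess`.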