xqute.scheduler
xqute.scheduler.Scheduler(workdir, forks=1, error_strategy='ignore', num_retries=3, prescript='', postscript='', jobname_prefix=None, submission_batch=None, recheck_interval=600, cwd=None, **kwargs)
The abstract class for scheduler
jobcmd_wrapper_init— The init script for the job command wrapperjobcmd_wrapper_init(str) — The init script for the job command wrapper</>name— The name of the scheduler
workdir(str | pathlib.path | cloudpathlib.cloudpath.cloudpath) — The working directoryforks(int, optional) — Max number of job forkserror_strategy(str, optional) — The strategy when there is error happenednum_retries(int, optional) — Max number of retries when error_strategy is retryprescript(str, optional) — The prescript to run before the job commandIt is a piece of script that inserted into the wrapper script, running on the scheduler system.postscript(str, optional) — The postscript to run when job finishedIt is a piece of script that inserted into the wrapper script, running on the scheduler system.jobname_prefix(str | none, optional) — The prefix for the job namesubmission_batch(int | none, optional) — The number of consumers to submit jobs. This allowsmultiple jobs to be submitted in parallel. This is useful when there are many jobs to be submitted and the scheduler has a high latency for each submission. Set this to a smaller number if the scheduler cannot handle too many simultaneous submissions.recheck_interval(int, optional) — The number of polling iterations between rechecks ofwhether a job is still running on the scheduler. Helps detect jobs that fail before the wrapped script updates status (e.g., resource allocation failures). Each iteration takes ~0.1s, so default 600 means rechecking every ~60 seconds.cwd(str | pathlib.path, optional) — The working directory for the job command wrapper**kwargs— Other arguments for the scheduler
check_all_done(jobs,polling_counter)(bool) — Check if all jobs are done (full polling with hooks)</>count_running_jobs(jobs)(int) — Count currently running/active jobs (lightweight check)</>create_job(index,cmd,envs)(Job) — Create a job</>job_fails_before_running(job)(bool) — Check if a job fails before running.</>job_is_running(job)(bool) — Check if a job is really running</>job_is_submitted_or_running(job)(bool) — Check if a job is already submitted or running</>jobcmd_end(job)(str) — The job command end</>jobcmd_init(job)(str) — The job command init</>jobcmd_prep(job)(str) — The job command preparation</>jobcmd_shebang(job)(str) — The shebang of the wrapper script</>kill_job(job)— Kill a job</>kill_job_and_update_status(job)— Kill a job and update its status</>kill_running_jobs(jobs)— Try to kill all running jobs</>retry_job(job)— Retry a job</>submit_job(job)(int | str) — Submit a job</>submit_job_and_update_status(job)— Submit and update the status</>transition_job_status(job,new_status,rc,error_msg,is_killed)— Centralized status transition handler</>wrap_job_script(job)(str) — Wrap the job script</>wrapped_job_script(job)(SpecPath) — Get the wrapped job script</>
submit_job_and_update_status(job)
Submit and update the status
- Check if the job is already submitted or running
- If not, run the hook
- If the hook is not cancelled, clean the job
- Submit the job, raising an exception if it fails
- If the job is submitted successfully, update the status
- If the job fails to submit, update the status and write stderr to the job file
job(Job) — The job
transition_job_status(job, new_status, rc=None, error_msg=None, is_killed=False)
Centralized status transition handler
Handles all aspects of job status transitions:
- - Status change logging
- - Hook lifecycle management (ensuring on_job_started is called)
- - Appropriate hook calls based on new status
- - RC file updates
- - Error message appending to stderr
- - JID file cleanup for terminal states
- - Pipeline halt on errors if configured
job(Job) — The job to transitionnew_status(int) — The new status to transition torc(str | none, optional) — Optional return code to write to rc_fileerror_msg(str | none, optional) — Optional error message to append to stderr_fileis_killed(bool, optional) — Whether this is a killed job (uses on_job_killed hook)
count_running_jobs(jobs)
Count currently running/active jobs (lightweight check)
This is optimized for the producer to check if new jobs can be submitted. It only counts jobs without refreshing status or calling hooks.
jobs(List) — The list of jobs
Number of jobs currently in active states
check_all_done(jobs, polling_counter)
Check if all jobs are done (full polling with hooks)
This does complete status refresh and calls all lifecycle hooks. Used by the main polling loop to track job completion.
jobs(List) — The list of jobspolling_counter(int) — The polling counter for hook calls
True if all jobs are done, False otherwise
kill_running_jobs(jobs)
Try to kill all running jobs
jobs(List) — The list of jobs
job_fails_before_running(job)
Check if a job fails before running.
For some schedulers, the job might fail before running (after submission). For example, the job might fail to allocate resources. In such a case, the wrapped script might not be executed, and the job status will not be updated (stays in SUBMITTED). We need to check such jobs and mark them as FAILED.
For the instant scheduler, for example, the local scheduler, the failure will be immediately reported when submitting the job, so we don't need to check such jobs.
job(Job) — The job to check
True if the job fails before running, otherwise False.
jobcmd_shebang(job) → str
The shebang of the wrapper script
jobcmd_init(job) → str
The job command init
jobcmd_prep(job) → str
The job command preparation
jobcmd_end(job) → str
The job command end