Caching

Job caching

If cache is set to False (as detected, in order, from the configuration files, the Pipen constructor, and the process definition), the job runs regardless of any previous runs.

If the previous run of a job failed, the job runs again.

If a job finishes successfully, a signature file is generated for it. When the job is run again, the signature is used to check whether the job can be skipped and the results of the previous run reused.

We can also force caching for a job by setting cache to "force". This makes sure the results of the previous successful run are used regardless of input or script changes. This is useful when, for example, you have changed the input or script but do not want the changes to take effect immediately, especially when the job takes a long time to run.
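The rules above can be summarized as a small decision function. This is only an illustrative sketch, not pipen's actual implementation: the function name should_run and its parameters are hypothetical, and the signature comparison is reduced to a single boolean.

```python
from typing import Union

# The three cache settings described above: True, False, or "force".
CacheSetting = Union[bool, str]


def should_run(
    cache: CacheSetting,
    previous_succeeded: bool,
    signature_matches: bool,
) -> bool:
    """Return True if the job has to run, False if cached results can be reused.

    Hypothetical helper for illustration; it mirrors the prose rules only.
    """
    if cache is False:
        # Caching disabled: always run, whatever happened before.
        return True
    if not previous_succeeded:
        # A failed (or missing) previous run is never reused.
        return True
    if cache == "force":
        # Force-cache: reuse the previous successful run even if the
        # input or script has changed.
        return False
    # Normal caching: reuse only when the signature still matches.
    return not signature_matches
```

For example, should_run("force", True, False) returns False because a forced cache ignores signature changes, while should_run(True, True, False) returns True.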

Job signature

The signature of a job consists of the input types and data, the output types and data, and the latest time (lastest_time) at which the script or any input or output files/directories were generated or modified. The following situations therefore make the job-cache check fail (the job starts over):

  1. Any changes in input or output types
  2. Any changes in input or output data
  3. Any changes to script
  4. Any touches to input files (since they will make the last modified time > lastest_time)
  5. Any touches to input directories
     - Use dirsig as the depth to check the files under the directories
     - If dirsig is 0, only the directories themselves are checked. Note that modifying a file inside a directory may not change the last modified time of the directory itself.
  6. Any deletions of the output files/directories
     Note that only the files/directories specified by output are checked. Files or subdirectories in the output directories will NOT be checked.
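The file-system part of these checks (items about touched inputs, directory depth, and deleted outputs) could be sketched as below. This is an assumption-laden illustration rather than pipen's code: latest_mtime and cache_is_valid are hypothetical names, the dirsig argument is modeled simply as a recursion depth, and the comparison of types, data, and script is left out.

```python
from pathlib import Path
from typing import Iterable


def latest_mtime(path: Path, depth: int) -> float:
    """Newest modification time of `path`, descending `depth` levels
    into directories (depth 0 checks only the path itself)."""
    newest = path.stat().st_mtime
    if depth > 0 and path.is_dir():
        for child in path.iterdir():
            newest = max(newest, latest_mtime(child, depth - 1))
    return newest


def cache_is_valid(
    inputs: Iterable[Path],
    outputs: Iterable[Path],
    lastest_time: float,
    dirsig: int,
) -> bool:
    """Hypothetical check covering only the time-based and deletion rules."""
    # A deleted output invalidates the cache. Only the declared output
    # paths are checked, not the files inside output directories.
    if any(not Path(p).exists() for p in outputs):
        return False
    # An input modified after the recorded time invalidates the cache.
    # Files inside input directories are inspected down to `dirsig` levels.
    return all(
        latest_mtime(Path(p), dirsig) <= lastest_time for p in inputs
    )
```

With dirsig set to 0, a file modified inside an input directory goes unnoticed unless the directory's own modification time also changes, which matches the caveat in the list above.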