Caching
Job caching
If cache is set to False (looked up, in sequence, in the configuration files, the Pipen constructor, and the process definition), the job always runs, regardless of previous runs.
If a previous run of a job failed, the job will run again.
If a job finishes successfully, a signature file is generated for it. When we try to run the job again, the signature is used to check whether we can skip the run and reuse the results generated by the previous run.
We can also force-cache a job by setting cache to "force". This makes sure that the results of the previous successful run are used regardless of input or script changes. This is useful when, for example, you have made some changes to the input or script but don't want them to take effect immediately, especially when the job takes a long time to run.
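The rules above can be summarized as a small decision function. This is only an illustrative sketch, not pipen's actual implementation: the function name should_rerun and its parameters are hypothetical, and it assumes that a force-cached job with no previous successful run still has to run.

```python
from typing import Union


def should_rerun(
    cache: Union[bool, str],
    prev_succeeded: bool,
    signature_matches: bool,
) -> bool:
    """Return True if the job must run, False if previous results can be reused.

    Hypothetical helper that mirrors the rules described above.
    """
    if cache is False:
        # Caching is disabled: always run.
        return True
    if not prev_succeeded:
        # A failed (or missing) previous run is never reused.
        return True
    if cache == "force":
        # Force-cache: reuse previous results even if input/script changed.
        return False
    # Normal caching: reuse only if the signature still matches.
    return not signature_matches
```

With cache set to True, the job is skipped only when the previous run succeeded and the signature still matches; with "force", the signature comparison is bypassed.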
Job signature
The signature of a job consists of the input types and data, the output types and data, and the latest time (lastest_time) at which any of the files/directories from the script, input, or output were generated or modified. So the following situations will make the job-cache check fail (the job will start over):
- Any changes in input or output types
- Any changes in input or output data
- Any changes to script
- Any touches to input files (since they will make the last modified time > lastest_time)
- Any touches to input directories
  - Use dirsig as the depth to check the files under the directories
  - If it is 0, only the directories themselves are checked. Note that modifying a file inside a directory may not change the last modified time of the directory itself.
- Any deletions to the output files/directories
Note that only the files/directories specified by output are checked. Files or subdirectories in the output directories will NOT be checked.
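To make the time-based checks concrete, here is a minimal sketch of how such a signature could be built and verified. It is a simplified model rather than pipen's real code: the helper names (latest_mtime, make_signature, is_cached) are hypothetical, inputs and outputs are assumed to be plain lists of paths (types are left out), and the script is assumed to be a file on disk.

```python
import os
from pathlib import Path


def latest_mtime(path, dirsig=1):
    """Latest modification time of path, descending dirsig levels into directories."""
    path = Path(path)
    mtime = path.stat().st_mtime
    if path.is_dir() and dirsig > 0:
        for child in path.iterdir():
            mtime = max(mtime, latest_mtime(child, dirsig - 1))
    return mtime


def make_signature(inputs, outputs, script, dirsig=1):
    """Record input/output data and the latest modification time seen so far."""
    paths = [script, *inputs, *outputs]
    return {
        "input": [str(p) for p in inputs],
        "output": [str(p) for p in outputs],
        "lastest_time": max(latest_mtime(p, dirsig) for p in paths),
    }


def is_cached(signature, inputs, outputs, script, dirsig=1):
    """Return True if the previous results can be reused."""
    # Deleted outputs invalidate the cache. Only the output paths
    # themselves are checked, not the files inside output directories.
    if any(not Path(p).exists() for p in outputs):
        return False
    # Changed input or output data invalidates the cache.
    if signature["input"] != [str(p) for p in inputs]:
        return False
    if signature["output"] != [str(p) for p in outputs]:
        return False
    # A script or input modified after lastest_time invalidates the cache.
    return all(
        latest_mtime(p, dirsig) <= signature["lastest_time"]
        for p in [script, *inputs]
    )
```

In this sketch, with dirsig set to 0 a change to a file inside an input directory goes unnoticed unless the directory's own modification time also changes, which matches the caveat in the list above.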