Input and output
Specify input of a process
The input of a process is specified with `input`, the keys of the input data, and `input_data`, the actual input data.
Tip
Why separate the keys and data?
Because the keys and the data are not always available together. For example, the keys are needed to infer the `output` and `script` (which use them in templates), but the data may only become available later, from the output of the required processes.
The complete form of an input key (`input`) is `<key>:<type>`. The `<type>` can be `var`, `file`, `dir`, `files` or `dirs`. The type `var` can be omitted, so `ph1, ph2` is the same as `ph1:var, ph2:var`.
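The key format above can be illustrated with a small parser. This is only a sketch of the `<key>:<type>` rule as described here, not pipen's own parsing code; `parse_input_keys` and `VALID_TYPES` are hypothetical names introduced for the example.

```python
# Hypothetical helper, not part of pipen: illustrates the <key>:<type> rule.
VALID_TYPES = ("var", "file", "dir", "files", "dirs")


def parse_input_keys(spec):
    """Split an input spec such as "a, b:file" into (key, type) pairs."""
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, type_ = item.partition(":")
        type_ = type_.strip() or "var"  # an omitted type means var
        if type_ not in VALID_TYPES:
            raise ValueError(f"Unknown input type: {type_!r}")
        pairs.append((key.strip(), type_))
    return pairs


# Both spellings yield the same pairs.
print(parse_input_keys("ph1, ph2") == parse_input_keys("ph1:var, ph2:var"))
```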
If a process requires other processes, its specified `input_data` is ignored, and the output data of the required processes is used instead:
from pipen import Proc, Pipen

class P1(Proc):
    input = "v1"
    output = "o1:{{in.v1}}"  # pass v1 through as the output variable
    input_data = ["a"]

class P2(Proc):
    input = "v2"
    output = "o2:{{in.v2}}"
    input_data = ["b"]

class P3(Proc):
    requires = [P1, P2]
    input = "i1, i2"
    output = "o3:{{in.i1}}_{{in.i2}}"  # will be "a_b"
    # input_data = []  # ignored with a warning

Pipen().run(P1, P2)
Tip
A directly specified `input_data` is ignored, but you can use a callback to modify the input channel.
For example:
class P4(Proc):
    requires = [P1, P2]
    input = "i1, i2"
    input_data = lambda ch: ch.applymap(str.upper)
    output = "o3:{{in.i1}}_{{in.i2}}"  # will be "A_B"
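The rules above (upstream output replaces direct data, while a callback receives the upstream channel) can be sketched without pipen. Here a channel is modeled as a plain list of row tuples instead of a data frame, and the outputs of several required processes are assumed to be bound column-wise. `cbind`, `resolve_input` and `to_upper` are hypothetical names, not pipen APIs.

```python
import warnings


def cbind(*channels):
    """Bind channels (lists of row tuples) column-wise, row by row."""
    return [sum(rows, ()) for rows in zip(*channels)]


def resolve_input(input_data, upstream_outputs):
    """Decide which rows a process receives (illustrative only)."""
    if not upstream_outputs:
        # A start process: use the data given directly.
        return list(input_data or [])
    channel = cbind(*upstream_outputs)
    if callable(input_data):
        # A callback may transform the channel built from upstream output.
        return input_data(channel)
    if input_data is not None:
        warnings.warn("input_data is ignored for processes with requirements")
    return channel


def to_upper(channel):
    """Example callback: upper-case every value in every row."""
    return [tuple(value.upper() for value in row) for row in channel]


p1_out, p2_out = [("a",)], [("b",)]
print(resolve_input(None, [p1_out, p2_out]))      # rows built from P1 and P2
print(resolve_input(to_upper, [p1_out, p2_out]))  # same rows, upper-cased
```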
Note
When the input data does not have enough columns, `None` is used for the missing ones, with a warning. When the input data has more columns than there are input keys, the extra columns are dropped, also with a warning.
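The column-matching behaviour in this note can be sketched as follows. This is a minimal illustration on plain tuples, not pipen's implementation; `fit_columns` is a hypothetical helper, and the warning messages are made up for the example.

```python
import warnings


def fit_columns(rows, keys):
    """Match each row to the input keys, padding or trimming as needed."""
    fitted = []
    for row in rows:
        row = tuple(row)
        if len(row) < len(keys):
            warnings.warn("Not enough columns; filling the rest with None")
            row = row + (None,) * (len(keys) - len(row))
        elif len(row) > len(keys):
            warnings.warn("Too many columns; dropping the extra ones")
            row = row[: len(keys)]
        fitted.append(dict(zip(keys, row)))
    return fitted


keys = ["i1", "i2"]
short = fit_columns([("a",)], keys)          # i2 is filled with None
long = fit_columns([("a", "b", "c")], keys)  # "c" is dropped
```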
Specify output of a process
Unlike the input, the output is not given as a channel. Instead, you tell `pipen` how to compute the output channel. The output can be a `list` or a `str`. If it is a `str`, commas (`,`) separate the different keys.
To use templating in `output`, see templating.
from pipen import Proc

class P1(Proc):
    input = "invar, infile"
    input_data = [(1, "/a/b/c.txt")]
    output = (
        "outvar:{{in.invar}}2, "
        "outfile:file:{{in.infile.split('/')[-1]}}2, "
        "outdir:dir:{{in.infile.split('/')[-1].split('.')[0]}}-dir"
    )

# The type 'var' is omitted in the first element.
# The output channel will be:
#
#    outvar    outfile                  outdir
#    <object>  <object>                 <object>
# 0  "12"      "<job.outdir>/c.txt2"    "<job.outdir>/c-dir"
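To make the structure of an output specification concrete, here is a rough sketch that splits it into key, type and template, and places already-rendered `file`/`dir` values under a job output directory. It assumes the templates contain no commas, does not render templates at all, and is not pipen's parser; `parse_output` and `place` are hypothetical helpers.

```python
import posixpath

OUTPUT_TYPES = ("var", "file", "dir")


def parse_output(spec):
    """Split an output spec (str or list) into (key, type, template) triples."""
    if isinstance(spec, str):
        spec = spec.split(",")  # assumption: no commas inside templates
    parsed = []
    for item in spec:
        key, _, rest = item.strip().partition(":")
        type_, sep, template = rest.partition(":")
        if not sep or type_ not in OUTPUT_TYPES:
            # No explicit type: the whole remainder is a var template.
            type_, template = "var", rest
        parsed.append((key, type_, template))
    return parsed


def place(type_, rendered, outdir):
    """Put rendered file/dir values under the job's output directory."""
    if type_ in ("file", "dir"):
        return posixpath.join(outdir, rendered)
    return rendered


spec = (
    "outvar:{{in.invar}}2, "
    "outfile:file:{{in.infile.split('/')[-1]}}2, "
    "outdir:dir:{{in.infile.split('/')[-1].split('.')[0]}}-dir"
)
for key, type_, template in parse_output(spec):
    print(key, type_, template)
```

Splitting on the first two colons only keeps any later colon inside the template, which is why an untyped template such as `o:{{a}}:b` falls back to `var`.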
Types of input and output
Input
Type | Meaning |
---|---|
var | Use the values directly |
file | Treat the data as a file path |
dir | Treat the data as a directory path |
files | Treat the data as a list of file paths |
dirs | Treat the data as a list of directory paths |
For `file`/`files`, the last modified time is checked when determining whether a job is cached.
For `dir`/`dirs`, if `dirsig > 0`, the files inside the directories are checked. Otherwise, only the last modified time of the directories themselves is checked.
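The caching check described above can be approximated with the standard library. The sketch below assumes `dirsig` acts as the maximum depth to descend into a directory. That reading goes beyond what this page states, so treat it as an assumption; `last_modified` is a hypothetical helper, not a pipen function.

```python
import os


def last_modified(path, dirsig=0):
    """Return the newest modification time relevant to `path`.

    Files report their own mtime. Directories report their own mtime
    unless dirsig > 0, in which case entries up to `dirsig` levels deep
    are inspected as well (assumed meaning of dirsig).
    """
    newest = os.path.getmtime(path)
    if dirsig > 0 and os.path.isdir(path):
        for entry in os.scandir(path):
            newest = max(newest, last_modified(entry.path, dirsig - 1))
    return newest
```

With `dirsig=0`, a change to a file inside a directory goes unnoticed unless it also changes the directory's own modification time. A positive `dirsig` catches such changes at the cost of scanning the directory.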
Output
Type | Meaning | Memo |
---|---|---|
var | Use the values directly | |
dir | Use the data as a directory path | The directory will be created directly |
file | Use the data as a file path | |