Piping
How piping is detected
```r
data %>% verb(arg1, ..., key1=kwarg1, ...)
```
The above is typical dplyr/tidyr data piping syntax.
The Python counterpart we aim for is:
```python
data >> verb(arg1, ..., key1=kwarg1, ...)
```
To implement this, we need to defer the execution of the verb by turning the call into a Verb object, which holds all the information about the function to be executed later. The Verb object is not executed until data is piped in. This is made possible by the executing package, which lets us determine the AST node where the function is called, and therefore whether the function is called in piping mode.
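The core idea of deferral can be sketched without any AST inspection. The example below is a minimal illustration, not pipda's implementation: `DeferredCall` and `deferred` are hypothetical names, and unlike pipda they always defer, i.e. they assume the call is used in piping mode.

```python
from typing import Any, Callable


class DeferredCall:
    """Hypothetical deferred call: stores a function and its arguments,
    and runs only when data is piped in with >>."""

    def __init__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data: Any) -> Any:
        # Python calls this reflected method for `data >> call` because the
        # left operand's __rshift__ returns NotImplemented for a DeferredCall.
        return self.func(data, *self.args, **self.kwargs)


def deferred(func: Callable[..., Any]) -> Callable[..., DeferredCall]:
    """Hypothetical decorator: every call returns a DeferredCall."""

    def wrapper(*args: Any, **kwargs: Any) -> DeferredCall:
        return DeferredCall(func, *args, **kwargs)

    return wrapper


@deferred
def add(x: int, y: int) -> int:
    return x + y


pending = add(2)      # nothing is computed yet
result = 1 >> pending  # the data arrives: add(1, 2) runs here
print(result)  # 3
```

In pipda, the AST detection is what lets a verb choose between returning such a deferred object and running immediately.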
If an argument refers to a column of the data, and that column is involved in the later computation, then the argument also needs to be deferred. For example, with dplyr in R:
```r
data %>% mutate(z=a)
```
adds a column named z containing the data from column a.
In Python, we want to do the same with:
```python
data >> mutate(z=f.a)
```
where f.a is a Reference object that carries the column information without fetching the data, even though Python evaluates f.a immediately.
The trick here is f. Like other packages, we introduced a Symbolic object (f), which connects the parts of an argument and turns the whole argument into an Expression object. This object holds the information needed for execution, which we use later once piping is detected.
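A minimal sketch of how f.a can record a column name instead of fetching it, assuming the data is a single record stored as a dict. `Symbolic`, `Reference`, `BinaryOp`, `evaluate` and `mutate` here are hypothetical stand-ins for illustration, not pipda's classes or API.

```python
import operator
from typing import Any, Callable, Dict


class Expression:
    """A deferred computation, evaluated later against the data."""

    def evaluate(self, data: Dict[str, Any]) -> Any:
        raise NotImplementedError

    # Operators build a larger Expression instead of computing a value.
    def __add__(self, other: Any) -> "BinaryOp":
        return BinaryOp(operator.add, self, other)

    def __mul__(self, other: Any) -> "BinaryOp":
        return BinaryOp(operator.mul, self, other)


class Reference(Expression):
    """Remembers a column name; no data is fetched until evaluate()."""

    def __init__(self, name: str) -> None:
        self.name = name

    def evaluate(self, data: Dict[str, Any]) -> Any:
        return data[self.name]


class BinaryOp(Expression):
    """Applies `op` to two operands, either of which may be an Expression."""

    def __init__(self, op: Callable[[Any, Any], Any], left: Any, right: Any) -> None:
        self.op, self.left, self.right = op, left, right

    def evaluate(self, data: Dict[str, Any]) -> Any:
        return self.op(evaluate(self.left, data), evaluate(self.right, data))


class Symbolic:
    """`f.<name>` returns a Reference to the column <name>."""

    def __getattr__(self, name: str) -> Reference:
        if name.startswith("__"):  # leave special attributes alone
            raise AttributeError(name)
        return Reference(name)


def evaluate(value: Any, data: Dict[str, Any]) -> Any:
    """Evaluate Expressions against `data`; pass plain values through."""
    return value.evaluate(data) if isinstance(value, Expression) else value


def mutate(data: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical verb: add one key per keyword argument."""
    out = dict(data)
    for name, value in kwargs.items():
        out[name] = evaluate(value, data)
    return out


f = Symbolic()
expr = f.a * 2 + f.b  # nothing is computed yet: expr is a BinaryOp tree
row = mutate({"a": 10, "b": 1}, z=expr)
print(row)  # {'a': 10, 'b': 1, 'z': 21}
```

Here mutate runs immediately for simplicity; in pipda the verb itself is also deferred, so the Expression arguments are evaluated once the data is piped in.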
Fallbacks when AST node detection fails
pipda detects the AST node of the verb call. If the call is next to a piping operator (>> by default, which can be changed with register_piping()), it is compiled into a VerbCall object, awaiting data to be piped in for evaluation. We call this the piping mode. Otherwise, it is treated as a normal function call, where the data should be passed directly. This is the normal mode.
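To illustrate what "next to a piping operator" means at the AST level, the sketch below uses only the standard `ast` module on a source string. This is a simplification that assumes the source text is at hand: pipda relies on executing to locate the node of the call that is currently running, whereas the hypothetical `calls_in_piping_mode` helper just parses a string and checks whether each call is the right operand of `>>`.

```python
import ast
from typing import List


def calls_in_piping_mode(source: str, func_name: str) -> List[bool]:
    """For every call to `func_name` in `source` (in source order), return
    True if the call is the right operand of `>>` (piping mode), else False
    (normal mode)."""
    tree = ast.parse(source)

    # ids of every node that appears as the right operand of a `>>`
    piped = {
        id(node.right)
        for node in ast.walk(tree)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.RShift)
    }

    calls = [
        node
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    ]
    # ast.walk is breadth-first, so sort by position to get source order
    calls.sort(key=lambda node: (node.lineno, node.col_offset))
    return [id(node) in piped for node in calls]


print(calls_in_piping_mode("x = 1 >> add(2)\ny = add(1, 2)", "add"))
# [True, False]
```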
However, the AST node is not always available. pipda relies on executing to detect the node, and there are situations where AST nodes cannot be detected. One of the main reasons is that the source code is not available, or has been altered, at runtime: for example, in pytest's assert statements (which pytest rewrites), the raw Python REPL, etc.
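The underlying limitation can be observed with the standard `inspect` module: when code was compiled from a string under a pseudo-filename, there is no source text that could be mapped back to AST nodes. The `source_available` helper below is hypothetical and only illustrates the problem; it is not how pipda or executing check for source.

```python
import inspect
from typing import Any, Dict


def source_available(obj: Any) -> bool:
    """Return True if inspect can retrieve the source text of `obj`."""
    try:
        inspect.getsource(obj)
    except (OSError, TypeError):  # no source file, or a builtin object
        return False
    return True


# Compile code under a pseudo-filename that exists neither on disk nor in
# linecache, much like code executed from a string.
namespace: Dict[str, Any] = {}
exec(compile("def f():\n    return 1\n", "<no-source>", "exec"), namespace)

print(source_available(namespace["f"]))  # False
```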
We can set up a fallback mode for when we fail to determine the AST node:
- piping: fall back to piping mode if the AST node is not available
- normal: fall back to normal mode if the AST node is not available
- piping_warning: fall back to piping mode if the AST node is not available, and issue a warning
- normal_warning (default): fall back to normal mode if the AST node is not available, and issue a warning
- raise: raise an error
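The fallback can be set per verb with the ast_fallback argument of register_verb(), changed later through the verb's ast_fallback attribute, or overridden for a single call by passing one of the above values as __ast_fallback: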
```python
from pipda import register_verb

@register_verb(int, ast_fallback="normal")
def add(x, y):
    return x + y

@register_verb(int, ast_fallback="piping")
def sub(x, y):
    return x - y

@register_verb(int)  # default fallback: normal_warning
def mul(x, y):
    return x * y

# In an environment where the AST node cannot be detected:
add(1, 2)  # 3, ok
1 >> add(2)  # TypeError, argument y missing
2 >> sub(1)  # 1, ok
sub(2, 1)  # VerbCall object, not 1
mul(1, 2, __ast_fallback="normal")  # 2
1 >> mul(2, __ast_fallback="piping")  # 2

# Change the fallback of a registered verb
add.ast_fallback = "piping"
1 >> add(2)  # 3, ok
add(1, 2)  # VerbCall object
```
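The decision described by the list of fallback values can be summarized in a small function. This is a sketch under assumptions: `resolve_mode` is a hypothetical name, and the exception type and warning text are illustrative, not pipda's actual messages. As in the example above, a per-call value takes precedence over the verb's own fallback.

```python
import warnings
from typing import Optional

FALLBACKS = ("piping", "normal", "piping_warning", "normal_warning", "raise")


def resolve_mode(
    detected: Optional[str],
    verb_fallback: str = "normal_warning",
    call_fallback: Optional[str] = None,
) -> str:
    """Return "piping" or "normal" for a verb call.

    `detected` is the mode found from the AST node, or None when the node is
    not available. A per-call fallback takes precedence over the verb's own.
    """
    fallback = call_fallback if call_fallback is not None else verb_fallback
    if fallback not in FALLBACKS:
        raise ValueError(f"Unknown fallback: {fallback!r}")
    if detected is not None:
        return detected  # AST node found: no fallback needed
    if fallback == "raise":
        raise RuntimeError("AST node not available and fallback is 'raise'")
    mode = fallback.split("_")[0]  # "piping_warning" -> "piping"
    if fallback.endswith("_warning"):
        warnings.warn(f"AST node not available, falling back to {mode} mode")
    return mode
```

With the verb fallback "normal", as for add above, resolve_mode(None, "normal") returns "normal", which is why 1 >> add(2) calls add(2) directly and fails on the missing y.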
Using a different operator for piping
By default, >> is used for piping. It can be switched to another operator with register_piping(); the supported operators are ">>", "|", "//", "@", "%", "&" and "^".
```python
from pipda import register_piping, register_verb

register_piping("|")

@register_verb(int)
def add(x, y):
    return x + y

1 | add(2)  # 3
```
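Each of the supported operators has a reflected special method in Python (`__rrshift__`, `__ror__`, `__rfloordiv__`, `__rmatmul__`, `__rmod__`, `__rand__`, `__rxor__`), which suggests one way a configurable piping operator can work. The sketch below is an assumption for illustration, not pipda's code: `PipeCall` and `use_piping` are hypothetical, and switching operators simply moves the piping behavior onto a different reflected method.

```python
import operator
from typing import Any, Callable

# Reflected method Python tries for `data <op> call` when the left operand
# lacks the method or returns NotImplemented.
REFLECTED = {
    ">>": "__rrshift__",
    "|": "__ror__",
    "//": "__rfloordiv__",
    "@": "__rmatmul__",
    "%": "__rmod__",
    "&": "__rand__",
    "^": "__rxor__",
}


class PipeCall:
    """Hypothetical deferred call: runs func(data, *args, **kwargs) when piped."""

    def __init__(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        self.func, self.args, self.kwargs = func, args, kwargs

    def _pipe(self, data: Any) -> Any:
        return self.func(data, *self.args, **self.kwargs)


def use_piping(op: str) -> None:
    """Hypothetical: make `op` the only operator that pipes data into PipeCall."""
    if op not in REFLECTED:
        raise ValueError(f"Unsupported piping operator: {op!r}")
    for method in REFLECTED.values():
        if method in vars(PipeCall):
            delattr(PipeCall, method)  # forget the previous operator
    setattr(PipeCall, REFLECTED[op], PipeCall._pipe)


use_piping("|")
print(1 | PipeCall(operator.add, 2))  # 3
```

After use_piping("|"), the expression 1 >> PipeCall(operator.add, 2) raises TypeError, since PipeCall no longer defines `__rrshift__`.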