complete
%run nb_helpers.py
from datar.all import *
nb_header(complete)
★ complete
Turns implicit missing values into explicit missing values.
Args:
data
: A data frame
*args
: Columns to expand. Columns can be atomic lists.
- To find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z).
- To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)).
- You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each present school-student combination for all possible dates.
fill
: A named list (a dict in Python, as in the example below) that supplies, for each variable, a single value to use instead of NA for missing combinations.
explicit
: Should both implicit (newly created) and explicit (pre-existing) missing values be filled by fill? Defaults to True; if set to False, only implicit missing values are filled.
Returns:
Data frame with missing values completed
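As a rough mental model of the crossing and nesting rules described under *args, the sketch below reproduces the expansion step in plain Python, without datar. The helpers `unique_tuples` and `complete_rows`, and the list-of-dicts representation, are assumptions made for illustration; they are not part of datar's API, and the sketch assumes each key combination occurs at most once in the input.

```python
from itertools import product

rows = [
    {"group": 1, "item_id": 1, "item_name": "a", "value1": 1, "value2": 4},
    {"group": 2, "item_id": 2, "item_name": "b", "value1": 2, "value2": 5},
    {"group": 1, "item_id": 2, "item_name": "b", "value1": 3, "value2": 6},
]


def unique_tuples(rows, cols):
    """Sorted unique value combinations of `cols` that occur in `rows`."""
    return sorted({tuple(r[c] for c in cols) for r in rows})


def complete_rows(rows, *specs):
    """Hypothetical helper, not datar's implementation.

    Each spec is a tuple of column names. Specs are crossed with each
    other; a multi-column spec behaves like nesting(), contributing only
    the combinations of its columns that are present in the data.
    """
    key_cols = [c for spec in specs for c in spec]
    value_cols = [c for c in rows[0] if c not in key_cols]
    # Index existing rows by their key (assumes keys are unique).
    existing = {tuple(r[c] for c in key_cols): r for r in rows}
    out = []
    for combo in product(*(unique_tuples(rows, spec) for spec in specs)):
        key = tuple(v for part in combo for v in part)
        if key in existing:
            out.append(dict(existing[key]))
        else:
            # Implicit missing combination: make it an explicit row.
            row = dict(zip(key_cols, key))
            row.update({c: None for c in value_cols})
            out.append(row)
    return out


completed = complete_rows(rows, ("group",), ("item_id", "item_name"))
for row in completed:
    print(row)
```

Here `("group",)` is crossed with the nested pair `("item_id", "item_name")`, giving four rows. Only the combination group 2, item_id 1, item_name "a" is absent from the input, so its value columns come out as None.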
df = tibble(
group = c(c[1:2:1], 1),
item_id = c(c[1:2:1], 2),
item_name = c("a", "b", "b"),
value1 = c[1:3:1],
value2 = c[4:6:1]
)
df >> complete(f.group, nesting(f.item_id, f.item_name))
| | group | item_id | item_name | value1 | value2 |
|---|---|---|---|---|---|
| | <int64> | <int64> | <object> | <float64> | <float64> |
| 0 | 1 | 1 | a | 1.0 | 4.0 |
| 1 | 1 | 2 | b | 3.0 | 6.0 |
| 2 | 2 | 1 | a | NaN | NaN |
| 3 | 2 | 2 | b | 2.0 | 5.0 |
df >> complete(f.group, nesting(f.item_id, f.item_name), fill=dict(value1=0))
| | group | item_id | item_name | value1 | value2 |
|---|---|---|---|---|---|
| | <int64> | <int64> | <object> | <float64> | <float64> |
| 0 | 1 | 1 | a | 1.0 | 4.0 |
| 1 | 1 | 2 | b | 3.0 | 6.0 |
| 2 | 2 | 1 | a | 0.0 | NaN |
| 3 | 2 | 2 | b | 2.0 | 5.0 |
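To make the difference between implicit and explicit missing values concrete, here is a plain-Python sketch of the fill step as the Args section describes it. The function `apply_fill` and the per-row `is_new` flag are hypothetical and exist only for this illustration; datar tracks this internally and its implementation may differ.

```python
def apply_fill(rows, fill, explicit=True):
    """Hypothetical helper mimicking the documented fill semantics.

    Each entry in `rows` is a (values, is_new) pair, where is_new marks
    rows created by the completion step (implicit missing values).
    """
    out = []
    for values, is_new in rows:
        filled = dict(values)
        for col, replacement in fill.items():
            # Fill a missing cell if the row is new, or if pre-existing
            # missing values are also to be filled (explicit=True).
            if filled.get(col) is None and (explicit or is_new):
                filled[col] = replacement
        out.append(filled)
    return out


rows = [
    ({"group": 1, "item_id": 1, "value1": 1.0}, False),
    ({"group": 1, "item_id": 2, "value1": None}, False),  # explicit: missing in the input
    ({"group": 2, "item_id": 1, "value1": None}, True),   # implicit: added by completion
]

fill_all = apply_fill(rows, {"value1": 0}, explicit=True)
fill_new_only = apply_fill(rows, {"value1": 0}, explicit=False)
print([r["value1"] for r in fill_all])
print([r["value1"] for r in fill_new_only])
```

With explicit=True both missing cells are replaced by 0. With explicit=False only the row added by the completion step is filled, and the value that was already missing in the input stays missing.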