complete
%run nb_helpers.py
from datar.all import *
nb_header(complete)
★ complete
Turns implicit missing values into explicit missing values.
Args:
data: A data frame
*args: columns to expand. Columns can be atomic lists.
- To find all unique combinations of x, y and z, including
those not present in the data, supply each variable as a
separate argument: expand(df, x, y, z).
- To find only the combinations that occur in the data, use
nesting: expand(df, nesting(x, y, z)).
- You can combine the two forms. For example,
expand(df, nesting(school_id, student_id), date) would
produce a row for each present school-student combination
for all possible dates.
fill: A dict that supplies, for each variable, a single value
to use instead of NA for missing combinations.
explict: Whether both implicit (newly created) and explicit
(pre-existing) missing values are filled by fill. Defaults to
True; if set to False, only implicit missing values are filled.
Returns:
Data frame with missing values completed
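The difference between crossing and nesting described under `*args` can be sketched in plain Python. This is only an illustration of the idea using the standard library, not how datar implements `complete`; the row data mirrors the example data frame used on this page, and all variable names are chosen for this sketch.

```python
from itertools import product

# Rows mirroring the example data frame on this page.
rows = [
    {"group": 1, "item_id": 1, "item_name": "a", "value1": 1, "value2": 4},
    {"group": 2, "item_id": 2, "item_name": "b", "value1": 2, "value2": 5},
    {"group": 1, "item_id": 2, "item_name": "b", "value1": 3, "value2": 6},
]

# Unique values of the crossed variable.
groups = sorted({r["group"] for r in rows})

# "Nesting": keep only the (item_id, item_name) pairs that occur in the data.
nested = sorted({(r["item_id"], r["item_name"]) for r in rows})

# Cross the groups with the nested pairs.
crossed = [(g, i, n) for g, (i, n) in product(groups, nested)]

# Combinations absent from the data are the implicit missing values.
present = {(r["group"], r["item_id"], r["item_name"]) for r in rows}
implicit_missing = [combo for combo in crossed if combo not in present]

# Without nesting, every variable is crossed with every other one.
item_ids = sorted({r["item_id"] for r in rows})
item_names = sorted({r["item_name"] for r in rows})
fully_crossed = list(product(groups, item_ids, item_names))

print(crossed)
print(implicit_missing)
print(len(fully_crossed))
```

Nesting keeps the result at four combinations, of which only one is absent from the data; crossing all three variables separately would yield eight.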
df = tibble(
group = c(c[1:2:1], 1),
item_id = c(c[1:2:1], 2),
item_name = c("a", "b", "b"),
value1 = c[1:3:1],
value2 = c[4:6:1]
)
df >> complete(f.group, nesting(f.item_id, f.item_name))
| | group | item_id | item_name | value1 | value2 |
|---|---|---|---|---|---|
| | <int64> | <int64> | <object> | <float64> | <float64> |
| 0 | 1 | 1 | a | 1.0 | 4.0 |
| 1 | 1 | 2 | b | 3.0 | 6.0 |
| 2 | 2 | 1 | a | NaN | NaN |
| 3 | 2 | 2 | b | 2.0 | 5.0 |
df >> complete(f.group, nesting(f.item_id, f.item_name), fill=dict(value1=0))
| | group | item_id | item_name | value1 | value2 |
|---|---|---|---|---|---|
| | <int64> | <int64> | <object> | <float64> | <float64> |
| 0 | 1 | 1 | a | 1.0 | 4.0 |
| 1 | 1 | 2 | b | 3.0 | 6.0 |
| 2 | 2 | 1 | a | 0.0 | NaN |
| 3 | 2 | 2 | b | 2.0 | 5.0 |
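The interaction between `fill` and implicit versus explicit missing values can be sketched as follows. This is a hypothetical plain-Python helper, not datar's implementation: `fill_row`, its `is_new` argument and its `explicit` flag are names invented for this illustration, and `old_row` is an assumed row with a pre-existing missing value that does not appear in the example data.

```python
def fill_row(row, fill, is_new, explicit=True):
    """Return a copy of `row` with missing (None) values replaced from `fill`.

    is_new:   True if the row was created by completing the data
              (an implicit missing combination).
    explicit: if False, rows that already existed are left untouched.
    """
    if not (explicit or is_new):
        return dict(row)
    return {
        key: (fill[key] if value is None and key in fill else value)
        for key, value in row.items()
    }


# Row created for the combination that was absent from the data.
new_row = {"group": 2, "item_id": 1, "item_name": "a",
           "value1": None, "value2": None}

# Hypothetical pre-existing row with an explicit missing value.
old_row = {"group": 1, "item_id": 1, "item_name": "a",
           "value1": None, "value2": 4}

filled_new = fill_row(new_row, {"value1": 0}, is_new=True)
filled_old = fill_row(old_row, {"value1": 0}, is_new=False)
kept_old = fill_row(old_row, {"value1": 0}, is_new=False, explicit=False)

print(filled_new)
print(filled_old)
print(kept_old)
```

As in the output table above, only `value1` is filled for the new row, because `fill` names no replacement for `value2`.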