datar.apis.tidyr
chop
(
data
,cols
)
(Any) — Makes data frame shorter by converting rows within each group into list-columns.
complete
(
data
,*args
,fill
,explict
)
(Any) — Turns implicit missing values into explicit missing values.
crossing
(
*args
,_name_repair
,**kwargs
)
(Any) — A wrapper around expand_grid() that de-duplicates and sorts its inputs.
drop_na
(
_data
,*columns
,_how
)
(Any) — Drop rows containing missing values.
expand
(
data
,*args
,_name_repair
,**kwargs
)
(Any) — Generates all combinations of variables found in a dataset.
extract
(
data
,col
,into
,regex
,remove
,convert
)
(Any) — Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.
fill
(
_data
,*columns
,_direction
)
(Any) — Fills missing values in selected columns using the next or previous entry.
full_seq
(
x
,period
,tol
)
(Any) — Create the full sequence of values in a vector.
nest
(
_data
,_names_sep
,**cols
)
(Any) — Nesting creates a list-column of data frames.
nesting
(
*args
,_name_repair
,**kwargs
)
(Any) — A helper that only finds combinations already present in the data.
pack
(
_data
,_names_sep
,**cols
)
(Any) — Makes df narrow by collapsing a set of columns into a single df-column.
pivot_longer
(
_data
,cols
,names_to
,names_prefix
,names_sep
,names_pattern
,names_dtypes
,names_transform
,names_repair
,values_to
,values_drop_na
,values_dtypes
,values_transform
)
(Any) — "lengthens" data, increasing the number of rows and decreasing the number of columns.
pivot_wider
(
_data
,id_cols
,names_from
,names_prefix
,names_sep
,names_glue
,names_sort
,values_from
,values_fill
,values_fn
)
(Any) — "widens" data, increasing the number of columns and decreasing the number of rows.
replace_na
(
data
,data_or_replace
,replace
)
(Any) — Replace NA with a value.
separate
(
data
,col
,into
,sep
,remove
,convert
,extra
,fill
)
(Any) — Given either a regular expression or a vector of character positions, turns a single character column into multiple columns.
separate_rows
(
data
,*columns
,sep
,convert
)
(Any) — Separates the values and places each one in its own row.
unchop
(
data
,cols
,keep_empty
,dtypes
)
(Any) — Makes df longer by expanding list-columns so that each element of the list-column gets its own row in the output.
uncount
(
data
,weights
,_remove
,_id
)
(Any) — Duplicates rows according to a weighting variable.
unite
(
data
,col
,*columns
,sep
,remove
,na_rm
)
(Any) — Unite multiple columns into one by pasting strings together.
unnest
(
data
,*cols
,keep_empty
,dtypes
,names_sep
,names_repair
)
(Any) — Flattens list-column of data frames back out into regular columns.
unpack
(
data
,cols
,names_sep
,names_repair
)
(Any) — Makes df wider by expanding df-columns back out into individual columns.
datar.apis.tidyr.
full_seq
(
x
, period
, tol=1e-06
)
Create the full sequence of values in a vector
x
— A numeric vector.
period
— Gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.
tol
(optional) — Numerical tolerance for checking periodicity.
The full sequence
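The behaviour described above can be illustrated without datar. The following is a minimal pure-Python sketch, not datar's implementation: `full_seq_sketch` is a hypothetical name, and the way `tol` is applied here (to the fractional number of steps) is an assumption made for illustration.

```python
def full_seq_sketch(x, period, tol=1e-6):
    """Return every value from min(x) to max(x) in steps of `period`."""
    lo, hi = min(x), max(x)
    # Check that each observation sits on the grid lo, lo + period, ...
    for value in x:
        steps = (value - lo) / period
        if abs(steps - round(steps)) > tol:
            raise ValueError("x is not a regular sequence with this period")
    n_steps = round((hi - lo) / period)
    return [lo + i * period for i in range(n_steps + 1)]


print(full_seq_sketch([1, 2, 4, 5, 10], 1))  # prints the integers 1 through 10
```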
datar.apis.tidyr.
chop
(
data
, cols=None
)
Makes data frame shorter by converting rows within each group into list-columns.
data
— A data frame
cols
(optional) — Columns to chop
Data frame with selected columns chopped
datar.apis.tidyr.
unchop
(
data
, cols=None
, keep_empty=False
, dtypes=None
)
Makes df longer by expanding list-columns so that each element of the list-column gets its own row in the output.
See https://tidyr.tidyverse.org/reference/chop.html
Recycling of size-1 elements may differ from tidyr:
>>> df = tibble(x=[1, [2,3]], y=[[2,3], 1])
>>> df >> unchop([f.x, f.y])
>>> # tibble(x=[1,2,3], y=[2,3,1])
>>> # instead of following in tidyr
>>> # tibble(x=[1,1,2,3], y=[2,3,1,1])
data
— A data frame.
cols
(optional) — Columns to unchop.
keep_empty
(bool, optional) — By default, you get one row of output for each element of the list you are unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty=True to replace size-0 elements with a single row of missing values.
dtypes
(optional) — The dtypes for the output columns. Could be a single dtype, which will be applied to all columns, or a dictionary of dtypes with the column names as keys and the dtypes as values. For nested data frames, specify col$a as the key. If col is used as the key, all columns of the nested data frames will be cast to that dtype.
A data frame with selected columns unchopped.
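To make the chop/unchop round trip concrete, here is a minimal pure-Python sketch over rows stored as dicts. It is not datar's code: `chop_sketch` and `unchop_sketch` are hypothetical helpers, and the sketch assumes every list-column in a row has the same length, so it does not reproduce the size-1 recycling discussed above.

```python
def chop_sketch(rows, cols):
    """Group rows by the columns not in `cols`; collect `cols` into lists."""
    groups = {}
    for row in rows:
        key = tuple((k, v) for k, v in row.items() if k not in cols)
        bucket = groups.setdefault(key, {c: [] for c in cols})
        for c in cols:
            bucket[c].append(row[c])
    return [{**dict(key), **lists} for key, lists in groups.items()]


def unchop_sketch(rows, cols, keep_empty=False):
    """Give each element of the list-columns in `cols` its own row."""
    out = []
    for row in rows:
        n = len(row[cols[0]])  # assumption: all list-columns share this length
        if n == 0:
            if keep_empty:
                # Keep the row, with missing values in the unchopped columns.
                out.append({**row, **{c: None for c in cols}})
            continue
        for i in range(n):
            out.append({**row, **{c: row[c][i] for c in cols}})
    return out


rows = [{"x": 1, "y": 1}, {"x": 1, "y": 2}, {"x": 2, "y": 3}]
chopped = chop_sketch(rows, ["y"])
print(chopped)
print(unchop_sketch(chopped, ["y"]))
```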
datar.apis.tidyr.
nest
(
_data
, _names_sep=None
, **cols
)
Nesting creates a list-column of data frames
_data
— A data frame
_names_sep
(str, optional) — If None, the default, the names will be left as is. Inner names will come from the former outer names. If a string, the inner and outer names will be used together. The names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by _names_sep.
**cols
(str | int) — Columns to nest
Nested data frame.
datar.apis.tidyr.
unnest
(
data
, *cols
, keep_empty=False
, dtypes=None
, names_sep=None
, names_repair='check_unique'
)
Flattens list-column of data frames back out into regular columns.
data
— A data frame to flatten.
*cols
(str | int) — Columns to unnest.
keep_empty
(bool, optional) — By default, you get one row of output for each element of the list you are unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty=True to replace size-0 elements with a single row of missing values.
dtypes
(optional) — The dtypes for the output columns. Could be a single dtype, which will be applied to all columns, or a dictionary of dtypes with the column names as keys and the dtypes as values.
names_sep
(str, optional) — If None, the default, the names will be left as is. Inner names will come from the former outer names. If a string, the inner and outer names will be used together. The names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by names_sep.
names_repair
(Union, optional) — Treatment of problematic column names:
- "minimal": No name repair or checks, beyond basic existence,
- "unique": Make sure names are unique and not empty,
- "check_unique": (default value), no name repair, but check they are unique,
- "universal": Make the names unique and syntactic,
- a function: apply custom name repair
Data frame with selected columns unnested.
datar.apis.tidyr.
pack
(
_data
, _names_sep=None
, **cols
)
→ Any
Makes df narrow by collapsing a set of columns into a single df-column.
_data
— A data frame
_names_sep
(str, optional) — If None, the default, the names will be left as is. Inner names will come from the former outer names. If a string, the inner and outer names will be used together. The names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by _names_sep.
**cols
(str | int) — Columns to pack
datar.apis.tidyr.
unpack
(
data
, cols
, names_sep=None
, names_repair='check_unique'
)
Makes df wider by expanding df-columns back out into individual columns.
For empty columns, the column is kept as is, instead of being removed.
data
— A data frame
cols
— Columns to unpack
names_sep
(str, optional) — If None, the default, the names will be left as is. Inner names will come from the former outer names. If a string, the inner and outer names will be used together. The names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by names_sep.
names_repair
— Treatment of problematic column names:
- "minimal": No name repair or checks, beyond basic existence,
- "unique": Make sure names are unique and not empty,
- "check_unique": (default value), no name repair, but check they are unique,
- "universal": Make the names unique and syntactic,
- a function: apply custom name repair
Data frame with given columns unpacked.
datar.apis.tidyr.
expand
(
data
, *args
, _name_repair='check_unique'
, **kwargs
)
Generates all combinations of variables found in a dataset.
data
— A data frame
*args
— Columns to expand; described together with **kwargs.
_name_repair
(Union, optional) — Treatment of problematic column names:
- "minimal": No name repair or checks, beyond basic existence,
- "unique": Make sure names are unique and not empty,
- "check_unique": (default value), no name repair, but check they are unique,
- "universal": Make the names unique and syntactic,
- a function: apply custom name repair
**kwargs
— Columns to expand. Columns can be atomic lists.
- To find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z).
- To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)).
- You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each present school-student combination for all possible dates.
A data frame with all combinations of variables.
datar.apis.tidyr.
nesting
(
*args
, _name_repair='check_unique'
, **kwargs
)
A helper that only finds combinations already present in the data.
*args
— Columns to expand; described together with **kwargs.
_name_repair
(Union, optional) — Treatment of problematic column names:
- "minimal": No name repair or checks, beyond basic existence,
- "unique": Make sure names are unique and not empty,
- "check_unique": (default value), no name repair, but check they are unique,
- "universal": Make the names unique and syntactic,
- a function: apply custom name repair
**kwargs
— Columns to expand. Columns can be atomic lists.
- To find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z).
- To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)).
- You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each present school-student combination for all possible dates.
A data frame with all combinations in data.
datar.apis.tidyr.
crossing
(
*args
, _name_repair='check_unique'
, **kwargs
)
A wrapper around expand_grid() that de-duplicates and sorts its inputs.
When values are not specified by literal list, they will be sorted.
*args
— Columns to expand; described together with **kwargs.
_name_repair
(Union, optional) — Treatment of problematic column names:
- "minimal": No name repair or checks, beyond basic existence,
- "unique": Make sure names are unique and not empty,
- "check_unique": (default value), no name repair, but check they are unique,
- "universal": Make the names unique and syntactic,
- a function: apply custom name repair
**kwargs
— Columns to expand. Columns can be atomic lists.
- To find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z).
- To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)).
- You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each present school-student combination for all possible dates.
A data frame with values deduplicated and sorted.
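The difference between "all combinations" (crossing, expand) and "only combinations present in the data" (nesting) can be sketched in plain Python. This is an illustration of the idea only, not datar's implementation: both helper names are hypothetical, and this sketch always sorts its inputs, which is a simplification of the rule stated above.

```python
from itertools import product


def crossing_sketch(**columns):
    """All combinations of the de-duplicated, sorted values of each column."""
    names = list(columns)
    levels = [sorted(set(values)) for values in columns.values()]
    return [dict(zip(names, combo)) for combo in product(*levels)]


def nesting_sketch(rows, *names):
    """Only the combinations of `names` that actually occur in `rows`."""
    seen = {tuple(row[n] for n in names) for row in rows}
    return [dict(zip(names, combo)) for combo in sorted(seen)]


rows = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}, {"x": 1, "y": "a"}]
print(crossing_sketch(x=[2, 1, 1], y=["b", "a"]))  # 4 rows: every x with every y
print(nesting_sketch(rows, "x", "y"))              # 2 rows: only observed pairs
```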
datar.apis.tidyr.
complete
(
data
, *args
, fill=None
, explict=True
)
Turns implicit missing values into explicit missing values.
data
— A data frame
*args
— Columns to expand. Columns can be atomic lists.
- To find all unique combinations of x, y and z, including those not present in the data, supply each variable as a separate argument: expand(df, x, y, z).
- To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)).
- You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each present school-student combination for all possible dates.
fill
(optional) — A named list that for each variable supplies a single value to use instead of NA for missing combinations.
explict
(bool, optional) — Should both implicit (newly created) and explicit (pre-existing) missing values be filled by fill? By default, this is TRUE, but if set to FALSE this will limit the fill to only implicit missing values.
Data frame with missing values completed
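A minimal pure-Python sketch of turning implicit missing combinations into explicit rows follows. It is not datar's implementation: `complete_sketch` is a hypothetical helper, it assumes each combination of the key columns appears at most once, and it applies `fill` only to newly created rows (it does not touch pre-existing missing values).

```python
from itertools import product


def complete_sketch(rows, names, fill=None):
    """Add a row for every combination of `names` that is absent from `rows`."""
    fill = fill or {}
    other = [k for k in rows[0] if k not in names]
    existing = {tuple(r[n] for n in names): r for r in rows}
    levels = [sorted({r[n] for r in rows}) for n in names]
    out = []
    for combo in product(*levels):
        if combo in existing:
            out.append(dict(existing[combo]))
        else:
            # Implicit missing combination: make it explicit.
            row = dict(zip(names, combo))
            row.update({k: fill.get(k) for k in other})
            out.append(row)
    return out


rows = [{"g": "a", "t": 1, "v": 10}, {"g": "b", "t": 2, "v": 20}]
for row in complete_sketch(rows, ["g", "t"], fill={"v": 0}):
    print(row)
```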
datar.apis.tidyr.
drop_na
(
_data
, *columns
, _how='any'
)
Drop rows containing missing values
See https://tidyr.tidyverse.org/reference/drop_na.html
_data
— A data frame.
*columns
(str) — Columns to inspect for missing values.
_how
(str, optional) — How to select the rows to drop:
- all: Drop a row only if all of columns are NA
- any: Drop a row if any of columns is NA
Dataframe with rows with NAs dropped and indexes dropped
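The effect of `_how` can be sketched in plain Python. This is an illustration under stated assumptions rather than datar's code: `drop_na_sketch` and `is_na` are hypothetical helpers, `None` and float NaN stand in for NA, and all columns are inspected when none are named.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_na_sketch(rows, *columns, how="any"):
    """Drop rows where any (or all) of the inspected columns are missing."""
    check = any if how == "any" else all
    kept = []
    for row in rows:
        inspected = columns or tuple(row)
        if not check(is_na(row[c]) for c in inspected):
            kept.append(row)
    return kept


rows = [{"x": 1, "y": None}, {"x": None, "y": None}, {"x": 2, "y": 3}]
print(drop_na_sketch(rows))             # keeps only the fully observed row
print(drop_na_sketch(rows, how="all"))  # drops only the row that is all missing
```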
datar.apis.tidyr.
extract
(
data
, col
, into
, regex='(\\w+)'
, remove=True
, convert=False
)
Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.
See https://tidyr.tidyverse.org/reference/extract.html
data
— The dataframe
col
(str | int) — Column name or position.
into
— Names of new variables to create as character vector. Use None to omit the variable in the output.
regex
(str, optional) — A regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.
remove
(bool, optional) — If TRUE, remove input column from output data frame.
convert
(optional) — The universal type for the extracted columns or a dict for individual ones
Dataframe with extracted columns.
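The group-to-column mapping can be sketched with Python's `re` module. This is a simplified illustration, not datar's implementation: `extract_sketch` is a hypothetical helper that works on a plain list of strings, uses `re.search`, and assumes the regex has exactly one group per element of `into`.

```python
import re


def extract_sketch(values, into, regex=r"(\w+)"):
    """Turn each capturing group into a named field; None where nothing matches."""
    pattern = re.compile(regex)
    out = []
    for value in values:
        match = pattern.search(value) if value is not None else None
        groups = match.groups() if match else (None,) * len(into)
        # A None entry in `into` omits that group from the output.
        out.append({name: g for name, g in zip(into, groups) if name is not None})
    return out


values = ["a-1", None, "b-2", "c"]
print(extract_sketch(values, ["letter", "digit"], r"([a-z])-(\d)"))
```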
datar.apis.tidyr.
fill
(
_data
, *columns
, _direction='down'
)
Fills missing values in selected columns using the next or previous entry.
See https://tidyr.tidyverse.org/reference/fill.html
_data
— A dataframe
*columns
(str | int) — Columns to fill
_direction
(str, optional) — Direction in which to fill missing values. Currently either "down" (the default), "up", "downup" (i.e. first down and then up) or "updown" (first up and then down).
The dataframe with NAs being replaced.
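The four directions can be illustrated on a single column held as a Python list. This is a minimal sketch, not datar's code: `fill_sketch` is a hypothetical helper and `None` stands in for a missing value.

```python
def fill_sketch(values, direction="down"):
    """Fill None entries from the previous ("down") or next ("up") entry."""

    def carry_forward(seq):
        last = None
        out = []
        for value in seq:
            if value is not None:
                last = value
            out.append(last)
        return out

    if direction == "down":
        return carry_forward(values)
    if direction == "up":
        # Filling upwards is filling downwards on the reversed sequence.
        return carry_forward(values[::-1])[::-1]
    if direction == "downup":
        return fill_sketch(fill_sketch(values, "down"), "up")
    if direction == "updown":
        return fill_sketch(fill_sketch(values, "up"), "down")
    raise ValueError(f"unknown direction: {direction!r}")


column = [None, 1, None, None, 2, None]
for direction in ("down", "up", "downup", "updown"):
    print(direction, fill_sketch(column, direction))
```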
datar.apis.tidyr.
pivot_longer
(
_data
, cols
, names_to='name'
, names_prefix=None
, names_sep=None
, names_pattern=None
, names_dtypes=None
, names_transform=None
, names_repair='check_unique'
, values_to='value'
, values_drop_na=False
, values_dtypes=None
, values_transform=None
)
"lengthens" data, increasing the number of rows and decreasing the number of columns.
The row order is a bit different from tidyr and pandas.DataFrame.melt.
>>> df = tibble(x=c[1:2], y=c[3:4])
>>> pivot_longer(df, f[f.x:f.y])
>>> # name value
>>> # 0 x 1
>>> # 1 x 2
>>> # 2 y 3
>>> # 3 y 4
with `tidyr::pivot_longer`, the output will be:
>>> # # A tibble: 4 x 2
>>> # name value
>>> # <chr> <int>
>>> # 1 x 1
>>> # 2 y 3
>>> # 3 x 2
>>> # 4 y 4
_data
— A data frame to pivot.
cols
— Columns to pivot into longer format.
names_to
(optional) — A string specifying the name of the column to create from the data stored in the column names of data. Can be a character vector, creating multiple columns, if names_sep or names_pattern is provided. In this case, there are two special values you can take advantage of:
- None/NA/NULL will discard that component of the name.
- .value/_value indicates that component of the name defines the name of the column containing the cell values, overriding values_to.
- Different from tidyr: with .value/_value, if there are other parts of the names to distinguish the groups, they must be captured. For example, use r'(\w)_(\d)' to match 'a_1' and ['.value', NA] to discard the suffix, instead of using r'(\w)_\d' to match.
names_prefix
(str, optional) — A regular expression used to remove matching text from the start of each variable name.
names_sep
(str, optional) — Described together with names_pattern.
names_pattern
(str, optional) — Takes the same specification as extract(), a regular expression containing matching groups (()).
names_dtypes
(optional) — Described together with values_dtypes.
names_transform
(Union, optional) — Described together with values_transform.
names_repair
(optional) — Treatment of problematic column names:
- "minimal": No name repair or checks, beyond basic existence,
- "unique": Make sure names are unique and not empty,
- "check_unique": (default value), no name repair, but check they are unique,
- "universal": Make the names unique and syntactic,
- a function: apply custom name repair
values_to
(str, optional) — A string specifying the name of the column to create from the data stored in cell values. If names_to is a character containing the special .value/_value sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names.
values_drop_na
(bool, optional) — If TRUE, will drop rows that contain only NAs in the values_to column. This effectively converts explicit missing values to implicit missing values, and should generally be used only when missing values in data were created by its structure.
values_dtypes
(optional) — A list of column name-prototype pairs. A prototype (or dtypes for short) is a zero-length vector (like integer() or numeric()) that defines the type, class, and attributes of a vector. Use these arguments if you want to confirm that the created columns are the types that you expect. Note that if you want to change (instead of confirm) the types of specific columns, you should use names_transform or values_transform instead.
values_transform
(Union, optional) — A list of column name-function pairs. Use these arguments if you need to change the types of specific columns. For example, names_transform = dict(week = as.integer) would convert a character variable called week to an integer. If not specified, the type of the columns generated from names_to will be character, and the type of the variables generated from values_to will be the common type of the input columns used to generate them.
The pivoted dataframe.
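The row-order difference noted above (column by column here, row by row in tidyr) can be reproduced with a small pure-Python sketch. This is only an illustration, not datar's implementation: `pivot_longer_sketch` and its `column_major` flag are hypothetical and exist solely to show the two orderings.

```python
def pivot_longer_sketch(rows, cols, names_to="name", values_to="value",
                        column_major=True):
    """Stack `cols` into name/value pairs, keeping the remaining columns."""
    id_cols = [k for k in rows[0] if k not in cols]
    if column_major:
        # All rows for the first pivoted column, then the second, and so on.
        pairs = [(row, c) for c in cols for row in rows]
    else:
        # All pivoted columns for the first row, then the second, and so on.
        pairs = [(row, c) for row in rows for c in cols]
    return [
        {**{k: row[k] for k in id_cols}, names_to: c, values_to: row[c]}
        for row, c in pairs
    ]


rows = [{"x": 1, "y": 3}, {"x": 2, "y": 4}]
print(pivot_longer_sketch(rows, ["x", "y"]))                      # x, x, y, y
print(pivot_longer_sketch(rows, ["x", "y"], column_major=False))  # x, y, x, y
```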
datar.apis.tidyr.
pivot_wider
(
_data
, id_cols=None
, names_from='name'
, names_prefix=''
, names_sep='_'
, names_glue=None
, names_sort=False
, values_from='value'
, values_fill=None
, values_fn=None
)
"widens" data, increasing the number of columns and decreasing the number of rows.
_data
— A data frame to pivot.
id_cols
(optional) — A set of columns that uniquely identifies each observation. Defaults to all columns in data except for the columns specified in names_from and values_from.
names_from
(optional) — Described together with values_from.
names_prefix
(str, optional) — String added to the start of every variable name.
names_sep
(str, optional) — If names_from or values_from contains multiple variables, this will be used to join their values together into a single string to use as a column name.
names_glue
(str, optional) — Instead of names_sep and names_prefix, you can supply a glue specification that uses the names_from columns (and special _value) to create custom column names.
names_sort
(bool, optional) — Should the column names be sorted? If FALSE, the default, column names are ordered by first appearance.
values_from
(optional) — A pair of arguments describing which column (or columns) to get the name of the output column (names_from), and which column (or columns) to get the cell values from (values_from).
values_fill
(optional) — Optionally, a (scalar) value that specifies what each value should be filled in with when missing.
values_fn
(Union, optional) — Optionally, a function applied to the value in each cell in the output. You will typically use this when the combination of id_cols and value column does not uniquely identify an observation. This can be a dict if you want to apply different aggregations to different value columns. If not specified, will be numpy.mean
names_repair
— todo
The pivoted dataframe.
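The roles of names_from, values_from and values_fill can be sketched in plain Python. This is a simplified illustration, not datar's implementation: `pivot_wider_sketch` is a hypothetical helper that assumes each id/name combination occurs at most once, so no aggregation (values_fn) is needed.

```python
def pivot_wider_sketch(rows, names_from="name", values_from="value",
                       values_fill=None):
    """Spread name/value pairs into one column per distinct name."""
    id_cols = [k for k in rows[0] if k not in (names_from, values_from)]
    # New columns are ordered by first appearance.
    new_cols = []
    for row in rows:
        if row[names_from] not in new_cols:
            new_cols.append(row[names_from])
    wide = {}
    for row in rows:
        key = tuple(row[k] for k in id_cols)
        if key not in wide:
            wide[key] = {**dict(zip(id_cols, key)),
                         **{c: values_fill for c in new_cols}}
        wide[key][row[names_from]] = row[values_from]
    return list(wide.values())


rows = [
    {"id": 1, "name": "x", "value": 10},
    {"id": 1, "name": "y", "value": 20},
    {"id": 2, "name": "x", "value": 30},
]
print(pivot_wider_sketch(rows))
print(pivot_wider_sketch(rows, values_fill=0))
```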
datar.apis.tidyr.
separate
(
data
, col
, into
, sep='[^0-9A-Za-z]+'
, remove=True
, convert=False
, extra='warn'
, fill='warn'
)
Given either a regular expression or a vector of character positions, turns a single character column into multiple columns.
data
— The dataframe
col
(int | str) — Column name or position.
into
— Names of new variables to create as character vector. Use None/NA/NULL to omit the variable in the output.
sep
(int | str, optional) — Separator between columns. If str, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values. If int, sep is interpreted as character positions to split at.
remove
(bool, optional) — If TRUE, remove input column from output data frame.
convert
(optional) — The universal type for the extracted columns or a dict for individual ones. Note that when given TRUE, DataFrame.convert_dtypes() is called, but it will not convert str to other types (for example, '1' to 1). You have to specify the dtype yourself.
extra
(str, optional) — If sep is a character vector, this controls what happens when there are too many pieces. There are three valid options:
- "warn" (the default): emit a warning and drop extra values.
- "drop": drop any extra values without a warning.
- "merge": only splits at most length(into) times
fill
(str, optional) — If sep is a character vector, this controls what happens when there are not enough pieces. There are three valid options:
- "warn" (the default): emit a warning and fill from the right
- "right": fill with missing values on the right
- "left": fill with missing values on the left
Dataframe with separated columns.
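How `extra` and `fill` interact with the number of pieces can be sketched for a single string value. This is an illustration, not datar's implementation: `separate_sketch` is a hypothetical helper, it handles only the regular-expression form of `sep`, and it interprets "merge" as producing at most len(into) pieces.

```python
import re
import warnings


def separate_sketch(value, into, sep=r"[^0-9A-Za-z]+", extra="warn", fill="warn"):
    """Split one string into the fields named by `into`."""
    n = len(into)
    if extra == "merge":
        # Stop splitting once n pieces exist; the last piece keeps the rest.
        pieces = re.split(sep, value, maxsplit=n - 1) if n > 1 else [value]
    else:
        pieces = re.split(sep, value)
    if len(pieces) > n:
        if extra == "warn":
            warnings.warn("extra pieces dropped")
        pieces = pieces[:n]
    elif len(pieces) < n:
        if fill == "warn":
            warnings.warn("missing pieces filled with None")
        pad = [None] * (n - len(pieces))
        pieces = pad + pieces if fill == "left" else pieces + pad
    # A None entry in `into` omits that piece from the output.
    return {name: p for name, p in zip(into, pieces) if name is not None}


print(separate_sketch("a-b-c", ["x", "y"], extra="merge"))
print(separate_sketch("a-b-c", ["x", "y"], extra="drop"))
print(separate_sketch("a", ["x", "y"], fill="left"))
```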
datar.apis.tidyr.
separate_rows
(
data
, *columns
, sep='[^0-9A-Za-z]+'
, convert=False
)
Separates the values and places each one in its own row.
data
— The dataframe
*columns
(str) — The columns to separate on
sep
(str, optional) — Separator between columns.
convert
(optional) — The universal type for the extracted columns or a dict for individual ones
Dataframe with rows separated and repeated.
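A minimal pure-Python sketch of the row-wise split follows. It is not datar's implementation: `separate_rows_sketch` is a hypothetical helper that handles a single column and treats `sep` as a regular expression.

```python
import re


def separate_rows_sketch(rows, column, sep=r"[^0-9A-Za-z]+"):
    """Split `column` on `sep` and repeat the other fields for each piece."""
    out = []
    for row in rows:
        for piece in re.split(sep, row[column]):
            out.append({**row, column: piece})
    return out


rows = [{"id": 1, "tags": "a,b"}, {"id": 2, "tags": "c"}]
print(separate_rows_sketch(rows, "tags"))
```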
datar.apis.tidyr.
uncount
(
data
, weights
, _remove=True
, _id=None
)
Duplicating rows according to a weighting variable
data
— A data frame
weights
— A vector of weights. Evaluated in the context of data
_remove
(bool, optional) — If TRUE, and weights is the name of a column in data, then this column is removed.
_id
(str, optional) — Supply a string to create a new variable which gives a unique identifier for each created row (0-based).
Dataframe with rows repeated.
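Repeating rows by a weight column can be sketched in a few lines of plain Python. This is an illustration only, not datar's code: `uncount_sketch` is a hypothetical helper, it takes the weights from a named column, and it leaves out the `_id` option.

```python
def uncount_sketch(rows, weights, remove=True):
    """Repeat each row as many times as its `weights` column says."""
    out = []
    for row in rows:
        for _ in range(row[weights]):
            out.append({k: v for k, v in row.items()
                        if not (remove and k == weights)})
    return out


rows = [{"x": "a", "n": 1}, {"x": "b", "n": 2}, {"x": "c", "n": 0}]
print(uncount_sketch(rows, "n"))  # "a" once, "b" twice, "c" not at all
```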
datar.apis.tidyr.
unite
(
data
, col
, *columns
, sep='_'
, remove=True
, na_rm=True
)
Unite multiple columns into one by pasting strings together
data
— A data frame.
col
(str) — The name of the new column, as a string or symbol.
*columns
(str | int) — Columns to unite
sep
(str, optional) — Separator to use between values.
remove
(bool, optional) — If True, remove input columns from output data frame.
na_rm
(bool, optional) — If True, missing values will be removed prior to uniting each value.
The dataframe with selected columns united
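The effect of `sep`, `remove` and `na_rm` can be sketched for a single row. This is an illustration rather than datar's implementation: `unite_sketch` is a hypothetical helper, `None` stands in for a missing value, and the new column is simply appended at the end of the row.

```python
def unite_sketch(row, col, columns, sep="_", remove=True, na_rm=True):
    """Paste the values of `columns` together into a new field `col`."""
    parts = [row[c] for c in columns]
    if na_rm:
        parts = [p for p in parts if p is not None]
    out = {k: v for k, v in row.items() if not (remove and k in columns)}
    out[col] = sep.join(str(p) for p in parts)
    return out


row = {"id": 1, "a": "x", "b": None, "c": "z"}
print(unite_sketch(row, "abc", ["a", "b", "c"]))                # missing "b" skipped
print(unite_sketch(row, "abc", ["a", "c"], sep="-", remove=False))
```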
datar.apis.tidyr.
replace_na
(
data
, data_or_replace=None
, replace=None
)
Replace NA with a value
This function can also be used as a plain function rather than a verb. When it is called as an argument of a verb, data is passed implicitly, so data_or_replace can be used to pass the data in which NAs are replaced.
data
— The data piped in
data_or_replace
(optional) — When called as an argument of a verb, this is the data in which NAs are replaced. Otherwise this is the replacement.
replace
(optional) — The value to replace with. Can only be a scalar, or a dict for a data frame, so replacing NA with a list is not supported yet.
Corresponding data with NAs replaced
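The scalar-versus-dict behaviour of the replacement can be sketched over rows stored as dicts. This is an illustration only, not datar's implementation: `replace_na_sketch` is a hypothetical helper and `None` stands in for NA.

```python
def replace_na_sketch(rows, replace):
    """Replace None with a scalar, or with per-column values from a dict."""
    out = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if value is None:
                # With a dict, columns without an entry stay missing.
                value = replace.get(key) if isinstance(replace, dict) else replace
            new_row[key] = value
        out.append(new_row)
    return out


rows = [{"x": 1, "y": None}, {"x": None, "y": "b"}]
print(replace_na_sketch(rows, 0))
print(replace_na_sketch(rows, {"y": "unknown"}))
```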